Quickstart

This document provides a quickstart guide to VESSL Serve: managing revisions and the gateway using YAML manifests.

1. Prepare a Model to Serve

Prepare the model and service for deployment. In this document, we will use the MNIST example, where you can train a model and register it to the VESSL Model Registry.

Use the following command in the CLI to proceed:

# Clone the example repository
git clone git@github.com:vessl-ai/examples.git
cd examples/mnist/pytorch

# Train the model and register it to the repository
pip install -r requirements.txt
python main.py --output-path ./output --save-model

# Register the model
python model.py --checkpoint ./output/model.pt --model-repository mnist-example

For more detailed information about the VESSL Model Registry, please refer to the Model Registry section.

2. Create a Serving Instance

Create a serving instance for deployment. Navigate to the 'Serving' section in the VESSL Web Console and click the 'New Serving' button. This will allow you to create a serving named mnist-example.

  1. Write a manifest file for the serving revision

Create a new serving revision. Save the following content as a file named serve-revision.yaml:

message: VESSL Serve example
image: quay.io/vessl-ai/kernels:py38-202308150329
resources:
  name: v1.cpu-2.mem-6
run: vessl model serve mnist-example 1 --install-reqs
autoscaling:
  min: 1
  max: 3
  metric: cpu
  target: 60
ports:
  - port: 8000
    name: fastapi
    type: http

You can easily deploy the revision defined in YAML using the VESSL CLI as shown below:

vessl serve revision create --serving mnist-example -f serve-revision.yaml

Ensure that you specify a container image with the same Python version as used during model creation. For instance, if you trained the model with Python 3.8, it's recommended to use an image containing Python 3.8, such as quay.io/vessl-ai/kernels:py38-202308150329.
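
If you want to confirm which Python version an image ships with before deploying, you can run the image locally. This is a minimal sketch, assuming Docker is installed and that python is on the image's PATH:

# Print the Python version bundled in the container image
docker run --rm quay.io/vessl-ai/kernels:py38-202308150329 python --version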

3. Create an Endpoint

To perform inference with the created revision, you need to expose it to the external network. In VESSL Serving, the Gateway (Endpoint) determines how traffic is routed and to which port it is distributed.

First, create a YAML file defining the Gateway. Create a file named serve-gateway.yaml with the following content:

enabled: true
targets:
  - number: 1   # Use the revision number you got in previous step
    port: 8000
    weight: 100

The Gateway can be easily deployed using the VESSL CLI, as shown below:

vessl serve gateway update --serving mnist-example -f serve-gateway.yaml

To check the status of the deployed Gateway, use the vessl serve gateway show command.

vessl serve gateway show --serving mnist-example

The output will look like the following:

 Enabled True
 Status success
 Endpoint model-service-gateway-xyzyxyxx.managed-cluster-apne2.vessl.ai
 Ingress Class nginx
 Annotations (empty)
 Traffic Targets
 - ########## 100 %:  1 (port 8000)
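
Once the Gateway reports success, you can send inference requests to the Endpoint shown above. The exact route and payload depend on the serving code you deployed; the /predict route and the file form field below are assumptions for illustration, so check the serving code in the examples repository for the actual API:

# Send a sample image to the endpoint for inference.
# Replace the hostname with the Endpoint value from `vessl serve gateway show`.
# The /predict route and the "file" form field are assumptions, not the
# confirmed API of the MNIST example.
curl -X POST \
  -F "file=@sample_digit.png" \
  https://model-service-gateway-xyzyxyxx.managed-cluster-apne2.vessl.ai/predict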

4. Dividing Traffic Among Multiple Revisions

To deploy a new version of the model without interrupting the service, you first deploy the new version and then gradually shift traffic over to it.

In VESSL Serve, the Gateway (Endpoint) provides the capability to distribute traffic across multiple Revisions.

Begin by defining and deploying the new Revision. Save the following content as serve-revision.yaml:

message: Revision v2
image: quay.io/vessl-ai/kernels:py38-202308150329
resources:
  name: v1.cpu-2.mem-6
run: vessl model serve mnist-example 2 --install-reqs # New model version
autoscaling:
  min: 1
  max: 3
  metric: cpu
  target: 60
ports:
  - port: 8000
    name: fastapi
    type: http

vessl serve revision create --serving mnist-example -f serve-revision.yaml

Successfully created revision in serving mnist-example.

 Number 2
 Status pending
 Message Revision v2

Subsequently, modify serve-gateway.yaml to split traffic between the two Revisions.

enabled: true
targets:
  - number: 1
    port: 8000
    weight: 90
  - number: 2
    port: 8000
    weight: 10

Update the Gateway configuration with the provided settings:

vessl serve gateway update --serving mnist-example -f serve-gateway.yaml

Executing this command will display the Gateway's status, revealing the distribution of traffic across the specified Revisions.

Successfully update gateway of serving mnist-example.

 Enabled True
 Status success
 Endpoint model-service-gateway-xyzyxyxx.managed-cluster-apne2.vessl.ai
 Ingress Class nginx
 Annotations (empty)
 Traffic Targets
 - #########  90 %:  1 (port 8000)
 - #          10 %:  2 (port 8000)
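
Once the new Revision proves stable, you can repeat the same update to move all traffic over. As a sketch following the steps above, the final serve-gateway.yaml would look like this:

enabled: true
targets:
  - number: 2   # The new revision now receives all traffic
    port: 8000
    weight: 100

Apply it with vessl serve gateway update --serving mnist-example -f serve-gateway.yaml, as before.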

5. Helpful Tips for Using VESSL Serve

Simultaneously Update Revisions and Endpoint Configurations

After defining a Revision using YAML, you can create the revision and update the gateway simultaneously by providing parameters directly in the CLI. Here's an example of the CLI command:

vessl serve revision create --serving serve-example -f serve-example.yaml \
  --update-gateway --enable-gateway-if-off --update-gateway-port 8000 --update-gateway-weight 100

By using the --update-gateway option, you can update the gateway (endpoint) simultaneously while creating a revision. The following options can be used in conjunction:

  • --enable-gateway-if-off: This option changes the gateway's status to "enabled" if it's currently disabled.

  • --update-gateway-port: Specify the port to be used by the newly created revision. This should be used in conjunction with --update-gateway-weight below.

  • --update-gateway-weight: Define how traffic should be distributed to the newly created revision. This should be used alongside the --update-gateway-port option mentioned above.

Troubleshooting

  • If you encounter a NotFound (404): Requested entity not found. error while creating Revisions or Gateways via the CLI:

    • Use the vessl whoami command to confirm if the default organization matches the one where Serving exists.

    • You can use the vessl configure --reset command to change the default organization.

    • Ensure that Serving is properly created within the selected default organization.

  • What's the difference between Gateway and Endpoint?

    • There is no difference between the two terms; they refer to the same concept.

    • To prevent confusion, these terms will be unified under "Endpoint" in the future.

  • HPA Scale-in/Scale-out Approach:

    • As an example of how it works based on CPU metrics:

      • Desired replicas = ceil[current replicas * ( current CPU metric value / desired CPU metric value )]

      • The HPA constantly monitors this metric and adjusts the current replicas within the [min, max] range.
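
      • For example, with 2 current replicas running at 90% CPU against a 60% target: Desired replicas = ceil[2 * (90 / 60)] = ceil[3.0] = 3, so the HPA scales out to 3 replicas (clamped to the [min, max] range).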

Refer to the YAML schema reference for detailed information on the YAML manifest schema.

Currently, VESSL Serve operates based on Kubernetes' Horizontal Pod Autoscaler (HPA) and uses its algorithm as-is. For detailed information, refer to the Kubernetes documentation.
