Quickstart
This document is a quickstart guide to VESSL Serve. It covers managing revisions and the gateway using YAML manifests.
Prepare the model and service for deployment. In this document, we will use an example in which you train a model and register it to the VESSL Model Registry.
Use the following command in the CLI to proceed:
Create a serving instance for deployment. Navigate to the 'Serving' section in the VESSL Web Console and click the 'New Serving' button to create a serving named mnist-example.
Write manifest file for serving revision
Create a new serving revision. Save the following content as a file named serve-revision.yaml:
You can easily deploy the revision defined in YAML using the VESSL CLI as shown below:
Ensure that you specify a container image with the same Python version as used during model creation. For instance, if you trained the model with Python 3.8, it's recommended to use an image containing Python 3.8, such as quay.io/vessl-ai/kernels:py38-202308150329.
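The version match described above can be checked programmatically. The following is a minimal sketch, not part of VESSL: the helper names parse_major_minor and matches_training_python are hypothetical, and it assumes you recorded the training interpreter's version as a string such as "3.8.10".

```python
import sys
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.8.10'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def matches_training_python(
    trained_version: str,
    runtime: Optional[Tuple[int, int]] = None,
) -> bool:
    """Return True if the runtime's major.minor equals the training version's.

    `runtime` defaults to the interpreter running this code; pass it
    explicitly to check a version other than the current one.
    """
    if runtime is None:
        runtime = (sys.version_info.major, sys.version_info.minor)
    return parse_major_minor(trained_version) == runtime


if __name__ == "__main__":
    # Hypothetical value: the version recorded when the model was trained.
    trained = "3.8.10"
    if not matches_training_python(trained):
        print(f"Warning: model was trained with Python {trained}, "
              f"but this image runs {sys.version_info.major}.{sys.version_info.minor}")
```

Running a check like this when the serving container starts can surface a mismatched image early, before a model fails to load.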
To perform inference with the created revision, you must expose it to the external network. In VESSL Serve, the Gateway (Endpoint) determines which ports receive traffic and how that traffic is routed and distributed.
First, create a YAML file that defines the Gateway. Create a file named serve-gateway.yaml with the following content:
The Gateway can be easily deployed using the VESSL CLI, as shown below:
To check the status of the deployed Gateway, use the vessl serve gateway show command.
You can check the status of the deployed Gateway as shown below:
To deploy a new version of the model without interrupting the service, deploy the new version first and then shift traffic to it gradually.
In VESSL Serve, the Gateway (Endpoint) provides the capability to distribute traffic across multiple Revisions.
Begin by defining and deploying the new Revision.
Next, modify serve-gateway.yaml to split traffic to the new Revision.
Update the Gateway configuration with the provided settings:
Executing this command will display the Gateway's status, revealing the distribution of traffic across the specified Revisions.
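To build intuition for what a weighted split means, here is a small simulation. It is only an illustration, assuming each request is routed independently with probability proportional to the Revision's weight. It does not reproduce the Gateway's actual routing implementation, and the revision numbers and weights are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def pick_revision(weights: Dict[int, int], rng: random.Random) -> int:
    """Choose one revision number with probability proportional to its weight."""
    revisions = list(weights)
    return rng.choices(revisions, weights=[weights[r] for r in revisions], k=1)[0]


def simulate(weights: Dict[int, int], n_requests: int, seed: int = 0) -> Counter:
    """Route n_requests simulated requests and count how many each revision gets."""
    rng = random.Random(seed)
    return Counter(pick_revision(weights, rng) for _ in range(n_requests))


if __name__ == "__main__":
    # Hypothetical split: 90% to revision 1, 10% to the new revision 2.
    counts = simulate({1: 90, 2: 10}, n_requests=10_000)
    for revision, count in sorted(counts.items()):
        print(f"revision {revision}: {count} requests")
```

Increasing the new Revision's weight step by step (for example 10, 50, then 100) is one way to carry out the gradual transition described earlier.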
After defining a Revision using YAML, you can create the revision and launch the gateway simultaneously by providing parameters directly in the CLI. Here's an example of the CLI command:
By using the --update-gateway option, you can update the gateway (endpoint) while creating a revision. The following options can be used with it:
--enable-gateway-if-off: Changes the gateway's status to "enabled" if it is currently disabled.
--update-gateway-port: Specifies the port to be used by the newly created revision. Use it together with --update-gateway-weight below.
--update-gateway-weight: Defines how traffic should be distributed to the newly created revision. Use it together with --update-gateway-port above.
Error "NotFound (404): Requested entity not found." while creating Revisions or Gateways via the CLI:
Use the vessl whoami command to confirm that the default organization matches the one where the Serving exists.
You can use the vessl configure --reset command to change the default organization.
Ensure that Serving is properly created within the selected default organization.
What's the difference between Gateway and Endpoint?
There is no difference between the two terms; they refer to the same concept.
To prevent confusion, these terms will be unified under "Endpoint" in the future.
HPA Scale-in/Scale-out Approach:
As an example of how it works based on CPU metrics:
Desired replicas = ceil[current replicas * ( current CPU metric value / desired CPU metric value )]
The HPA constantly monitors this metric and adjusts the current replicas within the [min, max] range.
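The formula above can be worked through in a few lines of Python. This is a minimal sketch of the calculation only: the function name and the sample numbers are illustrative, and it leaves out details of the real Kubernetes HPA such as its tolerance band and stabilization behavior.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Apply ceil(current replicas * current / target), clamped to [min, max]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Scale out: 2 replicas at 90% CPU with a 60% target.
    print(desired_replicas(2, 90, 60, min_replicas=1, max_replicas=5))
    # Scale in: 4 replicas at 30% CPU with a 60% target.
    print(desired_replicas(4, 30, 60, min_replicas=1, max_replicas=5))
    # Clamped: the raw result would be 16, but max_replicas caps it.
    print(desired_replicas(4, 200, 50, min_replicas=1, max_replicas=10))
```

In the first case the CPU metric is 1.5 times the target, so the replica count grows by the same factor. In the last case the [min, max] range takes precedence over the raw result.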
Refer to the VESSL documentation for detailed information on the YAML manifest schema.
Currently, VESSL Serve operates based on Kubernetes' Horizontal Pod Autoscaler (HPA) and uses its algorithms as is. For detailed information, refer to the Kubernetes documentation on Horizontal Pod Autoscaling.