Monitoring Dashboard
Last updated
The Monitor tab in VESSL is a dashboard that ML teams use to monitor and manage model services. It shows a service's status, endpoint configuration, replicas, and key metrics at a glance.
The dashboard is divided into five sections, each providing a different perspective on the service's status and performance. These sections are:
Monitor: A hostmap view of the current state of the service's revisions and replicas.
Metadata: Detailed information about the service, including its status, the number of revisions and replicas, and the latest update.
Endpoint: The service's endpoint and traffic distribution. Clicking the 'Edit' button lets you change the endpoint configuration. See the section below for details.
Workloads: Detailed status information for each replica. See the section below for details.
Metrics: Time-series graphs of the service's key metrics, such as CPU/GPU/RAM usage, replica count, network traffic, request throughput, and error rate.
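As a rough illustration of what two of these metrics measure, the sketch below computes request throughput and error rate over a single time window from a list of request records. It is not VESSL's implementation: the `Request` record, the `throughput_and_error_rate` helper, and the choice to count HTTP 5xx responses as errors are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    timestamp: float  # seconds since some reference point
    status_code: int  # HTTP status returned by the replica


def throughput_and_error_rate(
    requests: List[Request], window_start: float, window_end: float
) -> Tuple[float, float]:
    """Return (requests per second, error fraction) for [window_start, window_end).

    Assumption: a request counts as an error when its status code is 5xx.
    """
    duration = window_end - window_start
    if duration <= 0:
        raise ValueError("window_end must be after window_start")
    in_window = [r for r in requests if window_start <= r.timestamp < window_end]
    throughput = len(in_window) / duration
    if not in_window:
        return throughput, 0.0  # no traffic, so no errors to report
    errors = sum(1 for r in in_window if r.status_code >= 500)
    return throughput, errors / len(in_window)


# Ten requests, one per second, with a single server error at t = 3.
reqs = [Request(float(t), 500 if t == 3 else 200) for t in range(10)]
print(throughput_and_error_rate(reqs, 0.0, 10.0))  # (1.0, 0.1)
```

A time-series graph is essentially this calculation repeated for consecutive windows.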
Clicking the 'Edit' button in the Endpoint section lets you change the following endpoint settings:
Enable Endpoint: Whether to actually create an endpoint in the cluster.
Host: The custom domain name used to reach the endpoint. If left blank, the cluster automatically generates a load balancer endpoint name.
Revisions: The revisions to connect to the endpoint. For each revision, set the revision number, port, and traffic weight. The traffic weights of all connected revisions must add up to 100%.
Advanced Settings: Advanced options for the cluster that runs the model service. Change these settings only if you know exactly what each one does.
Ingress Class (Advanced Settings): The Ingress Class to use. On AWS clusters, set it to alb.
Annotation (Advanced Settings): Kubernetes annotations that pass configuration to the load balancer controller (or similar components) when the endpoint is set up.
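The endpoint rules above can be checked before saving. The following is a minimal sketch, assuming a hypothetical `EndpointConfig` structure that mirrors the fields in the Edit dialog; it is not VESSL's API or configuration schema. It enforces the documented 100% traffic-weight rule, plus port-range and per-weight range checks that are assumed extras rather than documented rules. The annotation key shown is one used by the AWS Load Balancer Controller and serves only as an example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RevisionTarget:
    revision_number: int
    port: int
    traffic_weight: int  # percentage of endpoint traffic sent to this revision


@dataclass
class EndpointConfig:  # hypothetical structure, not VESSL's schema
    enabled: bool
    host: Optional[str] = None  # None/empty: the cluster generates a load balancer name
    revisions: List[RevisionTarget] = field(default_factory=list)
    ingress_class: Optional[str] = None  # e.g. "alb" on AWS clusters
    annotations: Dict[str, str] = field(default_factory=dict)


def validate_endpoint(cfg: EndpointConfig) -> List[str]:
    """Return human-readable problems; an empty list means the config passes."""
    if not cfg.enabled:
        return []  # no endpoint is created, so there is nothing to check
    problems = []
    for rev in cfg.revisions:
        # Assumed checks: valid TCP port and a weight between 0 and 100.
        if not 1 <= rev.port <= 65535:
            problems.append(f"revision {rev.revision_number}: invalid port {rev.port}")
        if not 0 <= rev.traffic_weight <= 100:
            problems.append(f"revision {rev.revision_number}: weight must be between 0 and 100")
    # Documented rule: weights of all connected revisions must total 100%.
    total = sum(rev.traffic_weight for rev in cfg.revisions)
    if total != 100:
        problems.append(f"traffic weights sum to {total}%, expected 100%")
    return problems


# A 90/10 canary split between two revisions on an AWS cluster.
canary = EndpointConfig(
    enabled=True,
    revisions=[RevisionTarget(1, 8000, 90), RevisionTarget(2, 8000, 10)],
    ingress_class="alb",
    annotations={"alb.ingress.kubernetes.io/scheme": "internet-facing"},
)
print(validate_endpoint(canary))  # []

broken = EndpointConfig(
    enabled=True,
    revisions=[RevisionTarget(1, 8000, 70), RevisionTarget(2, 8000, 20)],
)
print(validate_endpoint(broken))  # ['traffic weights sum to 90%, expected 100%']
```

Returning a list of problems rather than raising on the first one lets every issue be reported together, similar to how a form highlights all invalid fields at once.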
A replica is a container that runs a model server in the cluster, where it exists as a Kubernetes pod. VESSL lets you view the list of replicas in a service and act on them directly:
Delete: Deletes the replica. The cluster then automatically creates a new replica to replace it.
Log: View the replica's logs.
Metrics: Filter the dashboard's metric graphs to show only this replica.
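The automatic replacement after Delete follows the usual Kubernetes pattern of reconciling the observed state toward a desired replica count. The toy model below illustrates only that idea: `ReplicaController` and its methods are hypothetical, and which component actually recreates pods in a VESSL cluster (for example, a Deployment's ReplicaSet) is an assumption.

```python
import itertools
from typing import Set


class ReplicaController:
    """Toy reconcile loop that keeps a fixed number of replicas alive.

    Hypothetical: in a real cluster a Kubernetes controller plays this role.
    """

    def __init__(self, desired: int):
        self.desired = desired
        self.replicas: Set[str] = set()
        self._ids = itertools.count()  # source of fresh replica names

    def delete(self, name: str) -> None:
        # Models the dashboard's Delete action: the replica simply disappears.
        self.replicas.discard(name)

    def reconcile(self) -> None:
        # Create replicas until the observed count matches the desired count.
        while len(self.replicas) < self.desired:
            self.replicas.add(f"replica-{next(self._ids)}")
        # Remove surplus replicas if the desired count was lowered.
        while len(self.replicas) > self.desired:
            self.replicas.pop()


ctrl = ReplicaController(desired=2)
ctrl.reconcile()
print(sorted(ctrl.replicas))  # ['replica-0', 'replica-1']
ctrl.delete("replica-0")
ctrl.reconcile()
print(sorted(ctrl.replicas))  # ['replica-1', 'replica-2']
```

Because the loop compares counts rather than tracking which replica was removed, the replacement gets a new name instead of reusing the deleted one, much as a replacement Kubernetes pod receives a fresh name.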
To connect a custom domain name to the endpoint, you must set up a DNS service such as AWS Route 53 so that the cluster can control DNS records.