Resource Specs

Configure custom presets for resource usage

Overview

Under Resource Specs, you can define custom resource presets that users must choose from when launching ML workloads, and you can set the priority of each preset. For example, if you define four presets, users will only be able to select from those four options, as illustrated below.
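
For illustration, a set of four presets might look like the following (the names and sizes here are hypothetical examples, not defaults):

Name                    Processor   GPU limit   CPU limit   Memory limit   Priority
a100-1.mem-16.cpu-6     GPU         1           6           16 GB          1
a100-2.mem-32.cpu-12    GPU         2           12          32 GB          2
a100-4.mem-64.cpu-24    GPU         4           24          64 GB          3
cpu-4.mem-8             CPU         -           4           8 GB           4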

These presets help admins optimize resource usage by (1) preventing any single user from occupying an excessive number of GPUs and (2) preventing unbalanced resource requests that skew cluster utilization. Everyday users, in turn, can get going immediately without having to figure out exactly how many CPU cores or how much memory to request.

Step-by-step Guide

Click New resource spec and define the following parameters.

  • Name — Set a name for the preset. Use a name that clearly describes the preset, such as a100-2.mem-16.cpu-6.

  • Processor type — Define the preset by processor type, either CPU or GPU.

  • CPU limit — Enter the number of CPUs. For a100-2.mem-16.cpu-6, enter 6.

  • Memory limit — Enter the amount of memory in GB. For a100-2.mem-16.cpu-6, the number would be 16.

  • GPU type — Specify which GPU you are using. You can get this information by using the nvidia-smi command on your server. In the following example, the value is a100-sxm-80gb.

nvidia-smi
Thu Jan 19 17:44:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:01:00.0 Off |                    0 |
| N/A   40C    P0    64W / 275W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
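
If you only need the model name rather than the full table, nvidia-smi can query it directly; the reported name (for example, NVIDIA A100-SXM4-80GB) corresponds to the lowercase value above, a100-sxm-80gb:

nvidia-smi --query-gpu=name --format=csv,noheader
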
  • GPU limit — Enter the number of GPUs. For a100-2.mem-16.cpu-6, enter 2. You can also enter fractional values if you are using Multi-Instance GPU (MIG) partitions; see the example below.
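
Before setting a fractional limit, you can verify that MIG is enabled and see how the GPU is partitioned; each MIG instance shows up as a separate entry:

nvidia-smi -L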

  • Priority — Using different priority values disables the FIFO scheduler and assigns workloads according to priority, with lower values scheduled first. For example, giving a gpu-1 preset the lowest priority value puts its workloads ahead of all other workloads.

  • Available workloads — Select the types of workloads that can use the preset. For example, you can guide users toward 🗂️ Experiments by preventing them from running Workspaces with 4 or 8 GPUs.
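
Once presets are defined, users select one by name when launching a workload. Below is a minimal CLI sketch; the flag names are assumptions, so verify them with vessl experiment create --help for your CLI version:

# Hypothetical invocation -- confirm flag names with --help first
vessl experiment create \
  --cluster my-cluster \
  --resource a100-2.mem-16.cpu-6 \
  --command "python train.py"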