vessl experiment
Overview
Run vessl experiment --help to view the list of commands, and vessl experiment [COMMAND] --help to view instructions for an individual command.
Create an experiment
-c, --cluster
Cluster name (must be specified before other options)
-x, --command
Start command to execute in the experiment container
-r, --resource
Resource type on which to run the experiment (for managed cluster only)
--processor-type
Processor type: CPU or GPU (for custom cluster only)
--cpu-limit
Number of vCPUs (for custom cluster only)
--memory-limit
Memory limit in GiB (for custom cluster only)
--gpu-type
GPU type (for custom cluster only)
ex. Tesla-K80
--gpu-limit
Number of GPU cores (for custom cluster only)
--upload-local-file (multiple)
Upload a local file. Format: [local_path] or [local_path]:[remote_path].
ex. --upload-local-file my-project:/root/my-project
--upload-local-git-diff
Upload the local git commit hash and diff (works only in project repositories)
-i, --image-url
Kernel Docker image URL
ex. vessl/kernels:py36.full-cpu
-m, --message
Message describing the experiment
--termination-protection
Enable termination protection
-h, --hyperparameter (multiple)
Hyperparameters in the form of [key]=[value]
ex. -h lr=0.01 -h epochs=100
--dataset (multiple)
Dataset mounts in the form of [mount_path] [dataset_name]
ex. --dataset /input mnist
--root-volume-size
Root volume size (defaults to 20Gi)
--working-dir
Working directory path (defaults to /root/)
--output-dir
Output directory path (defaults to /output)
--local-project
Local project file URL
--worker-count
Number of workers (for distributed experiment only)
--framework-type
Framework type: pytorch or tensorflow (for distributed experiment only)
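A minimal Python sketch of assembling a create invocation from a subset of the options above. It assumes the subcommand is named create (check with vessl experiment --help); build_create_argv is a hypothetical helper, and the cluster name and start command in the usage lines are placeholders. The helper places --cluster first, since the cluster must be specified before other options, and emits one flag per value for the (multiple) options.

```python
import shlex
from typing import Mapping, Optional, Sequence


def build_create_argv(
    cluster: str,
    command: str,
    *,
    image_url: Optional[str] = None,
    message: Optional[str] = None,
    hyperparameters: Optional[Mapping[str, object]] = None,
    datasets: Optional[Mapping[str, str]] = None,
    upload_local_files: Sequence[str] = (),
    termination_protection: bool = False,
) -> list[str]:
    """Build argv for experiment creation (hypothetical helper).

    The subcommand name "create" is an assumption.
    """
    # --cluster comes right after the subcommand because it must be
    # specified before the other options.
    argv = ["vessl", "experiment", "create", "--cluster", cluster]
    argv += ["--command", command]
    if image_url is not None:
        argv += ["--image-url", image_url]
    if message is not None:
        argv += ["--message", message]
    # Repeatable options get one flag per value.
    for key, value in (hyperparameters or {}).items():
        argv += ["--hyperparameter", f"{key}={value}"]
    for mount_path, dataset_name in (datasets or {}).items():
        # --dataset takes two values: [mount_path] [dataset_name]
        argv += ["--dataset", mount_path, dataset_name]
    for spec in upload_local_files:
        # Each spec is [local_path] or [local_path]:[remote_path]
        argv += ["--upload-local-file", spec]
    if termination_protection:
        argv.append("--termination-protection")
    return argv


if __name__ == "__main__":
    argv = build_create_argv(
        "my-cluster",       # placeholder cluster name
        "python train.py",  # placeholder start command
        image_url="vessl/kernels:py36.full-cpu",
        hyperparameters={"lr": 0.01, "epochs": 100},
        datasets={"/input": "mnist"},
        upload_local_files=["my-project:/root/my-project"],
    )
    # Print a shell-safe command line; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Keeping the arguments as a list rather than one string avoids shell-quoting problems with start commands that contain spaces.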
Download experiment output files
Users can define output files for an experiment and use them to save validation results, trained checkpoints, best-performing models, and other artifacts.
NUMBER
Experiment number
-p, --path
Local download path (defaults to ./output)
--worker-number
Worker number (for distributed experiment only)
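The sketch below shows one way to script output downloads, including one download per worker for a distributed experiment. The subcommand name download_output is an assumption, both helpers are hypothetical, and putting each worker's files in a separate worker-N subdirectory is purely an illustrative choice.

```python
import os
from typing import Iterable, Optional


def build_download_argv(
    number: int,
    path: str = "./output",
    worker_number: Optional[int] = None,
) -> list[str]:
    """Build argv for downloading experiment output (subcommand name assumed)."""
    # "./output" mirrors the documented default for --path.
    argv = ["vessl", "experiment", "download_output", str(number), "--path", path]
    if worker_number is not None:
        # Only meaningful for distributed experiments.
        argv += ["--worker-number", str(worker_number)]
    return argv


def build_per_worker_downloads(
    number: int, worker_numbers: Iterable[int], base_path: str = "./output"
) -> list[list[str]]:
    """One download command per worker, each into its own subdirectory."""
    return [
        build_download_argv(number, os.path.join(base_path, f"worker-{w}"), w)
        for w in worker_numbers
    ]
```

Worker numbers are passed in explicitly, so the sketch makes no assumption about how workers are numbered.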
List all experiments
List experiment output files
Users can define output files for an experiment and use them to save validation results, trained checkpoints, best-performing models, and other artifacts.
NUMBER
Experiment number
-r, --recursive
List files recursively
--worker-number
Worker number (for distributed experiment only)
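A minimal sketch of calling the listing command from Python with subprocess. The subcommand name list_output is an assumption, list_output_files is a hypothetical helper, and since the output format is not specified here it returns the raw text. The runner parameter lets you substitute a fake for testing.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raise CalledProcessError on failure."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def list_output_files(
    number: int,
    recursive: bool = False,
    worker_number: Optional[int] = None,
    runner: Runner = run_cli,
) -> str:
    """List an experiment's output files (subcommand name assumed)."""
    argv = ["vessl", "experiment", "list_output", str(number)]
    if recursive:
        argv.append("--recursive")
    if worker_number is not None:
        # Only meaningful for distributed experiments.
        argv += ["--worker-number", str(worker_number)]
    return runner(argv)
```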
View logs of the experiment container
NUMBER
Experiment number
--tail
Number of lines to display from the end (defaults to 200)
--worker-number
Worker number (for distributed experiment only)
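For scripted log checks, the tail length and worker number map directly onto arguments. In this sketch the subcommand name logs is assumed, build_logs_argv is hypothetical, and rejecting non-positive --tail values is a choice of the sketch rather than documented CLI behavior.

```python
from typing import Optional


def build_logs_argv(
    number: int, tail: int = 200, worker_number: Optional[int] = None
) -> list[str]:
    """Build argv for viewing experiment logs (subcommand name assumed)."""
    if tail < 1:
        # The sketch refuses tail values that would request no lines.
        raise ValueError("tail must be a positive number of lines")
    # 200 mirrors the documented default for --tail.
    argv = ["vessl", "experiment", "logs", str(number), "--tail", str(tail)]
    if worker_number is not None:
        argv += ["--worker-number", str(worker_number)]
    return argv
```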
View information about an experiment
NUMBER
Experiment number
Terminate an experiment
NUMBER
Experiment number
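Reading and terminating experiments take only the experiment number, so batching them is straightforward. This sketch assumes the subcommands are named read and terminate; terminate_experiments is a hypothetical helper that defaults to a dry run, returning the commands instead of executing them.

```python
import subprocess
from typing import Iterable


def build_read_argv(number: int) -> list[str]:
    # Subcommand name "read" is an assumption.
    return ["vessl", "experiment", "read", str(number)]


def build_terminate_argv(number: int) -> list[str]:
    # Subcommand name "terminate" is an assumption.
    return ["vessl", "experiment", "terminate", str(number)]


def terminate_experiments(numbers: Iterable[int], dry_run: bool = True) -> list[list[str]]:
    """Terminate each experiment in turn; with dry_run, only return the commands."""
    commands = [build_terminate_argv(n) for n in numbers]
    if not dry_run:
        for argv in commands:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(argv, check=True)
    return commands
```

Defaulting to a dry run makes it easy to review the list of experiments before anything is actually terminated.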