moc-ray-demo
A demonstration of using the Ray distributed computing platform with Jupyter on Open Data Hub.
Prerequisites
Running Ray with Jupyter on Open Data Hub assumes you have a working Ray+ODH integration, as described in this ray-odh-demo.
The Open Data Hub instance running on the Massachusetts Open Cloud currently supports Ray.
To run the test notebook:
This “smoke-test” notebook shows how to connect to a Ray cluster from Jupyter in ODH and run a basic computation.
- Log into the ODH JupyterHub launcher. You should see a Ray-enabled notebook image option such as ray-ml-notebook: choose this.
- In the environment variable section, add JUPYTER_PRELOAD_REPOS and set it to https://github.com/erikerlandson/moc-ray-demo.git
- Launch your JupyterHub environment. As part of the startup process, the ODH JupyterHub launcher should also start up a Ray cluster.
- In Jupyter, navigate to directory moc-ray-demo.git/source and open the “smoke-test” notebook.
- Run the notebook cells to confirm that it connects to your Ray cluster and operates correctly.
- If the connection to the Ray cluster results in a timeout, wait a minute and retry.
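The "wait a minute and retry" advice in the last step can be automated. The following is a minimal sketch, not the notebook's own code: the helper name connect_with_retry and its parameters are hypothetical, and it assumes a connection failure surfaces as a ConnectionError or TimeoutError.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=60.0, sleep=time.sleep):
    """Call `connect()` until it succeeds, pausing between failed attempts.

    `connect` is any zero-argument callable that raises ConnectionError or
    TimeoutError while the cluster is not reachable yet. That is an assumption
    about how the failure surfaces; adjust the exception types as needed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except (ConnectionError, TimeoutError) as err:
            last_error = err
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                sleep(delay_seconds)  # give the Ray cluster time to come up
    raise RuntimeError(f"no connection after {attempts} attempts") from last_error
```

In the notebook you would wrap whichever connection call the smoke-test cell already uses, for example connect_with_retry(lambda: ray.init(address="ray://<head-host>:10001")), where <head-host> is a placeholder for your Ray head node and not a value taken from this demo.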
Caveat: The demo function in this test notebook reports the names of the Ray cluster nodes it ran on. The Ray operator adaptively scales the clusters it manages based on workload, so a given run may not use all available nodes, depending on how many were spun up at the time and how long the computations took. Running the test repeatedly often causes more nodes to appear in the result.
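To make the node-reporting behavior concrete, here is a rough sketch of how such a demo function could be written. It is an illustration under assumptions, not the code from the smoke-test notebook: the names run_demo, where_am_i and summarize_nodes are hypothetical, and run_demo assumes ray is installed and a connection to the cluster is already established.

```python
import socket
import time
from collections import Counter


def summarize_nodes(hostnames):
    """Count how many tasks each node handled, sorted by node name."""
    return dict(sorted(Counter(hostnames).items()))


def run_demo(num_tasks=100, task_seconds=0.1):
    """Run small Ray tasks and return the hostname each one executed on.

    Assumes `ray` is installed and a cluster connection already exists.
    """
    import ray  # imported here so summarize_nodes works without Ray installed

    @ray.remote
    def where_am_i():
        time.sleep(task_seconds)  # hold the worker briefly so tasks spread out
        return socket.gethostname()

    return ray.get([where_am_i.remote() for _ in range(num_tasks)])
```

Calling summarize_nodes(run_demo()) several times in a row should show the effect described above: as the operator scales the cluster up, later runs tend to list more distinct node names.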
Doing ML with Ray and Jupyter
I will be adding a demonstration of basic ML workloads, using Ray and Jupyter notebooks, to this section soon.
Caveats and Limitations
- Ray support for Jupyter and ODH is currently experimental, and is based on pre-release development versions of Ray 2.0.
- Currently a hard maximum of 5 Ray worker nodes is configured.
- Ray nodes are configured for 1 CPU and 1 GiB of RAM.
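As a back-of-the-envelope illustration of what these limits imply, the sketch below computes the total worker capacity and a lower bound on how many rounds a batch of tasks needs. It assumes each task reserves 1 CPU (Ray's default for remote tasks), counts worker nodes only, and ignores autoscaling delays; the function names are hypothetical.

```python
import math

MAX_WORKERS = 5    # hard maximum of worker nodes, from the list above
CPUS_PER_NODE = 1  # CPUs per Ray node, from the list above
GIB_PER_NODE = 1   # GiB of RAM per Ray node, from the list above


def worker_capacity(workers=MAX_WORKERS):
    """Return (total CPUs, total GiB of RAM) across `workers` worker nodes."""
    return workers * CPUS_PER_NODE, workers * GIB_PER_NODE


def min_rounds(num_tasks, workers=MAX_WORKERS):
    """Lower bound on sequential rounds for `num_tasks` one-CPU tasks."""
    slots = workers * CPUS_PER_NODE  # one task per CPU at a time
    return math.ceil(num_tasks / slots)
```

For example, with all 5 workers running, 100 one-CPU tasks need at least 20 rounds, and any single task must fit within 1 GiB of memory on its node.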