Dask-Yarn deploys Dask on YARN clusters, such as are found in traditional Hadoop installations. Dask-Yarn provides an easy interface to quickly start, scale, and stop Dask clusters natively from Python.
from dask_yarn import YarnCluster from dask.distributed import Client # Create a cluster where each worker has two cores and eight GiB of memory cluster = YarnCluster(environment='environment.tar.gz', worker_vcores=2, worker_memory="8GiB") # Scale out to ten such workers cluster.scale(10) # Connect to the cluster client = Client(cluster)
Dask-Yarn uses Skein, a Pythonic library to create and deploy YARN applications.
Dask-Yarn is designed to only require installation on an edge node. To install, use one of the following methods:
Install with Conda:
conda install -c conda-forge dask-yarn
Install with Pip:
pip install dask-yarn
Install from Source:
Dask-Yarn is available on github and can always be installed from source.
pip install git+https://github.com/dask/dask-yarn.git