Ray: A Distributed Computing Platform For Machine Learning

Ray is an open source project originating from the UC Berkeley RISELab in 2016. The creators of Ray launched a commercial company, Anyscale, in 2019. The Ray project has attracted strong backing from its inception: it received two NSF grants and sponsorships from Alibaba, Amazon Web Services, Ant Financial, ARM, CapitalOne, Ericsson, Facebook, Google, Huawei, Intel, Microsoft, Scotiabank, Splunk, and VMware. Unsurprisingly, Anyscale raised $60M across two rounds of funding: $20M in Dec 2019 and $40M in Oct 2020.

What is Ray?

Ray was created to solve the difficulties of automating end-to-end reinforcement learning (RL) in machine learning workflows. The Ray creators initially used Spark for its distributed computing capability. However, they soon found that Spark could not meet their requirements: a combination of stateless services for low-latency processing and stateful services for training. In response, they created Ray, which provides Tasks for stateless computation and Actors for stateful computation. Users can run Tasks and Actors on both local and remote machines, as the sketch below illustrates.
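
To make the distinction concrete, here is a minimal sketch of the two abstractions using Ray's Python API; the function and class names (square, Counter) are our own illustrations, not part of Ray:

    import ray

    ray.init()  # start a local Ray runtime

    @ray.remote
    def square(x):
        # A stateless Task: an ordinary function marked @ray.remote.
        return x * x

    @ray.remote
    class Counter:
        # A stateful Actor: a class marked @ray.remote; each instance
        # keeps its own state across method calls.
        def __init__(self):
            self.count = 0

        def increment(self):
            self.count += 1
            return self.count

    # Task invocations return futures immediately; ray.get() blocks for results.
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # [0, 1, 4, 9]

    # The Actor keeps its counter between calls.
    counter = Counter.remote()
    print(ray.get(counter.increment.remote()))  # 1
    print(ray.get(counter.increment.remote()))  # 2

The same code runs unchanged whether Ray is started on a laptop or connected to a multi-node cluster; the scheduler decides where each Task and Actor is placed.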

Why do you care? 

Although Ray was created to address reinforcement learning workflow problems, it can be used for distributed workloads of almost any kind. Few existing platforms can fulfill both the stateless and stateful requirements at high throughput. In addition, Ray is integrated with many machine learning and data science projects, including ClassyVision, Dask, Flambe, Horovod, Hugging Face Transformers, Intel Analytics Zoo, MARS, Modin, PyCaret, Scikit-learn, Seldon Alibi, and spaCy.
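
As one example of these integrations, Ray can act as a joblib backend so that Scikit-learn's parallel estimators transparently distribute work over a Ray cluster. A minimal sketch, assuming the ray and scikit-learn packages are installed:

    import joblib
    from ray.util.joblib import register_ray
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier

    register_ray()  # register "ray" as a joblib backend

    X, y = load_digits(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)

    # Everything joblib parallelizes inside this block runs on Ray.
    with joblib.parallel_backend("ray"):
        clf.fit(X, y)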

Ray is a relatively young project and is changing quickly. However, it already has well-known production-level users such as Ant Financial; in fact, 30% of the Ray source code was contributed by Ant Financial. Given the $60M in funding for Anyscale, we expect to hear a lot about Ray in the coming year.