Ray’s Ecosystem

As part of our blog series on Ray, this post analyzes the ecosystem that Ray has built around its platform. If you missed our first blog post on Ray, you may want to read it first.


Ray is a relatively young open source project, created in 2016 as part of a research project at UC Berkeley. Nevertheless, Ray has built an impressive ecosystem around its platform. The graph below shows the links between Ray and key projects in its ecosystem.

As illustrated above, 10 open source projects have already built integrations with Ray. These include large projects such as spaCy (850k lines committed last year), Hugging Face Transformers (3.5M lines), and Horovod (850k lines). Because Ray has focused on ML (Machine Learning) pipelines, its ecosystem has many links to the most popular core machine learning frameworks: PyTorch, Scikit-learn, and TensorFlow. In addition, Intel has built two separate integrations between Ray and the most widely used big data framework, Spark.
Table 1: Ray Ecosystem by Machine Learning Area

| Machine Learning Area             | Projects                              |
|-----------------------------------|---------------------------------------|
| Feature Engineering               | Dask, Mars, Modin, RayDP, Spark       |
| Distributed Training              | Horovod                               |
| Computer Vision                   | ClassyVision                          |
| Natural Language Processing (NLP) | Hugging Face Transformers, spaCy      |
| Explainability                    | Seldon Alibi                          |
| Core Machine Learning             | PyTorch, Scikit-learn, TensorFlow     |
| Data Representation               | NumPy, Pandas                         |
| End-to-End ML                     | Analytics Zoo, Flambe, PyCaret        |

Table 1 above shows the integrations organized by application area. Not only does Ray integrate with the core ML frameworks, but it also has integrations at each step of the ML pipeline. Ray has become the center of an interoperable ecosystem for distributed machine learning.