Ray: Beyond Serverless Computing

Series Outline

Note: this is part of an ongoing series of posts about Ray. The full series is:

  1. Ray: A Distributed Computing Platform For Machine Learning
  2. Ray’s Ecosystem
  3. Ray: Core Architecture
  4. This post, where I show how Ray improves upon Serverless Computing


Serverless Computing

Most cloud providers have a Serverless Computing offering, also known as Function as a Service. These include AWS Lambda, Google Cloud Functions, and Azure Functions. Serverless computing has dramatically simplified cloud applications by allowing developers to deploy stateless code*, packaged as a function or an executable, directly to the cloud. Application developers no longer have to provision, manage, and scale virtual machine instances.


Limitations of Serverless Computing

Serverless computing has not caught on in the machine learning space, for several reasons:

  1. Many machine learning tasks run for a relatively long time (e.g. hours), while serverless computing is optimized for many short-lived requests. For example, the maximum execution time for AWS Lambda is 15 minutes.
  2. Although many machine learning tasks can be implemented in a stateless manner, they may still involve expensive state loading operations (e.g. loading data sets or models) that are best amortized over multiple requests.
  3. Specialized hardware resources such as GPUs may need to be allocated to each function execution.
  4. Finally, some machine learning tasks, and many data preparation tasks, involve more complex and stateful communication patterns not representable in the Serverless model.


Ray and Its Programming Model

Ray is designed to overcome these limitations while preserving the high level of abstraction over machine resources seen in serverless computing. For the application programmer, Ray provides two key abstractions: Tasks, which are stateless functions much like serverless functions, and Actors, which are stateful objects. For more details, see my previous blog post on Ray’s architecture.
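
To make this concrete, here is a minimal sketch of both abstractions. The function and class below are purely illustrative; only the @ray.remote decorator, the .remote() calls, and ray.get() are Ray API.

    import ray

    ray.init()

    # A Task: a stateless function that Ray executes somewhere in the cluster.
    @ray.remote
    def double(batch):
        return [x * 2 for x in batch]

    # An Actor: a stateful worker; its fields persist across method calls.
    @ray.remote
    class Counter:
        def __init__(self):
            self.count = 0

        def increment(self):
            self.count += 1
            return self.count

    print(ray.get(double.remote([1, 2, 3])))    # [2, 4, 6]
    counter = Counter.remote()
    print(ray.get(counter.increment.remote()))  # 1
    print(ray.get(counter.increment.remote()))  # 2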

Comparing System Architectures

The picture above compares a typical system architecture for Serverless Computing and Ray. Serverless functions typically reside between the public-facing ingress points and backend stateful stores and/or applications. These functions route and coordinate calls between the stateful backend services. In a Ray-based architecture, Tasks can serve an equivalent role. In addition, Actors can be stateful, allowing more complex interactions in the middle tier or application-controlled caching of state. Ray Serve assumes the role taken by an API Gateway in a Serverless architecture — it provides HTTP front-ends to Ray Tasks and Actors. Finally, Ray Tasks and Actors can be associated with hardware resources, such as GPUs, a capability unavailable in Serverless Computing platforms. 
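
As a rough sketch of that front-end role, a Ray Serve deployment wraps an Actor-like class and exposes it over HTTP. The exact deployment API has changed across Ray versions, so treat this as indicative rather than definitive.

    from ray import serve

    # A deployment is backed by one or more replicas (Ray Actors) and
    # receives HTTP requests through Serve's proxy.
    @serve.deployment
    class Hello:
        def __call__(self, request):
            return "Hello from Ray Serve"

    serve.run(Hello.bind())
    # By default the deployment is reachable over HTTP on the head node,
    # typically at http://127.0.0.1:8000/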

Addressing the Limitations of Serverless Computing

Let us look at how Ray addresses the four issues we identified with supporting machine learning applications on serverless computing:

  1. Since a Ray cluster runs on your own hardware or is provisioned at the node level, there are no limits on the execution time of individual Tasks or Actors. Ray also provides Placement Groups, a mechanism for reserving groups of resources and scheduling related work onto them, which is useful when the same cluster runs both long-lived and short-lived jobs (see the first sketch after this list).
  2. Since Ray Actors are stateful, expensive state loading can be performed when an Actor is initialized, and individual requests are then serviced by Actor method calls. Expensive initialization therefore needs to be performed only once per Actor (see the second sketch after this list). In some situations, it may be possible to store the associated state in the Object Store, allowing it to be reused across many Actors or Tasks.
  3. Ray can discover GPUs and other hardware resources available on a node, and each Task or Actor can declare the hardware resources it needs. When scheduling a Task or Actor, Ray picks a node that has the required resources; the Raylet on each node allocates them and ensures that each resource (e.g. a GPU) is used by only one Task or Actor at a time. The second sketch below also shows an Actor declaring a GPU requirement.
  4. Ray’s Actors can support complex and stateful interaction patterns that are not expressible in serverless computing. For example, it is straightforward to implement common distributed data processing algorithms such as Map-Reduce on top of Ray (see the last sketch after this list).
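
Regarding the first point, here is a sketch of reserving resources with a Placement Group. The bundle sizes and strategy are arbitrary choices for illustration, and the scheduling-strategy API varies somewhat between Ray versions.

    import ray
    from ray.util.placement_group import placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    ray.init()

    # Reserve two bundles of 2 CPUs each, spread across nodes where possible.
    pg = placement_group([{"CPU": 2}, {"CPU": 2}], strategy="SPREAD")
    ray.get(pg.ready())  # block until the reservation succeeds

    @ray.remote(num_cpus=2)
    def long_running_step():
        # A step that may run for hours; Ray imposes no execution time limit.
        return "done"

    # Run the task inside the reserved group, isolating it from other jobs.
    ref = long_running_step.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg)
    ).remote()
    print(ray.get(ref))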
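
For the second and third points, here is a sketch of an Actor that pays its loading cost once and declares a GPU requirement. The "model" below is a stand-in; a real application would load weights or a data set.

    import time
    import ray

    ray.init()

    # num_gpus=1 asks Ray to place this Actor on a node with a free GPU;
    # drop the argument to try the sketch on a CPU-only machine.
    @ray.remote(num_gpus=1)
    class ModelServer:
        def __init__(self):
            # Stand-in for expensive state loading (model weights, data sets, ...).
            # This runs once, when the Actor is created, not on every request.
            time.sleep(5)
            self.model = lambda batch: [x * 2 for x in batch]

        def predict(self, batch):
            # Each request reuses the state loaded in __init__.
            return self.model(batch)

    server = ModelServer.remote()
    print(ray.get(server.predict.remote([1, 2, 3])))  # [2, 4, 6]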
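
And for the fourth point, a toy word-count sketch of the Map-Reduce pattern built from Ray Tasks. The function names and data are illustrative; a production version would partition real data and use a tree of reducers, but the structure is the same.

    from collections import Counter

    import ray

    ray.init()

    documents = [
        "ray tasks and actors go beyond serverless",
        "serverless computing limits execution time",
        "ray makes distributed computing simple",
    ]

    # Map phase: count words in each document in parallel.
    @ray.remote
    def map_count(doc):
        return Counter(doc.split())

    # Reduce phase: merge the partial counts.
    @ray.remote
    def reduce_counts(*partials):
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total

    partial_refs = [map_count.remote(doc) for doc in documents]
    word_counts = ray.get(reduce_counts.remote(*partial_refs))
    print(word_counts.most_common(3))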

In conclusion, Ray goes beyond serverless computing to support scenarios important to machine learning applications, while still providing a very high-level programming API.

In coming blog posts, I will look at some example Ray applications in more detail. Stay tuned for more!

* By stateless, I mean that no information is retained between function calls. Stateless code is much easier to deploy and scale than stateful code. Of course, the underlying platform may cache data behind the scenes.