mosec version 0.2.0

MOSEC

Model Serving made Efficient in the Cloud.

Introduction

Mosec is a high-performance and flexible model serving framework for building ML model-enabled backends and microservices. It bridges the gap between the machine learning models you have just trained and an efficient online service API.

- Highly performant: web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O
- Ease of use: user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing
- Dynamic batching: aggregate requests from different users for batched inference and distribute results back
- Pipelined stages: spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads
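To make the dynamic batching idea concrete, here is a minimal standard-library sketch of the general technique. It is not Mosec's implementation (that part lives in the Rust layer), and the names `collect_batch`, `max_batch_size`, and `max_wait_s` are hypothetical, chosen only for illustration. The collector waits for a first request, then keeps gathering until the batch is full or a short wait budget runs out.

```python
import queue
import time
from typing import Any, List


def collect_batch(requests: queue.Queue, max_batch_size: int, max_wait_s: float) -> List[Any]:
    """Gather up to max_batch_size requests, waiting at most max_wait_s after the first."""
    batch = [requests.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait budget exhausted, ship a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time
    return batch


# Five requests from (hypothetically) different users arrive close together.
pending = queue.Queue()
for x in range(5):
    pending.put({"x": x})

first = collect_batch(pending, max_batch_size=3, max_wait_s=0.01)   # full batch of 3
second = collect_batch(pending, max_batch_size=3, max_wait_s=0.01)  # partial batch of 2
```

The wait budget is the usual trade-off knob in this pattern: a longer wait fills batches more often at the cost of added latency for the first request in each batch.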

Installation

Mosec requires Python 3.6 or above. Install the latest PyPI package with:

pip install -U mosec

Usage

Write the server

Import the libraries and set up a basic logger to better observe what happens.

```python
import logging

from mosec import Server, Worker
from mosec.errors import ValidationError

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    "%(asctime)s - %(process)d - %(levelname)s - %(filename)s:%(lineno)s - %(message)s"
)
sh = logging.StreamHandler()
sh.setFormatter(formatter)
logger.addHandler(sh)
```

Then, we build an API to calculate the exponential with base e for a given number. To achieve that, we simply inherit the Worker class and override the forward method. Note that the input req is by default a JSON-decoded object, e.g., a dictionary here (because we design it to receive data like {"x": 1}). We also enclose the input parsing part with a try...except... block to reject invalid input (e.g., no key named "x" or field "x" cannot be converted to float).

```python
import math


class CalculateExp(Worker):
    def forward(self, req: dict) -> dict:
        try:
            x = float(req["x"])
        except KeyError:
            raise ValidationError("cannot find key 'x'")
        except ValueError:
            raise ValidationError("cannot convert 'x' value to float")
        y = math.exp(x)  # f(x) = e ^ x
        logger.debug(f"e ^ {x} = {y}")
        return {"y": y}
```
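One detail worth knowing about the parsing step: in Python, `float(None)` and `float([1])` raise `TypeError` rather than `ValueError`, so a body such as `{"x": null}` would not be caught by the two `except` clauses above. The sketch below shows one possible way to tighten the check. The helper `parse_x` is hypothetical, and `ValidationError` is redefined locally as a stand-in for `mosec.errors.ValidationError` so that the snippet runs without Mosec installed.

```python
class ValidationError(Exception):
    """Local stand-in for mosec.errors.ValidationError (assumption, for illustration)."""


def parse_x(req: object) -> float:
    """Extract req["x"] as a float, rejecting malformed input with ValidationError."""
    if not isinstance(req, dict):
        raise ValidationError("request body must be a JSON object")
    try:
        return float(req["x"])
    except KeyError:
        raise ValidationError("cannot find key 'x'") from None
    except (TypeError, ValueError):
        # TypeError covers null, lists, and nested objects; ValueError covers bad strings
        raise ValidationError("cannot convert 'x' value to float") from None
```

Inside a worker, `forward` could then start with `x = parse_x(req)` and keep the rest of its body unchanged.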

Finally, we append the worker to the server to construct a single-stage workflow, specifying how many processes we want it to run in parallel. Then we run the server.

```python
if __name__ == "__main__":
    server = Server()
    server.append_worker(
        CalculateExp, num=2
    )  # we spawn two processes for parallel computing
    server.run()
```

Run the server

After merging the snippets above into a file named server.py, we can first have a look at the supported arguments:

python server.py --help

Then let's start the server...

python server.py

and test it:

curl -X POST http://127.0.0.1:8000/inference -d '{"x": 2}'
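The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the previous step is listening on `127.0.0.1:8000`, and the helper names `build_request` and `query` are hypothetical. The response is returned as raw bytes because the response encoding is not spelled out here.

```python
import json
import urllib.request

URL = "http://127.0.0.1:8000/inference"  # same endpoint as the curl command


def build_request(x: float, url: str = URL) -> urllib.request.Request:
    """Build a POST request whose body is the JSON object {"x": x}."""
    body = json.dumps({"x": x}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def query(x: float, url: str = URL) -> bytes:
    """Send the request and return the raw response body (requires a running server)."""
    with urllib.request.urlopen(build_request(x, url), timeout=5) as resp:
        return resp.read()
```

With the server running, `print(query(2))` shows the response body. Invalid input that the worker rejects should surface as an `urllib.error.HTTPError` raised by `urlopen`, assuming the server answers such requests with an HTTP error status.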

That's it! You have just hosted your exponential-computing model as a server! 😉

Example

More ready-to-use examples can be found in the Example section. It includes:

- Multi-stage workflow
- Batch processing worker
- PyTorch deep learning models
  - sentiment analysis
  - image recognition

Contributing

We welcome any kind of contribution. Please give us feedback by raising issues, or contribute code directly by opening a pull request!