custos logo



A minimal OpenCL, WGPU, CUDA and host CPU array manipulation engine / framework written in Rust. This crate provides the tools for executing custom array and automatic differentiation operations on the CPU, as well as on CUDA, WGPU and OpenCL devices.
This guide demonstrates how operations can be implemented for the compute devices: implement_operations.md.
To see custos used at a larger scale, look at custos-math or sliced (for automatic differentiation examples).

Installation

Add "custos" as a dependency:

```toml
[dependencies]
custos = "0.7.0"

# to disable the default features (cpu, cuda, opencl, static-api, blas, macro) and use your own set of features:
# custos = { version = "0.7.0", default-features = false, features = ["opencl", "blas"] }
```

Available features:

Feature | Description
--- | ---
cpu | Adds the CPU device
stack | Adds the Stack device, enables stack allocated Buffers
opencl | Adds OpenCL features. (name of the device: OpenCL)
cuda | Adds CUDA features. (name of the device: CUDA)
wgpu | Adds WGPU features. (name of the device: WGPU)
no-std | For no std environments, activates stack feature.
static-api | Enables the creation of Buffers without providing a device.
blas | Adds gemm functions from the system's (selected) BLAS library.
opt-cache | Makes the 'cache graph' optimizable, lowering the memory footprint.
macro | Reexport of [custos-macro]
realloc | Disables allocation caching for all devices.
autograd | Adds automatic differentiation features.

[Examples]

custos itself implements only four Buffer operations: write, read, copy_slice and clear. In addition, there are [unary] (device only) operations.
[custos-math], on the other hand, implements many more operations, including Matrix operations for a custom Matrix struct.
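To make the four built-in operations concrete, here is a small dependency-free model of what they mean for a buffer in host memory. This is only a sketch: `HostBuf` and its method signatures are hypothetical and are not the custos API, which works through devices and its own `Buffer` type.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a host-side buffer; NOT the custos `Buffer` type.
#[derive(Debug, Clone, PartialEq)]
pub struct HostBuf<T> {
    data: Vec<T>,
}

impl<T: Copy + Default> HostBuf<T> {
    /// Allocates a buffer of `len` default-initialized elements.
    pub fn new(len: usize) -> Self {
        HostBuf { data: vec![T::default(); len] }
    }

    /// write: overwrite the buffer contents with `src` (lengths must match,
    /// otherwise `copy_from_slice` panics).
    pub fn write(&mut self, src: &[T]) {
        self.data.copy_from_slice(src);
    }

    /// read: copy the contents back out into an owned `Vec`.
    pub fn read(&self) -> Vec<T> {
        self.data.clone()
    }

    /// copy_slice: copy a sub-range into a new, independent buffer.
    pub fn copy_slice(&self, range: Range<usize>) -> HostBuf<T> {
        HostBuf { data: self.data[range].to_vec() }
    }

    /// clear: reset every element to the default value (zero for numbers).
    pub fn clear(&mut self) {
        self.data.fill(T::default());
    }
}
```

In this model, clearing the source buffer leaves a buffer produced by `copy_slice` untouched, because the slice is a copy rather than a view.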

Implement an operation for CPU:

If you want to implement your own operations for all compute devices, see implement_operations.md.

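```rust
use std::ops::Mul;
use custos::prelude::*;

// The operation is declared as a trait that a device implements.
// T: element type, S: shape, D: device that owns the input buffers.
pub trait MulBuf<T, S: Shape = (), D: Device = Self>: Sized + Device {
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, Self, S>;
}

impl<T, S, D> MulBuf<T, S, D> for CPU
where
    T: Mul<Output = T> + Copy,
    S: Shape,
    D: MainMemory,
{
    fn mul(&self, lhs: &Buffer<T, D, S>, rhs: &Buffer<T, D, S>) -> Buffer<T, CPU, S> {
        // Retrieve an output buffer with the same length as the inputs.
        let mut out = self.retrieve(lhs.len(), (lhs, rhs));

        // Element-wise multiplication.
        for ((lhs, rhs), out) in lhs.iter().zip(&*rhs).zip(&mut out) {
            *out = *lhs * *rhs;
        }

        out
    }
}
```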
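The pattern used above (an operation is a trait, and every device that supports it provides an implementation) can also be tried without custos. The following is a minimal sketch under that assumption: `ToyDevice`, `ToyCpu`, `MulSlices` and `square` are hypothetical names introduced for illustration and do not exist in custos, and plain slices stand in for buffers.

```rust
use std::ops::Mul;

/// Hypothetical marker trait for a compute device (NOT the custos `Device` trait).
pub trait ToyDevice {}

/// Hypothetical host device that works on plain slices.
pub struct ToyCpu;
impl ToyDevice for ToyCpu {}

/// The operation is a trait; each device that supports it provides an impl.
pub trait MulSlices<T>: ToyDevice {
    fn mul(&self, lhs: &[T], rhs: &[T]) -> Vec<T>;
}

impl<T> MulSlices<T> for ToyCpu
where
    T: Mul<Output = T> + Copy,
{
    fn mul(&self, lhs: &[T], rhs: &[T]) -> Vec<T> {
        assert_eq!(lhs.len(), rhs.len(), "operands must have equal length");
        // Element-wise product, collected into a freshly allocated output.
        lhs.iter().zip(rhs).map(|(&l, &r)| l * r).collect()
    }
}

/// Generic code only asks for "a device that can multiply",
/// so it works with any device that implements `MulSlices`.
pub fn square<T, D: MulSlices<T>>(device: &D, xs: &[T]) -> Vec<T> {
    device.mul(xs, xs)
}
```

With this layout, supporting another device means adding one more `impl MulSlices<T> for ...` block; callers such as `square` stay unchanged.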

Many more usage examples can be found in the [tests] and [examples] folders, as well as in the [unary] operation file, custos-math and sliced.