# Backend System
Tensor Frame uses a pluggable backend system that allows tensors to run on different computational devices. This page documents the backend architecture and API.
## Backend Trait

All backends implement the `Backend` trait:
```rust
pub trait Backend: Debug + Send + Sync {
    fn backend_type(&self) -> BackendType;
    fn is_available(&self) -> bool;

    // Tensor creation
    fn zeros(&self, shape: &Shape, dtype: DType) -> Result<Storage>;
    fn ones(&self, shape: &Shape, dtype: DType) -> Result<Storage>;
    fn from_slice(&self, data: &[f32], shape: &Shape) -> Result<Storage>;

    // Arithmetic operations
    fn add(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;
    fn sub(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;
    fn mul(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;
    fn div(&self, lhs: &Storage, rhs: &Storage) -> Result<Storage>;

    // Reduction operations
    fn sum(&self, storage: &Storage, axis: Option<usize>) -> Result<Storage>;
    fn mean(&self, storage: &Storage, axis: Option<usize>) -> Result<Storage>;

    // Data access
    fn to_vec_f32(&self, storage: &Storage) -> Result<Vec<f32>>;
}
```
## Storage Types
Each backend uses a different storage mechanism:
```rust
pub enum Storage {
    Cpu(Vec<f32>),     // CPU: simple Vec
    Wgpu(WgpuStorage), // WGPU: GPU buffer
    Cuda(CudaStorage), // CUDA: device pointer
}

pub struct WgpuStorage {
    pub buffer: Arc<wgpu::Buffer>, // WGPU buffer handle
}

pub struct CudaStorage {
    pub ptr: *mut f32, // Raw CUDA device pointer
    pub len: usize,    // Buffer length
}
```
## Backend Selection

### Automatic Selection
By default, Tensor Frame automatically selects the best available backend, in this order of priority:

1. CUDA (if available and the `cuda` feature is enabled)
2. WGPU (if available and the `wgpu` feature is enabled)
3. CPU (always available)
```rust
// Uses automatic backend selection
let tensor = Tensor::zeros(vec![1000, 1000])?;
println!("Selected backend: {:?}", tensor.backend_type());
```
### Manual Selection
You can also explicitly specify backend priority:
```rust
use tensor_frame::backend::{set_backend_priority, BackendType};

// Force CPU backend
let cpu_backend = set_backend_priority(vec![BackendType::Cpu]);

// Prefer WGPU over CUDA
let gpu_backend = set_backend_priority(vec![
    BackendType::Wgpu,
    BackendType::Cuda,
    BackendType::Cpu,
]);
```
## Backend Conversion
Convert tensors between backends:
```rust
let cpu_tensor = Tensor::ones(vec![100, 100])?;

// Convert to GPU backend (if available)
let gpu_tensor = cpu_tensor.to_backend(BackendType::Wgpu)?;

// Convert back to CPU
let back_to_cpu = gpu_tensor.to_backend(BackendType::Cpu)?;
```
## Performance Characteristics

### CPU Backend
- Pros: Always available, good for small tensors, excellent for development
- Cons: Limited parallelism, slower for large operations
- Best for: Tensors < 10K elements, prototyping, fallback option
- Implementation: Uses Rayon for parallel CPU operations (see the sketch below)
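As a rough sketch of that Rayon approach (the `cpu_add` helper and direct `f32` slices are illustrative, not Tensor Frame's actual internals), an elementwise add might look like:

```rust
use rayon::prelude::*;

// Illustrative Rayon-parallel elementwise add over CPU storage.
fn cpu_add(lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
    lhs.par_iter()           // split lhs across worker threads
        .zip(rhs.par_iter()) // pair with the matching rhs elements
        .map(|(a, b)| a + b) // elementwise sum
        .collect()           // gather the results into a new Vec
}
```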
### WGPU Backend
- Pros: Cross-platform GPU support, works on Metal/Vulkan/DX12/OpenGL
- Cons: Compute shader overhead, limited by GPU memory
- Best for: Large tensor operations, cross-platform deployment
- Implementation: Compute shaders with buffer storage (see the WGSL sketch below)
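For a flavor of what those compute shaders look like, here is a sketch of an elementwise-add kernel in WGSL (the binding layout and workgroup size are assumptions, not Tensor Frame's actual shader):

```rust
// Illustrative WGSL source for an elementwise add; the real
// shaders and bind-group layout in Tensor Frame may differ.
const ADD_SHADER: &str = r#"
@group(0) @binding(0) var<storage, read> lhs: array<f32>;
@group(0) @binding(1) var<storage, read> rhs: array<f32>;
@group(0) @binding(2) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i = gid.x;
    if (i < arrayLength(&out)) {
        out[i] = lhs[i] + rhs[i];
    }
}
"#;
```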
### CUDA Backend
- Pros: Highest performance on NVIDIA GPUs, mature ecosystem
- Cons: NVIDIA-only, requires CUDA toolkit installation
- Best for: Production workloads on NVIDIA hardware
- Implementation: cuBLAS and custom CUDA kernels (see the FFI sketch below)
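Custom kernels are typically reached through an `extern "C"` boundary; the sketch below assumes a hypothetical `launch_add_kernel` symbol compiled with nvcc and linked in, not a real Tensor Frame function:

```rust
// Hypothetical binding to a custom CUDA kernel launcher; this only
// illustrates the FFI boundary, not Tensor Frame's actual code.
extern "C" {
    fn launch_add_kernel(lhs: *const f32, rhs: *const f32, out: *mut f32, len: usize);
}

fn cuda_add(lhs: &CudaStorage, rhs: &CudaStorage, out: &mut CudaStorage) {
    // Caller must guarantee all three device buffers hold `len` floats.
    debug_assert!(lhs.len == rhs.len && rhs.len == out.len);
    unsafe { launch_add_kernel(lhs.ptr, rhs.ptr, out.ptr, out.len) }
}
```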
## Backend Availability
Check backend availability at runtime:
```rust
use tensor_frame::backend::{cpu, wgpu, cuda};

// CPU backend is always available
println!("CPU available: {}", cpu::CpuBackend::new().is_available());

// Check GPU backends
#[cfg(feature = "wgpu")]
if let Ok(wgpu_backend) = wgpu::WgpuBackend::new() {
    println!("WGPU available: {}", wgpu_backend.is_available());
}

#[cfg(feature = "cuda")]
println!("CUDA available: {}", cuda::is_available());
```
## Cross-Backend Operations
Operations between tensors on different backends automatically handle conversion:
```rust
let cpu_tensor = Tensor::ones(vec![100])?;
let gpu_tensor = Tensor::zeros(vec![100])?.to_backend(BackendType::Wgpu)?;

// Automatically converts gpu_tensor to the CPU backend for the operation
let result = cpu_tensor + gpu_tensor;
```
## Custom Backends

You can implement custom backends by implementing the `Backend` trait:
```rust
#[derive(Debug)]
struct MyCustomBackend;

impl Backend for MyCustomBackend {
    fn backend_type(&self) -> BackendType {
        // Would need to extend the BackendType enum
        BackendType::Custom
    }

    fn is_available(&self) -> bool {
        true // Your availability logic
    }

    // Implement all required methods...
    fn zeros(&self, shape: &Shape, dtype: DType) -> Result<Storage> {
        todo!() // Your implementation
    }

    // ... more methods
}
```
## Memory Management

### Reference Counting
- Tensors use `Arc<dyn Backend>` for backend sharing (see the sketch below)
- Storage is reference counted within each backend
- Automatic cleanup when the last reference is dropped
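The practical consequence, sketched below under the assumption that `Tensor` implements `Clone` by bumping the reference count (consistent with the `Arc`-based design above):

```rust
// Cloning shares storage instead of copying the buffer (assumes
// Tensor's Clone is an Arc refcount bump, as described above).
let a = Tensor::ones(vec![1024, 1024])?;
let b = a.clone(); // cheap: `b` shares `a`'s storage
drop(a);           // the buffer survives; `b` still holds a reference
```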
### Cross-Backend Memory
- Converting between backends allocates new memory
- Original data remains valid until all references are dropped
- No automatic synchronization between backends
### GPU Memory Management
- WGPU backend uses WGPU's automatic memory management
- CUDA backend manually manages device memory with proper cleanup (see the `Drop` sketch below)
- Out-of-memory errors are propagated as `TensorError::BackendError`
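A minimal sketch of what that manual cleanup can look like, assuming a raw FFI binding to the CUDA runtime's `cudaFree` (both the binding and this `Drop` impl are illustrative, not Tensor Frame's actual code):

```rust
use std::ffi::c_void;

// Illustrative binding to the CUDA runtime; real projects would use
// a binding crate or build-script-generated bindings instead.
extern "C" {
    fn cudaFree(dev_ptr: *mut c_void) -> i32; // returns a cudaError_t code
}

impl Drop for CudaStorage {
    fn drop(&mut self) {
        // Free the device allocation when the last owner goes away.
        unsafe {
            cudaFree(self.ptr as *mut c_void);
        }
    }
}
```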