# Backends Overview
Tensor Frame's backend system provides a pluggable architecture for running tensor operations on different computational devices. This allows the same high-level tensor API to transparently utilize CPU cores, integrated GPUs, discrete GPUs, and specialized accelerators.
## Available Backends

| Backend | Feature Flag | Availability | Best Use Cases |
|---------|--------------|--------------|----------------|
| CPU | `cpu` (default) | Always | Small tensors, development, fallback |
| WGPU | `wgpu` | Cross-platform GPUs | Large tensors, cross-platform deployment |
| CUDA | `cuda` | NVIDIA GPUs | High-performance production workloads |
## Backend Selection Strategy

### Automatic Selection (Recommended)
By default, Tensor Frame automatically selects the best available backend using this priority order:
1. CUDA - Highest performance on NVIDIA hardware
2. WGPU - Cross-platform GPU acceleration
3. CPU - Universal fallback
```rust
use tensor_frame::Tensor;

// Automatically uses the best available backend
let tensor = Tensor::zeros(vec![1000, 1000])?;
println!("Using backend: {:?}", tensor.backend_type());
```
### Manual Backend Control
For specific requirements, you can control backend selection:
```rust
use tensor_frame::backend::{set_backend_priority, BackendType};

// Force CPU-only execution
let backend = set_backend_priority(vec![BackendType::Cpu]);

// Prefer WGPU over CUDA
let backend = set_backend_priority(vec![
    BackendType::Wgpu,
    BackendType::Cuda,
    BackendType::Cpu,
]);
```
### Per-Tensor Backend Conversion
Convert individual tensors between backends:
```rust
let cpu_tensor = Tensor::ones(vec![100, 100])?;

// Move to GPU
let gpu_tensor = cpu_tensor.to_backend(BackendType::Wgpu)?;

// Move back to CPU
let back_to_cpu = gpu_tensor.to_backend(BackendType::Cpu)?;
```
## Performance Characteristics

### CPU Backend
- Latency: Lowest for small operations (< 1ms)
- Throughput: Limited by CPU cores and memory bandwidth
- Memory: System RAM (typically abundant)
- Parallelism: Thread-level via Rayon
- Overhead: Minimal function call overhead
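A quick way to check the low-latency point above is a micro-measurement like the sketch below, using `std::time::Instant` together with the constructors and operators shown elsewhere on this page; small element-wise operations on the CPU backend typically finish in well under a millisecond.

```rust
use std::time::Instant;
use tensor_frame::Tensor;

let a = Tensor::ones(vec![64, 64])?;
let b = Tensor::ones(vec![64, 64])?;

let start = Instant::now();
let _sum = a + b;                                   // small element-wise op on the CPU backend
println!("small add took {:?}", start.elapsed());
```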
### WGPU Backend
- Latency: Higher initialization cost (~1-10ms)
- Throughput: High for large, parallel operations
- Memory: GPU memory (limited but fast)
- Parallelism: Massive thread-level via compute shaders
- Overhead: GPU command submission and synchronization
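Because of the initialization and command-submission overheads listed above, it usually pays to convert operands to the GPU once and chain several operations there before converting back. The sketch below illustrates the pattern using only the `to_backend` calls shown earlier.

```rust
use tensor_frame::{Tensor, backend::BackendType};

// Pay the upload / initialization cost once per operand...
let a = Tensor::ones(vec![2048, 2048])?.to_backend(BackendType::Wgpu)?;
let b = Tensor::ones(vec![2048, 2048])?.to_backend(BackendType::Wgpu)?;
let c = Tensor::zeros(vec![2048, 2048])?.to_backend(BackendType::Wgpu)?;

// ...chain the element-wise work on the GPU without round-tripping...
let result = (a * b) + c;

// ...and convert back only when the final value is needed on the CPU.
let result_cpu = result.to_backend(BackendType::Cpu)?;
```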
### CUDA Backend
- Latency: Moderate initialization cost (~0.1-1ms)
- Throughput: Highest for supported operations
- Memory: GPU memory with CUDA optimizations
- Parallelism: Optimal GPU utilization via cuBLAS/cuDNN
- Overhead: Minimal with mature driver stack
## When to Use Each Backend

### CPU Backend
```rust
// Good for:
let small_tensor = Tensor::ones(vec![10, 10])?;   // Small tensors
let dev_tensor = Tensor::zeros(vec![100])?;       // Development/testing
let scalar_ops = small_tensor.sum(None)?;         // Scalar results

// Avoid for:
// - Large matrix multiplications (> 1000x1000)
// - Batch operations on many tensors
// - Compute-intensive element-wise operations
```
### WGPU Backend
```rust
// Good for:
let large_tensor = Tensor::zeros(vec![2048, 2048])?;   // Large tensors
let batch_ops = tensors.iter().map(|t| t * 2.0);       // Batch operations
let element_wise = (a * b) + c;                        // Complex element-wise

// Consider for:
// - Cross-platform deployment
// - When CUDA is not available
// - Mixed CPU/GPU workloads
```
### CUDA Backend
```rust
// Excellent for:
let huge_tensor = Tensor::zeros(vec![4096, 4096])?;   // Very large tensors
let matrix_mul = a.matmul(&b)?;                        // Matrix operations
let ml_workload = model.forward(input)?;              // ML training/inference

// Best when:
// - NVIDIA GPU available
// - Performance is critical
// - Using alongside other CUDA libraries
```
## Cross-Backend Operations
Operations between tensors on different backends automatically handle conversion:
```rust
let cpu_a = Tensor::ones(vec![1000])?;
let gpu_b = Tensor::zeros(vec![1000])?.to_backend(BackendType::Wgpu)?;

// Automatically converts to a common backend
let result = cpu_a + gpu_b;   // Runs on the CPU backend
```
Conversion Rules:
- If backends match, operation runs on that backend
- If backends differ, operands are converted to the more compatible backend before the operation runs
- Compatibility order: CPU > WGPU > CUDA (CPU is the most compatible)
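For completeness, the matching-backend case from the first rule looks like this (a sketch reusing the `to_backend` and `backend_type` calls from earlier sections):

```rust
use tensor_frame::{Tensor, backend::BackendType};

// When both operands already live on the same backend, no conversion happens.
let a = Tensor::ones(vec![1000])?.to_backend(BackendType::Wgpu)?;
let b = Tensor::ones(vec![1000])?.to_backend(BackendType::Wgpu)?;

let c = a + b;
println!("result backend: {:?}", c.backend_type());   // expected: Wgpu
```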
## Memory Management

### Reference Counting
All backends use reference counting for efficient memory management:
```rust
let tensor1 = Tensor::ones(vec![1000, 1000])?;
let tensor2 = tensor1.clone();   // O(1) - just increments the reference count

// Memory is freed automatically when the last reference is dropped
```
### Cross-Backend Memory
Converting between backends allocates new memory:
```rust
let cpu_tensor = Tensor::ones(vec![1000])?;                   // 4 KB of CPU memory
let gpu_tensor = cpu_tensor.to_backend(BackendType::Wgpu)?;   // +4 KB of GPU memory

// Both tensors exist independently until dropped
```
### Memory Usage Guidelines
- Development: Use CPU backend to avoid GPU memory pressure
- Production: Convert to GPU early, minimize cross-backend copies
- Mixed workloads: Keep frequently-accessed tensors on CPU
- Large datasets: Stream data through GPU backends, as sketched below
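A rough sketch of that streaming pattern, assuming `chunks` is a `Vec<Tensor>` prepared elsewhere (the variable is illustrative, not a library API); only one chunk's worth of data occupies GPU memory at any time:

```rust
use tensor_frame::{Tensor, backend::BackendType};

let mut results = Vec::new();
for chunk in chunks {
    let gpu_chunk = chunk.to_backend(BackendType::Wgpu)?;     // upload one chunk
    let gpu_result = gpu_chunk.clone() * gpu_chunk;           // heavy, parallel work on the GPU
    results.push(gpu_result.to_backend(BackendType::Cpu)?);   // keep only the result on the CPU
    // the GPU copies (and the source chunk) are dropped at the end of each iteration
}
```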
## Error Handling
Backend operations can fail for various reasons:
```rust
match Tensor::zeros(vec![100000, 100000]) {
    Ok(tensor) => println!("Created tensor on {:?}", tensor.backend_type()),
    Err(TensorError::BackendError(msg)) => {
        eprintln!("Backend error: {}", msg);
        // Fall back to a smaller size or a different backend
    }
    Err(e) => eprintln!("Other error: {}", e),
}
```
Common Error Scenarios:
- GPU Out of Memory: Try smaller tensors or CPU backend
- Backend Unavailable: Fall back to the CPU backend (see the sketch below)
- Feature Not Implemented: Some operations only available on certain backends
- Cross-Backend Type Mismatch: Ensure compatible data types
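One way to handle the unavailable-backend and out-of-memory cases is a small fallback helper like the sketch below. This is not a library API: `to_gpu_or_keep` is a hypothetical function, and it assumes `TensorError` can be imported from the crate root alongside `Tensor`.

```rust
use tensor_frame::{Tensor, TensorError, backend::BackendType};

// Hypothetical helper: try to move a tensor to the WGPU backend, keeping the
// original placement if the GPU is unavailable or out of memory.
fn to_gpu_or_keep(tensor: Tensor) -> Tensor {
    match tensor.to_backend(BackendType::Wgpu) {
        Ok(gpu_tensor) => gpu_tensor,
        Err(TensorError::BackendError(msg)) => {
            eprintln!("GPU unavailable ({}); keeping tensor on {:?}", msg, tensor.backend_type());
            tensor
        }
        Err(e) => {
            eprintln!("Unexpected error ({}); keeping tensor on {:?}", e, tensor.backend_type());
            tensor
        }
    }
}
```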
## Backend Implementation Status

| Operation | CPU | WGPU | CUDA |
|-----------|-----|------|------|
| Basic arithmetic (+, -, *, /) | ✅ | ✅ | ✅ |
| Reductions (sum, mean) | ✅ | ❌ | ✅ |
| Reshape, transpose | ✅ | ✅ | ✅ |
| Broadcasting | ✅ | ✅ | ✅ |
✅ = Fully implemented
❌ = Not yet implemented
⚠️ = Partially implemented
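Because reductions are not yet implemented on the WGPU backend (per the table above), one workaround is to hop back to the CPU backend just for the reduction step. The sketch below uses only the conversion and reduction calls shown earlier on this page.

```rust
use tensor_frame::{Tensor, backend::BackendType};

// Element-wise work runs on the GPU...
let gpu_tensor = Tensor::ones(vec![2048, 2048])?.to_backend(BackendType::Wgpu)?;
let scaled = gpu_tensor * 2.0;

// ...but reductions are not available on WGPU yet, so reduce on the CPU.
let total = scaled.to_backend(BackendType::Cpu)?.sum(None)?;
```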