## Why Rust for WASM?
- No garbage collection overhead
- Zero-cost abstractions
- Predictable performance
- Small binary size
## Build Optimization
### Cargo.toml Settings
```toml
[profile.release]
opt-level = 3      # Maximum speed optimization; use "z" to optimize for size instead
lto = true         # Link-time optimization across crate boundaries
codegen-units = 1  # Single codegen unit allows more aggressive inlining
panic = "abort"    # Drops unwinding machinery for smaller binaries
```
### wasm-pack Flags
```bash
wasm-pack build --release --target web
```
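With `--target web`, wasm-pack emits a `pkg/` directory containing an ES-module shim. A minimal loading sketch, assuming the crate is named `my_crate` and exports a `heavy_computation` function:

```javascript
// "my_crate" and "heavy_computation" are placeholders for your
// crate's actual name and exported function.
import init, { heavy_computation } from "./pkg/my_crate.js";

await init();         // fetches and instantiates the .wasm binary
heavy_computation();  // calls straight into Rust
```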
## Code Patterns
### Minimize Allocations
```rust
// Bad: allocates a new Vec on every call
fn process(data: &[u8]) -> Vec<u8> {
    data.iter().map(|x| x * 2).collect()
}

// Good: reuse a caller-provided buffer
fn process_into(data: &[u8], out: &mut Vec<u8>) {
    out.clear();
    out.extend(data.iter().map(|x| x * 2));
}
```
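The payoff comes when the caller amortizes that one buffer across a hot loop. A sketch, with `frames` standing in for whatever batch of inputs you process:

```rust
fn process_all(frames: &[&[u8]]) {
    // One allocation, reused for every frame.
    let mut out = Vec::with_capacity(4096);
    for frame in frames {
        process_into(frame, &mut out);
        // ... consume `out` before the next iteration ...
    }
}
```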
### Use SIMD When Available
```rust
// Gate the whole function, not just the import: the intrinsics
// only exist on wasm32, and SIMD codegen also needs the simd128
// target feature (see the note below).
#[cfg(target_arch = "wasm32")]
fn sum_f32x4(a: &[f32]) -> f32 {
    use std::arch::wasm32::*;

    let mut chunks = a.chunks_exact(4);
    let mut sum = f32x4_splat(0.0);
    for chunk in &mut chunks {
        let v = f32x4(chunk[0], chunk[1], chunk[2], chunk[3]);
        sum = f32x4_add(sum, v);
    }
    // Reduce the four lanes, then fold in the leftover elements
    // that chunks_exact(4) skipped.
    f32x4_extract_lane::<0>(sum)
        + f32x4_extract_lane::<1>(sum)
        + f32x4_extract_lane::<2>(sum)
        + f32x4_extract_lane::<3>(sum)
        + chunks.remainder().iter().sum::<f32>()
}
```
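These intrinsics only lower to `v128` instructions when the `simd128` target feature is enabled at build time. One way to do that is a `.cargo/config.toml` entry (setting the same flag through the `RUSTFLAGS` environment variable works too):

```toml
# .cargo/config.toml
[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+simd128"]
```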
### Memory Layout
```rust
// Prefer fixed-size arrays over Vec when the maximum size is known:
// inline storage, no heap allocation, no pointer chasing.
struct TensorData {
    data: [f32; 1024], // stored inline, never allocates
    len: usize,        // number of elements actually in use
}
```
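A couple of small helpers make the fixed buffer behave like a growable container up to its capacity. A sketch; the method names are arbitrary:

```rust
impl TensorData {
    fn new() -> Self {
        TensorData { data: [0.0; 1024], len: 0 }
    }

    // View only the initialized prefix.
    fn as_slice(&self) -> &[f32] {
        &self.data[..self.len]
    }

    // Returns false instead of allocating when the buffer is full.
    fn push(&mut self, value: f32) -> bool {
        if self.len == self.data.len() {
            return false;
        }
        self.data[self.len] = value;
        self.len += 1;
        true
    }
}
```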
## Binary Size Reduction
| Technique | Size Reduction |
|-----------|----------------|
| `opt-level = "z"` | ~10-15% |
| LTO | ~10-20% |
| `wasm-opt -Oz` | ~5-10% |
| Remove debug info | ~30-50% |
### Using wasm-opt
```bash
wasm-opt -Oz -o optimized.wasm input.wasm
```
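If you build with wasm-pack, it can run wasm-opt for you as part of the build; the flags live under package metadata in Cargo.toml (`-Oz` here mirrors the command above):

```toml
[package.metadata.wasm-pack.profile.release]
wasm-opt = ["-Oz"]
```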
## Profiling

### Browser Timing

```javascript
console.time("wasm_call");
wasmInstance.exports.heavy_computation();
console.timeEnd("wasm_call");
```
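A single timed call is noisy. Averaging over many iterations with `performance.now()` gives steadier numbers; the iteration count below is arbitrary:

```javascript
const iterations = 1000;
const start = performance.now();
for (let i = 0; i < iterations; i++) {
  wasmInstance.exports.heavy_computation();
}
const elapsed = performance.now() - start;
console.log(`avg ${(elapsed / iterations).toFixed(3)} ms per call`);
```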
### Rust Benchmarks
```rust
#![feature(test)] // built-in bench harness; requires nightly
extern crate test;
use test::{black_box, Bencher};

#[bench]
fn bench_process(b: &mut Bencher) {
    let data = vec![0u8; 1000];
    // black_box keeps the optimizer from eliding the measured work
    b.iter(|| black_box(process(black_box(&data))));
}
```
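On stable Rust, the criterion crate is the common alternative. A sketch, assuming `criterion` is listed as a dev-dependency and `process` is the function from earlier:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_process(c: &mut Criterion) {
    let data = vec![0u8; 1000];
    c.bench_function("process", |b| b.iter(|| process(black_box(&data))));
}

criterion_group!(benches, bench_process);
criterion_main!(benches);
```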
## Common Pitfalls
- String handling: passing strings across the JS↔WASM boundary forces UTF-8 validation and copying, which is expensive
- Memory copies: avoid unnecessary JS↔WASM copies; operate on views into WASM memory where possible
- Small functions: per-call boundary overhead adds up, so batch work into fewer, larger calls
- Bounds checking: use `get_unchecked` only where the index is provably in range (see the sketch below)
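A hedged sketch of the last point: the `unsafe` block is justified only by the explicit length computation above it, and it is worth confirming with a profiler that the bounds checks were actually hot before reaching for this.

```rust
fn sum_pairs(data: &[f32]) -> f32 {
    let n = data.len() / 2 * 2; // largest even prefix
    let mut total = 0.0;
    for i in (0..n).step_by(2) {
        // SAFETY: i and i + 1 are both < n <= data.len() by construction,
        // so the unchecked accesses cannot go out of bounds.
        unsafe {
            total += data.get_unchecked(i) + data.get_unchecked(i + 1);
        }
    }
    total
}
```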