2. Operator Learning
This document provides comprehensive examples demonstrating the use of neural operators for learning mappings between infinite-dimensional function spaces. Neural operators are powerful tools for solving parametric PDEs, surrogate modeling, and physics-informed machine learning tasks.
2.1. Overview
Operator Learning aims to learn mappings between function spaces rather than finite-dimensional vectors. Given input functions (or fields) and their corresponding output functions, neural operators learn the underlying operator \(\mathcal{G}: \mathcal{U} \to \mathcal{V}\) that maps from input function space \(\mathcal{U}\) to output function space \(\mathcal{V}\).
2.1.1. Why Operator Learning?
Traditional neural networks learn mappings between finite-dimensional vectors tied to a fixed discretization. Operator learning provides several advantages:
Discretization Invariance: Train on one mesh, evaluate on another
Generalization: Learn families of solutions parameterized by input functions
Efficiency: Solve parametric PDEs without multiple FEM/FDM simulations
Physical Insight: Capture underlying operator structure
2.1.2. Available Implementations
AI4Plasma provides two operator learning architectures:
DeepONet: Universal approximator for nonlinear operators
DeepCSNet: Specialized network for electron-impact cross section prediction
2.2. Common Setup
All scripts can be executed from the repository root directory. Most examples share common features:
Device Selection: Automatically choose CPU/GPU via ai4plasma.config.DEVICE
Reproducibility: Fix random seeds via ai4plasma.utils.common.set_seed
Performance Metrics: Report relative L2 error for validation
Timing: Monitor training and inference time using ai4plasma.utils.common.Timer
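The relative L2 error reported by the examples is the norm of the prediction error divided by the norm of the reference solution. The following is a minimal, library-independent sketch of that metric; the function name `relative_l2_error` is illustrative and is not taken from AI4Plasma.

```python
import math


def relative_l2_error(pred, true):
    """Return ||pred - true||_2 / ||true||_2 for two equal-length sequences."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)))
    den = math.sqrt(sum(t ** 2 for t in true))
    if den == 0.0:
        raise ValueError("reference solution is identically zero")
    return num / den


# Reference values sampled at 40 points, and a prediction that is 1% too large.
true = [math.sin(math.pi * i / 39) for i in range(40)]
pred = [1.01 * t for t in true]
err = relative_l2_error(pred, true)  # close to 0.01
```

Because the metric is normalized by the reference norm, it stays comparable across solutions of different amplitude.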
2.2.1. Hardware Configuration
from ai4plasma.utils.device import check_gpu
from ai4plasma.config import DEVICE

if check_gpu(print_required=True):
    DEVICE.set_device(0)   # Use first GPU
else:
    DEVICE.set_device(-1)  # Fall back to CPU
2.3. DeepONet (Deep Operator Network)
2.3.1. Theory
DeepONet [1] learns operators \(\mathcal{G}: u(\cdot) \to \mathcal{G}(u)(\cdot)\) by leveraging the universal approximation theorem for operators. The key insight is to represent the output function as:
\[\mathcal{G}(u)(y) \approx \sum_{k=1}^{p} b_k(u)\, t_k(y) + b_0\]
where:
\(u\) is the input function (discretized as sensors)
\(y\) is the evaluation location
\(b_k(u)\) are basis functions from the branch network (depend on input function)
\(t_k(y)\) are basis functions from the trunk network (depend on location)
\(b_0\) is a learnable bias term
2.3.2. Architecture
Input Function u --> [Branch Net] --> b = [b₁, b₂, ..., bₚ]
                                              |
                                              | Inner Product
                                              |
Location y       --> [Trunk Net]  --> t = [t₁, t₂, ..., tₚ]
                                              ↓
                                      G(u)(y) = b·t + b₀
Branch Network options:
FNN: For 1D functions or feature vectors
CNN: For 2D/3D field inputs (images, spatial distributions)
Trunk Network:
Typically FNN processing spatial/temporal coordinates
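To make the branch/trunk combination concrete, here is a rough NumPy sketch of a DeepONet forward pass with untrained, randomly initialized FNNs. It does not use the AI4Plasma classes; the helper names (`init_mlp`, `mlp`), the tanh activation, the layer sizes, and the latent dimension `p = 10` are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(layers):
    """Create random (weight, bias) pairs for a fully connected network."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layers[:-1], layers[1:])
    ]


def mlp(params, x):
    """Apply tanh on hidden layers and a linear map on the output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


p = 10                              # shared latent dimension of branch and trunk
branch = init_mlp([1, 10, 10, p])   # input: one scalar describing the input function
trunk = init_mlp([1, 10, 10, p])    # input: one spatial coordinate
b0 = 0.0                            # bias term (learnable in a real model)

u = np.array([[2.0], [4.0], [6.0]])        # 3 input functions (as scalar parameters)
y = np.linspace(0.0, 1.0, 40)[:, None]     # 40 evaluation locations

B = mlp(branch, u)      # shape (3, p): coefficients b_k(u)
T = mlp(trunk, y)       # shape (40, p): basis values t_k(y)
G = B @ T.T + b0        # shape (3, 40): G(u)(y) for every (u, y) pair
```

The matrix product evaluates the inner product for every pair of input function and location at once, which is why both networks must end in the same dimension `p`.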
2.3.3. Examples
2.3.3.1. 1. One-Dimensional Poisson Equation
File: app/operator/deeponet/solve_1d_poisson.py
Problem: Learn the solution operator for the 1D Poisson equation family:
\[-\frac{d^2 u}{dx^2} = v\,\pi^2 \sin(\pi x)\]
with analytical solution \(u(x) = v \sin(\pi x)\).
Features:
Branch input: scalar parameter \(v\) (amplitude)
Trunk input: spatial coordinate \(x\)
Simple FNN networks for both branch and trunk
Ideal for quick testing and verification
Network Configuration:
branch_layers = [1, 10, 10, 10, 10] # Input: v (1D)
trunk_layers = [1, 10, 10, 10, 10] # Input: x (1D)
Training Data:
5 training parameters: \(v \in \{2, 4, 6, 8, 10\}\)
40 spatial evaluation points
Test on unseen \(v = 5.5\)
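The training set described above can be assembled directly from the analytical solution. The sketch below is not the script's actual code: the domain \([0, 1]\) and the uniform spacing of the 40 points are assumptions, and the variable names are illustrative.

```python
import math

v_train = [2.0, 4.0, 6.0, 8.0, 10.0]                    # training amplitudes
n_points = 40
x_grid = [i / (n_points - 1) for i in range(n_points)]  # assumed domain [0, 1]


def u_exact(v, x):
    """Analytical solution u(x) = v * sin(pi * x)."""
    return v * math.sin(math.pi * x)


# One sample per (v, x) pair: (branch input, trunk input, target value).
samples = [(v, x, u_exact(v, x)) for v in v_train for x in x_grid]

# Reference values for the unseen test parameter.
v_test = 5.5
u_test = [u_exact(v_test, x) for x in x_grid]
```

With 5 parameters and 40 points this gives 200 training triples, small enough to train on in a single full batch.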
Run:
python app/operator/deeponet/solve_1d_poisson.py
Expected Output:
Training progress with loss values
Final L2 relative error: typically < 1e-3
Total runtime: ~10-30 seconds (depending on hardware)
2.3.3.2. 2. One-Dimensional Poisson with Batch Training
File: app/operator/deeponet/solve_1d_poisson_batch.py
Purpose: Demonstrate batch-wise training with DataLoader for larger datasets.
Key Differences:
Uses batch_size=5 with PyTorch DataLoader
Training data: 20 parameters uniformly spaced in \([1, 20]\)
Demonstrates scalability to larger datasets
Features:
Efficient memory management for large datasets
Shuffling and batching capabilities
Same PDE as solve_1d_poisson.py but with more training data
Run:
python app/operator/deeponet/solve_1d_poisson_batch.py
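The shuffling and batching that a DataLoader provides can be illustrated without PyTorch. The generator below is a simplified stand-in, not the script's implementation; `iterate_minibatches` is a hypothetical helper name. In PyTorch the equivalent behaviour comes from `torch.utils.data.DataLoader` with `batch_size=5` and `shuffle=True`.

```python
import random


def iterate_minibatches(samples, batch_size, shuffle=True, seed=0):
    """Yield consecutive mini-batches, optionally in shuffled order."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


# 20 parameters uniformly spaced in [1, 20], i.e. 1.0, 2.0, ..., 20.0.
v_train = [1.0 + i for i in range(20)]
batches = list(iterate_minibatches(v_train, batch_size=5))
```

Each epoch visits every parameter exactly once, split here into four batches of five.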
2.3.3.3. 3. Two-Dimensional Poisson Equation
File: app/operator/deeponet/solve_2d_poisson.py
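Problem: Learn the solution operator for the 2D Poisson equation:
\[-\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) = 2 v\,\pi^2 \sin(\pi x)\sin(\pi y)\]
with analytical solution \(u(x,y) = v \sin(\pi x)\sin(\pi y)\).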
Features:
Branch input: scalar parameter \(v\)
Trunk input: 2D coordinates \((x, y)\)
Tests generalization to higher-dimensional spaces
Evaluation on Cartesian grid (\(32 \times 32\) points)
Network Configuration:
branch_layers = [1, 32, 32, 32] # Input: v (1D parameter)
trunk_layers = [2, 32, 32, 32] # Input: (x,y) coordinates
Training Data:
10 training parameters uniformly spaced in \([1, 10]\)
\(32 \times 32 = 1024\) spatial points per parameter
Test on \(v = 5.5\)
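For the 2D case, the trunk input is a flattened list of grid coordinates. A possible way to build it with NumPy is sketched below; the unit-square domain and the variable names are assumptions, and the targets come from the analytical solution stated above.

```python
import numpy as np

n = 32
x = np.linspace(0.0, 1.0, n)   # assumed domain [0, 1] in x
y = np.linspace(0.0, 1.0, n)   # assumed domain [0, 1] in y

# Cartesian grid flattened to one (x, y) row per point: shape (1024, 2).
X, Y = np.meshgrid(x, y, indexing="ij")
coords = np.stack([X.ravel(), Y.ravel()], axis=1)

# 10 training parameters uniformly spaced in [1, 10].
v_train = np.linspace(1.0, 10.0, 10)

# Targets u(x, y) = v sin(pi x) sin(pi y): shape (10, 1024).
shape_fn = np.sin(np.pi * coords[:, 0]) * np.sin(np.pi * coords[:, 1])
U = v_train[:, None] * shape_fn[None, :]
```

Using `indexing="ij"` keeps the first grid axis aligned with `x`, so `U[i].reshape(n, n)[a, b]` corresponds to the point `(x[a], y[b])`.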
Run:
python app/operator/deeponet/solve_2d_poisson.py
Expected Output:
L2 relative error: typically < 1e-2
Demonstrates operator learning in 2D spatial domains
2.3.3.4. 4. Two-Dimensional Poisson with CNN Branch
File: app/operator/deeponet/solve_2d_poisson_cnn.py
Purpose: Use CNN-based branch network for processing 2D field inputs (images).
Key Innovation: Instead of a scalar parameter, the branch network processes an entire 2D right-hand-side field, sampled as a \(16 \times 16\) grid (image).
Architecture:
# CNN Branch Network
conv_layers = [1, 8, 16, 32] # Channels: 1 → 8 → 16 → 32
fc_layers = [32, 32] # Flattened features
# Input: (batch, 1, 16, 16)
# Output: (batch, 32)
# FNN Trunk Network
trunk_layers = [2, 32, 32, 32] # Input: (x,y) coordinates
Features:
Automatic CNN branch detection via network.branch_is_cnn
Batch normalization and max pooling
Kaiming initialization for ReLU activation
Supports higher-resolution field inputs
Training Data:
20 RHS fields on \(16 \times 16\) grid
Evaluation on finer \(32 \times 32\) grid
Demonstrates resolution independence
Benefits of CNN Branch:
Handles complex spatial input patterns
Translation equivariance for physical fields
Efficient for high-dimensional input functions
Run:
python app/operator/deeponet/solve_2d_poisson_cnn.py
Expected Output:
Confirmation of CNN branch detection
Lower error for complex input patterns
Runtime: ~1-2 minutes
2.3.3.5. 5. Quick Test Driver
File: app/operator/deeponet/solve_1d_poisson_test.py
Purpose: Minimal example for rapid testing and debugging.
Use Cases:
Quick verification of installation
Testing code modifications
Minimal computational requirements
2.3.4. DeepONet Usage Guidelines
When to use DeepONet:
Learning solution operators for parametric PDEs
Surrogate modeling with varying input functions
Multi-query scenarios (many evaluations with different inputs)
Branch Network Selection:
FNN: Scalar/vector parameters, 1D functions
CNN: 2D/3D fields, images, spatial distributions
Training Tips:
Start with small networks and fewer epochs for prototyping
Use batch training for datasets with >100 samples
Monitor L2 error on held-out test data
Increase network depth/width if underfitting
Add regularization if overfitting
Performance Considerations:
Training cost per step scales with the number of trunk points × batch size
Inference is fast: single forward pass per query
GPU acceleration recommended for CNN branches
2.4. DeepCSNet (Deep Coefficient-Subnet Network)
2.4.1. Theory
DeepCSNet [2] is a specialized operator network for electron-impact cross section prediction in plasma physics. It employs a modular “coefficient-subnet” architecture that processes different input feature types separately.
2.4.2. Architecture
DeepCSNet consists of up to three optional sub-networks:
Molecular Features --> [Molecule Net] --> m = [m₁, ..., mₚ]
                                                  |
Energy Features    --> [Energy Net]   --> e = [e₁, ..., eₚ]
                                                  |
Angles/Coordinates --> [Trunk Net]    --> t = [t₁, ..., tₚ]
                                                  ↓
                                      σ = Combine(m, e, t) + bias
Operation Modes:
SMC (Single-Molecule Configuration): Energy Net + Trunk Net, for a single molecular species
MMC (Multi-Molecule Configuration): Molecule Net + Trunk Net (+ optional Energy Net), for multiple molecular species
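The text above does not spell out how Combine merges the sub-network outputs. One plausible reading, by analogy with the DeepONet inner product, is an element-wise product of the active latent vectors summed over the latent dimension. The sketch below illustrates that assumption only and should not be taken as the AI4Plasma implementation; `combine` and all array values are hypothetical.

```python
import numpy as np


def combine(*features):
    """Multiply the latent vectors element-wise, then sum over the latent axis.

    Assumed combination rule; the actual DeepCSNet rule may differ.
    """
    prod = features[0]
    for f in features[1:]:
        prod = prod * f
    return prod.sum(axis=-1)


rng = np.random.default_rng(0)
p = 8                           # shared latent dimension
m = rng.normal(size=(4, p))     # stand-in for Molecule Net outputs
e = rng.normal(size=(4, p))     # stand-in for Energy Net outputs
t = rng.normal(size=(4, p))     # stand-in for Trunk Net outputs
bias = 0.1

sigma_smc = combine(e, t) + bias        # SMC: Energy Net + Trunk Net
sigma_mmc = combine(m, t) + bias        # MMC: Molecule Net + Trunk Net
sigma_mmc_e = combine(m, e, t) + bias   # MMC with the optional Energy Net
```

With two active sub-networks this reduces to the ordinary dot product used by DeepONet.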
2.4.3. Example: Total Ionization Cross Section Prediction
File: app/operator/deepcsnet/predict_total_ionxsec.py
Physical Problem: Predict total electron-impact ionization cross sections \(Q(\text{molecule}, E)\) as a function of molecular composition and incident electron energy.
Application: Crucial for plasma modeling, mass spectrometry, and radiation chemistry simulations.
Data:
88 organic molecules (C, H, O, N, F compounds)
Cross section measurements at various energies
Energy range: \(E \geq 30\) eV (filtered for reliability)
Molecular Descriptors (5 features):
Number of Carbon atoms (C)
Number of Hydrogen atoms (H)
Number of Oxygen atoms (O)
Number of Nitrogen atoms (N)
Number of Fluorine atoms (F)
Network Configuration:
molecule_layers = [5, 80, 80, 80] # Molecule Net: C,H,O,N,F → features
trunk_layers = [1, 80, 80, 80] # Trunk Net: Energy → features
Data Processing Pipeline:
Load CSV files (one per molecule)
Parse molecular formulas → extract atom counts
Filter energy range (\(E \geq 30\) eV)
Logarithmic transformation: \(\log_{10}(Q)\)
Normalize to \([0.05, 0.95]\) to prevent saturation
Split: 70 molecules (training) + 18 molecules (testing)
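The preprocessing steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the script's code: the helper names are hypothetical, the cross-section values are made-up placeholders in arbitrary units, and normalizing with the global minimum and maximum of the kept values is an assumed choice.

```python
import math
import re

ELEMENTS = ("C", "H", "O", "N", "F")   # descriptor order used in this sketch


def atom_counts(formula):
    """Parse a formula such as 'C2H6O' into counts ordered as ELEMENTS."""
    counts = dict.fromkeys(ELEMENTS, 0)
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in counts:
            raise ValueError(f"unsupported element: {symbol}")
        counts[symbol] += int(number) if number else 1
    return [counts[el] for el in ELEMENTS]


def normalize(values, lo=0.05, hi=0.95):
    """Min-max scale values into [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


# Placeholder (energy in eV, cross section Q) pairs, for illustration only.
raw = [(20.0, 1.0e-17), (30.0, 2.0e-17), (70.0, 8.0e-17), (200.0, 5.0e-17)]

kept = [(e, q) for e, q in raw if e >= 30.0]   # filter E >= 30 eV
log_q = [math.log10(q) for _, q in kept]       # logarithmic transform
scaled = normalize(log_q)                      # scale into [0.05, 0.95]
features = atom_counts("C2H6O")                # [C, H, O, N, F] counts
```

Keeping the targets away from 0 and 1 leaves headroom for predictions on unseen molecules that fall slightly outside the training range.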
Training Configuration:
Optimizer: Adam with learning rate \(5 \times 10^{-4}\)
Learning rate schedule: constant for 100k epochs, then \(\times 0.5\)
Total epochs: 200,000
Loss function: Mean Squared Error (MSE)
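The learning rate schedule listed above is a single step decay. As a small sketch, it can be written as a function of the epoch index; whether the drop takes effect exactly at epoch 100,000 is an assumption, and `learning_rate` is an illustrative name.

```python
def learning_rate(epoch, base_lr=5e-4, milestone=100_000, factor=0.5):
    """Constant base_lr before the milestone, base_lr * factor afterwards."""
    return base_lr if epoch < milestone else base_lr * factor


lr_start = learning_rate(0)          # 5e-4
lr_before = learning_rate(99_999)    # still 5e-4
lr_after = learning_rate(100_000)    # 2.5e-4
lr_final = learning_rate(199_999)    # 2.5e-4
```

In PyTorch, the same shape could be obtained with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[100000]` and `gamma=0.5`; the source does not state which scheduler the script actually uses.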
Features:
TensorBoard logging for real-time monitoring
Checkpoint saving every 50k epochs
Resume training capability
Comprehensive error analysis
Run:
python app/operator/deepcsnet/predict_total_ionxsec.py
Expected Output:
Training progress logged to TensorBoard
Checkpoints saved in app/operator/deepcsnet/models/
Results saved in app/operator/deepcsnet/results/
Final relative L2 error on test set
Runtime: ~30-60 minutes for 200k epochs (GPU)
Physical Insights:
Learns complex electron-molecule scattering physics
Captures energy-dependent ionization thresholds
Generalizes to unseen molecular compositions
Typical test accuracy: relative error < 10%
2.5. Best Practices
2.5.1. 1. Data Preparation
Normalization: Scale inputs/outputs to \([0, 1]\) or \([-1, 1]\)
Logarithmic Transform: Use for quantities spanning orders of magnitude
Train/Test Split: Hold out diverse test cases (e.g., different molecules, parameter ranges)
2.5.2. 2. Network Design
Start Simple: Begin with shallow networks (3-4 layers)
Scale Up Gradually: Increase depth/width if needed
Match Dimensions: Ensure branch and trunk output same dimension \(p\)
2.5.3. 3. Training Strategy
Learning Rate: Start with \(10^{-4}\) to \(10^{-3}\)
Learning Rate Decay: Apply exponential or step decay
Early Stopping: Monitor validation loss
Checkpointing: Save models periodically for long training runs
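Early stopping on the validation loss can be implemented with a small patience counter. The class below is a generic sketch, not an AI4Plasma utility; the name `EarlyStopping`, the default patience, and the loss values are chosen for illustration.

```python
class EarlyStopping:
    """Signal a stop when the validation loss has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record a new validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative validation losses: improvement stalls after the third value.
losses = [1.0, 0.6, 0.5, 0.52, 0.51, 0.53, 0.4]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

Saving a checkpoint whenever `best` improves pairs naturally with this pattern, since the stopped model is usually not the best one.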
2.5.4. 4. Validation
Visual Inspection: Plot predictions vs. ground truth
Error Metrics: Compute L2, L∞, pointwise errors
Extrapolation Tests: Test on parameter values outside training range
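Alongside a norm-based error, the L∞ and pointwise errors show where a prediction is worst. A minimal sketch with hypothetical helper names and made-up values follows.

```python
def pointwise_abs_error(pred, true):
    """Absolute error at every evaluation point."""
    return [abs(p - t) for p, t in zip(pred, true)]


def max_abs_error(pred, true):
    """L-infinity error: the largest absolute pointwise deviation."""
    return max(pointwise_abs_error(pred, true))


# Illustrative values only.
true = [0.0, 1.0, 2.0, 3.0]
pred = [0.1, 1.0, 1.8, 3.05]

errors = pointwise_abs_error(pred, true)
worst = max_abs_error(pred, true)
worst_index = errors.index(worst)   # location of the largest error
```

Plotting `errors` against the evaluation coordinates often reveals boundary or high-gradient regions that an averaged metric hides.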
2.5.5. 5. Debugging
Overfitting: Add dropout, reduce network size, or increase data
Underfitting: Increase network capacity or training epochs
Numerical Issues: Check for NaN/Inf, adjust learning rate or normalization
2.6. Performance Benchmarks
Typical performance on standard hardware:
| Example | Training Time | GPU Memory | Test L2 Error |
|---|---|---|---|
| 1D Poisson (FNN) | ~15 sec | <500 MB | <10⁻³ |
| 2D Poisson (FNN) | ~30 sec | <1 GB | <10⁻² |
| 2D Poisson (CNN) | ~2 min | ~2 GB | <10⁻² |
| Total Ionization XS | ~45 min | ~3 GB | ~10% |
Hardware: Single NVIDIA GPU (e.g., RTX 3090, A100)
2.7. Troubleshooting
2.7.1. Training Not Converging
Reduce learning rate by 10×
Check data normalization
Verify network architecture matches data dimensions
2.7.2. GPU Out of Memory
Reduce batch size
Use smaller networks
Process data in smaller chunks
2.7.3. High Test Error
Increase training data diversity
Try deeper/wider networks
Check for data leakage or poor train/test split
2.8. References
[1] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, “Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators,” Nature Machine Intelligence, vol. 3, no. 3, pp. 218-229, 2021.
[2] Y. Wang and L. Zhong, “DeepCSNet: a deep learning method for predicting electron-impact doubly differential ionization cross sections,” Plasma Sources Science and Technology, vol. 33, no. 10, p. 105012, 2024.
2.9. Further Reading
Operator Learning: Lu et al., “DeepXDE: A deep learning library for solving differential equations” (2021)
Physics-Informed ML: Karniadakis et al., “Physics-informed machine learning” (2021)
Plasma Physics Applications: Wang et al., “Machine learning methods for plasma physics” (2024)