2. Operator Learning

This document provides examples of using neural operators to learn mappings between infinite-dimensional function spaces. Neural operators are used for solving parametric PDEs, surrogate modeling, and physics-informed machine learning tasks.

2.1. Overview

Operator learning aims to learn mappings between function spaces rather than between finite-dimensional vector spaces. Given input functions (or fields) and their corresponding output functions, a neural operator approximates the underlying operator \(\mathcal{G}: \mathcal{U} \to \mathcal{V}\) that maps the input function space \(\mathcal{U}\) to the output function space \(\mathcal{V}\).

2.1.1. Why Operator Learning?

Conventional neural networks learn mappings between fixed-size vectors and are therefore tied to one discretization. Operator learning provides several advantages:

Discretization Invariance: Train on one mesh, evaluate on another
Generalization: Learn families of solutions parameterized by input functions
Efficiency: Solve parametric PDEs without multiple FEM/FDM simulations
Physical Insight: Capture underlying operator structure

2.1.2. Available Implementations

AI4Plasma provides two operator learning architectures:

DeepONet: Universal approximator for nonlinear operators
DeepCSNet: Specialized network for electron-impact cross section prediction

2.2. Common Setup

All scripts can be executed from the repository root directory. Most examples share common features:

Device Selection: Automatically choose CPU/GPU via ai4plasma.config.DEVICE
Reproducibility: Fix random seeds via ai4plasma.utils.common.set_seed
Performance Metrics: Report relative L2 error for validation
Timing: Monitor training and inference time using ai4plasma.utils.common.Timer
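
The relative L2 error reported by the examples can be sketched in plain NumPy as follows. The helper name `relative_l2_error` is hypothetical and is not taken from AI4Plasma; the library's own metric may differ in details such as how batches are reduced.

```python
import numpy as np

def relative_l2_error(pred, true):
    """Return ||pred - true||_2 / ||true||_2 computed over all entries."""
    pred = np.asarray(pred, dtype=float).ravel()
    true = np.asarray(true, dtype=float).ravel()
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

# A prediction that is uniformly 1% too large has a relative error of 1%.
x = np.linspace(-1.0, 1.0, 40)
u_true = np.sin(np.pi * x)
u_pred = 1.01 * u_true
err = relative_l2_error(u_pred, u_true)
```

Because the error is normalized by the norm of the reference solution, values from problems with different solution amplitudes can be compared directly.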

2.2.1. Hardware Configuration

from ai4plasma.utils.device import check_gpu
from ai4plasma.config import DEVICE

if check_gpu(print_required=True):
    DEVICE.set_device(0)  # Use first GPU
else:
    DEVICE.set_device(-1)  # Fall back to CPU

2.3. DeepONet (Deep Operator Network)

2.3.1. Theory

DeepONet [1] learns operators \(\mathcal{G}: u \mapsto \mathcal{G}(u)\), building on the universal approximation theorem for operators. The key idea is to represent the output function as:

\[ \mathcal{G}(u)(y) \approx \sum_{k=1}^{p} b_k(u) \cdot t_k(y) + b_0 \]

where:

\(u\) is the input function (discretized as its values at fixed sensor locations)
\(y\) is the evaluation location
\(b_k(u)\) are coefficients produced by the branch network (they depend on the input function)
\(t_k(y)\) are basis functions produced by the trunk network (they depend on the location)
\(b_0\) is a learnable bias term
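
As a minimal sketch of the sum above, the combination step can be written as a matrix product. Random arrays stand in for the branch and trunk outputs, and the sizes are illustrative assumptions; no trained network or AI4Plasma API is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_funcs, n_points, p = 5, 40, 10      # illustrative sizes (assumed)

b = rng.normal(size=(n_funcs, p))     # stand-in for branch outputs b_k(u)
t = rng.normal(size=(n_points, p))    # stand-in for trunk outputs t_k(y)
b0 = 0.1                              # stand-in for the learnable bias

# G(u_i)(y_j) = sum_k b[i, k] * t[j, k] + b0 for every pair (i, j)
G = b @ t.T + b0

# The same result written as an explicit sum over k
G_einsum = np.einsum("ik,jk->ij", b, t) + b0
```

Each row of `G` is one output function evaluated at all `n_points` locations, which is why the branch and trunk networks must share the same output width `p`.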

2.3.2. Architecture

Input Function u --> [Branch Net] --> b = [b₁, b₂, ..., bₚ]
                                              |
                                              | Inner Product
                                              |
Location y      --> [Trunk Net]  --> t = [t₁, t₂, ..., tₚ]
                                              ↓
                                        G(u)(y) = b·t + b₀

Branch Network options:

FNN: For 1D functions or feature vectors
CNN: For 2D/3D field inputs (images, spatial distributions)

Trunk Network:

Typically FNN processing spatial/temporal coordinates

2.3.3. Examples

2.3.3.1. One-Dimensional Poisson Equation

File: app/operator/deeponet/solve_1d_poisson.py

Problem: Learn the solution operator for the 1D Poisson equation family:

\[ -\frac{d^2 u}{dx^2} = f(x) = v \pi^2 \sin(\pi x), \quad x \in [-1, 1] \]

with analytical solution \(u(x) = v \sin(\pi x)\).

Features:

Branch input: scalar parameter \(v\) (amplitude)
Trunk input: spatial coordinate \(x\)
Simple FNN networks for both branch and trunk
Ideal for quick testing and verification

Network Configuration:

branch_layers = [1, 10, 10, 10, 10]  # Input: v (1D)
trunk_layers = [1, 10, 10, 10, 10]   # Input: x (1D)

Training Data:

5 training parameters: \(v \in \{2, 4, 6, 8, 10\}\)
40 spatial evaluation points
Test on unseen \(v = 5.5\)
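
The training set described above can be sketched directly from the analytical solution. This is an illustration in NumPy, not the script's own data code; uniform spacing of the 40 points including both endpoints is an assumption. The finite-difference check at the end confirms that \(u(x) = v \sin(\pi x)\) satisfies the stated equation.

```python
import numpy as np

v_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # training amplitudes
x = np.linspace(-1.0, 1.0, 40)                   # 40 points (uniform spacing assumed)

# Analytical solutions u(x) = v sin(pi x), one row per amplitude
u_train = v_train[:, None] * np.sin(np.pi * x)[None, :]   # shape (5, 40)

# Check -u'' = v pi^2 sin(pi x) with central differences on a fine grid
xf = np.linspace(-1.0, 1.0, 2001)
h = xf[1] - xf[0]
v = 5.5                                          # the held-out test amplitude
u = v * np.sin(np.pi * xf)
lhs = -(u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2
rhs = v * np.pi**2 * np.sin(np.pi * xf[1:-1])
max_residual = np.max(np.abs(lhs - rhs))         # small, limited by O(h^2) truncation
```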

Run:

python app/operator/deeponet/solve_1d_poisson.py

Expected Output:

Training progress with loss values
Final relative L2 error: typically below 1e-3
Total runtime: ~10-30 seconds (depending on hardware)

2.3.3.2. One-Dimensional Poisson with Batch Training

File: app/operator/deeponet/solve_1d_poisson_batch.py

Purpose: Demonstrate batch-wise training with DataLoader for larger datasets.

Key Differences:

Uses batch_size=5 with PyTorch DataLoader
Training data: 20 parameters uniformly spaced in \([1, 20]\)
Demonstrates scalability to larger datasets

Features:

Efficient memory management for large datasets
Shuffling and batching capabilities
Same PDE as solve_1d_poisson.py but with more training data
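
What shuffling and batching amount to can be sketched without PyTorch. The generator below is a hypothetical NumPy stand-in for the role a DataLoader plays, not the script's code; the 40-point grid is carried over from the single-batch example as an assumption.

```python
import numpy as np

def iterate_minibatches(inputs, targets, batch_size, rng):
    """Yield shuffled (inputs, targets) batches that cover the data once."""
    order = rng.permutation(len(inputs))
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], targets[idx]

x = np.linspace(-1.0, 1.0, 40)                 # evaluation points (assumed)
v = np.linspace(1.0, 20.0, 20)                 # 20 amplitudes in [1, 20]
u = v[:, None] * np.sin(np.pi * x)[None, :]    # targets, shape (20, 40)

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(v[:, None], u, batch_size=5, rng=rng))
# 20 samples with batch_size=5 give 4 batches per epoch
```

Calling the generator once per epoch reshuffles the data, so each epoch sees the same samples in a different order.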

Run:

python app/operator/deeponet/solve_1d_poisson_batch.py

2.3.3.3. Two-Dimensional Poisson Equation

File: app/operator/deeponet/solve_2d_poisson.py

Problem: Learn the solution operator for 2D Poisson equation:

\[ -\Delta u(x,y) = f(x,y) = 2v\pi^2 \sin(\pi x)\sin(\pi y), \quad (x,y) \in [0,1]^2 \]

with analytical solution \(u(x,y) = v \sin(\pi x)\sin(\pi y)\).

Features:

Branch input: scalar parameter \(v\)
Trunk input: 2D coordinates \((x, y)\)
Tests generalization to higher-dimensional spaces
Evaluation on Cartesian grid (\(32 \times 32\) points)

Network Configuration:

branch_layers = [1, 32, 32, 32]   # Input: v (1D parameter)
trunk_layers = [2, 32, 32, 32]    # Input: (x,y) coordinates

Training Data:

10 training parameters uniformly spaced in \([1, 10]\)
\(32 \times 32 = 1024\) spatial points per parameter
Test on \(v = 5.5\)
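
The 2D data layout can be sketched as follows: the grid is flattened into 1024 coordinate rows for the trunk, and each training amplitude contributes one row of solution values. This is a NumPy illustration under the assumption of a uniform grid that includes the boundary; the closing 5-point Laplacian check confirms the analytical solution against the stated right-hand side.

```python
import numpy as np

n = 32
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")

# Trunk input: one (x, y) row per grid point
coords = np.stack([X.ravel(), Y.ravel()], axis=1)            # shape (1024, 2)

v_train = np.linspace(1.0, 10.0, 10)                         # 10 amplitudes in [1, 10]
basis = np.sin(np.pi * coords[:, 0]) * np.sin(np.pi * coords[:, 1])
u_train = v_train[:, None] * basis[None, :]                  # shape (10, 1024)

# Sanity check: 5-point Laplacian at interior points for v = 5.5
v = 5.5
h = xs[1] - xs[0]
U = v * np.sin(np.pi * X) * np.sin(np.pi * Y)
lap = (U[2:, 1:-1] + U[:-2, 1:-1] + U[1:-1, 2:] + U[1:-1, :-2]
       - 4.0 * U[1:-1, 1:-1]) / h**2
F = 2.0 * v * np.pi**2 * np.sin(np.pi * X) * np.sin(np.pi * Y)
rel_residual = np.max(np.abs(-lap - F[1:-1, 1:-1])) / np.max(np.abs(F))
```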

Run:

python app/operator/deeponet/solve_2d_poisson.py

Expected Output:

Relative L2 error: typically below 1e-2
Demonstrates operator learning in 2D spatial domains

2.3.3.4. Two-Dimensional Poisson with CNN Branch

File: app/operator/deeponet/solve_2d_poisson_cnn.py

Purpose: Use CNN-based branch network for processing 2D field inputs (images).

Key Innovation: Instead of a scalar parameter, the branch network processes entire 2D fields:

\[ f(x,y) = 2v\pi^2 \sin(\pi x)\sin(\pi y) \]

as a \(16 \times 16\) grid (image).

Architecture:

# CNN Branch Network
conv_layers = [1, 8, 16, 32]      # Channels: 1 → 8 → 16 → 32
fc_layers = [32, 32]               # Flattened features
# Input: (batch, 1, 16, 16)
# Output: (batch, 32)

# FNN Trunk Network
trunk_layers = [2, 32, 32, 32]    # Input: (x,y) coordinates

Features:

Automatic CNN branch detection via network.branch_is_cnn
Batch normalization and max pooling
Kaiming initialization for ReLU activation
Supports higher-resolution field inputs

Training Data:

20 RHS fields on \(16 \times 16\) grid
Evaluation on finer \(32 \times 32\) grid
Demonstrates resolution independence
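
The point of using two grids is that the branch and trunk inputs are sampled independently. The sketch below builds arrays with the shapes described above; the amplitude values chosen for the 20 fields are an assumption, since they are not listed here, and the helper `grid` is hypothetical.

```python
import numpy as np

def grid(n):
    """Return meshgrid arrays for an n x n uniform grid on [0, 1]^2."""
    s = np.linspace(0.0, 1.0, n)
    return np.meshgrid(s, s, indexing="ij")

v = np.linspace(1.0, 20.0, 20)          # ASSUMED amplitudes for the 20 fields

# Branch input: RHS fields sampled on the coarse 16 x 16 grid
Xc, Yc = grid(16)
f_coarse = 2.0 * np.pi**2 * np.sin(np.pi * Xc) * np.sin(np.pi * Yc)
branch_in = (v[:, None, None] * f_coarse)[:, None, :, :]    # (20, 1, 16, 16)

# Trunk input and targets: solution sampled on the finer 32 x 32 grid
Xf, Yf = grid(32)
trunk_in = np.stack([Xf.ravel(), Yf.ravel()], axis=1)       # (1024, 2)
targets = v[:, None] * (np.sin(np.pi * Xf) * np.sin(np.pi * Yf)).ravel()
```

The leading singleton channel axis in `branch_in` matches the `(batch, 1, 16, 16)` input layout noted in the architecture listing.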

Benefits of CNN Branch:

Handles complex spatial input patterns
Translation equivariance for physical fields
Efficient for high-dimensional input functions

Run:

python app/operator/deeponet/solve_2d_poisson_cnn.py

Expected Output:

Confirmation of CNN branch detection
Relative L2 error: typically below 1e-2 (see Performance Benchmarks)
Runtime: ~1-2 minutes

2.3.3.5. Quick Test Driver

File: app/operator/deeponet/solve_1d_poisson_test.py

Purpose: Minimal example for rapid testing and debugging.

Use Cases:

Quick verification of installation
Testing code modifications
Minimal computational requirements

2.3.4. DeepONet Usage Guidelines

When to use DeepONet:

Learning solution operators for parametric PDEs
Surrogate modeling with varying input functions
Multi-query scenarios (many evaluations with different inputs)

Branch Network Selection:

FNN: Scalar/vector parameters, 1D functions
CNN: 2D/3D fields, images, spatial distributions

Training Tips:

Start with small networks and fewer epochs for prototyping
Use batch training for datasets with >100 samples
Monitor L2 error on held-out test data
Increase network depth/width if underfitting
Add regularization if overfitting

Performance Considerations:

Training time per iteration scales roughly with: number of trunk points × batch size
Inference is fast: single forward pass per query
GPU acceleration recommended for CNN branches

2.4. DeepCSNet (Deep Coefficient-Subnet Network)

2.4.1. Theory

DeepCSNet [2] is a specialized operator network for electron-impact cross section prediction in plasma physics. It employs a modular “coefficient-subnet” architecture that processes different input feature types separately.

2.4.2. Architecture

DeepCSNet consists of up to three optional sub-networks:

Molecular Features  --> [Molecule Net] --> m = [m₁, ..., mₚ]
                                               |
Energy Features     --> [Energy Net]   --> e = [e₁, ..., eₚ]
                                               |
Angles/Coordinates  --> [Trunk Net]    --> t = [t₁, ..., tₚ]
                                               ↓
                                    σ = Combine(m, e, t) + bias

Operation Modes:

SMC (Single-Molecule Configuration):
- Energy Net + Trunk Net
- For single molecular species
MMC (Multi-Molecule Configuration):
- Molecule Net + Trunk Net (+ optional Energy Net)
- For multiple molecular species
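
The two modes differ only in which sub-network outputs enter the combination. The sketch below assumes a DeepONet-style rule (elementwise product of the active outputs, summed over the \(p\) features); the actual Combine operation is not specified here, so the function `combine` is a hypothetical illustration and not the DeepCSNet implementation.

```python
import numpy as np

def combine(features, bias=0.0):
    """Multiply the given sub-network outputs elementwise and sum over p.

    ASSUMPTION: this product-and-sum rule is only one plausible choice;
    the actual DeepCSNet combination may differ.
    """
    prod = np.ones_like(features[0])
    for f in features:
        prod = prod * f
    return prod.sum(axis=-1) + bias

rng = np.random.default_rng(0)
n_samples, p = 8, 80
m = rng.normal(size=(n_samples, p))   # stand-in for Molecule Net output
e = rng.normal(size=(n_samples, p))   # stand-in for Energy Net output
t = rng.normal(size=(n_samples, p))   # stand-in for Trunk Net output

sigma_smc = combine([e, t])           # SMC: Energy Net + Trunk Net
sigma_mmc = combine([m, t])           # MMC without the optional Energy Net
sigma_all = combine([m, e, t])        # MMC with the optional Energy Net
```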

2.4.3. Example: Total Ionization Cross Section Prediction

File: app/operator/deepcsnet/predict_total_ionxsec.py

Physical Problem: Predict total electron-impact ionization cross sections \(Q(\text{molecule}, E)\) as a function of molecular composition and incident electron energy.

Application: Crucial for plasma modeling, mass spectrometry, and radiation chemistry simulations.

Data:

88 organic molecules (C, H, O, N, F compounds)
Cross section measurements at various energies
Energy range: \(E \geq 30\) eV (filtered for reliability)

Molecular Descriptors (5 features):

Number of Carbon atoms (C)
Number of Hydrogen atoms (H)
Number of Oxygen atoms (O)
Number of Nitrogen atoms (N)
Number of Fluorine atoms (F)

Network Configuration:

molecule_layers = [5, 80, 80, 80]   # Molecule Net: C,H,O,N,F → features
trunk_layers = [1, 80, 80, 80]      # Trunk Net: Energy → features

Data Processing Pipeline:

Load CSV files (one per molecule)
Parse molecular formulas → extract atom counts
Filter energy range (\(E \geq 30\) eV)
Logarithmic transformation: \(\log_{10}(Q)\)
Normalize to \([0.05, 0.95]\) to prevent saturation
Split: 70 molecules (training) + 18 molecules (testing)
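
The per-molecule steps of this pipeline can be sketched as below. The helper names `atom_counts` and `scale_to_range` are hypothetical, the numbers are made up for illustration rather than measured data, and the script's own parsing and scaling may differ (for example, scaling statistics may be computed over the whole training set).

```python
import re
import numpy as np

ELEMENTS = ("C", "H", "O", "N", "F")

def atom_counts(formula):
    """Return [C, H, O, N, F] counts parsed from a formula such as 'C2H6O'."""
    counts = dict.fromkeys(ELEMENTS, 0)
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in counts:
            raise ValueError(f"unsupported element: {symbol}")
        counts[symbol] += int(number) if number else 1
    return [counts[s] for s in ELEMENTS]

def scale_to_range(values, lo=0.05, hi=0.95):
    """Min-max scale values linearly onto [lo, hi]."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    return lo + (hi - lo) * (values - vmin) / (vmax - vmin)

# Made-up numbers for illustration only (not measured data)
energy = np.array([20.0, 30.0, 50.0, 100.0, 500.0])   # eV
q = np.array([1.0, 2.5, 4.0, 5.0, 2.0])               # arbitrary units

mask = energy >= 30.0                  # keep E >= 30 eV
log_q = np.log10(q[mask])              # logarithmic transform
q_scaled = scale_to_range(log_q)       # map onto [0.05, 0.95]
features = atom_counts("C2H6O")        # -> [2, 6, 1, 0, 0]
```

Keeping the targets inside \([0.05, 0.95]\) rather than \([0, 1]\) leaves headroom at both ends, which is the stated motivation of avoiding saturation.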

Training Configuration:

Optimizer: Adam with learning rate \(5 \times 10^{-4}\)
Learning rate schedule: constant for the first 100k epochs, then multiplied by 0.5
Total epochs: 200,000
Loss function: Mean Squared Error (MSE)
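
Read literally, the schedule above is a single step decay. A framework-free sketch follows; the function name is hypothetical, and treating the drop as one halving at epoch 100,000 is an interpretation of the description. In PyTorch, `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[100000]` and `gamma=0.5` would express the same rule, though whether the script uses it is not stated here.

```python
def learning_rate(epoch, base_lr=5e-4, drop_epoch=100_000, factor=0.5):
    """Return the step-decayed learning rate for a zero-based epoch index."""
    return base_lr if epoch < drop_epoch else base_lr * factor

lr_start = learning_rate(0)          # 5e-4 during the first 100k epochs
lr_late = learning_rate(150_000)     # 2.5e-4 afterwards
```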

Features:

TensorBoard logging for real-time monitoring
Checkpoint saving every 50k epochs
Resume training capability
Comprehensive error analysis

Run:

python app/operator/deepcsnet/predict_total_ionxsec.py

Expected Output:

Training progress logged to TensorBoard
Checkpoints saved in app/operator/deepcsnet/models/
Results saved in app/operator/deepcsnet/results/
Final relative L2 error on test set
Runtime: ~30-60 minutes for 200k epochs (GPU)

Physical Insights:

Learns complex electron-molecule scattering physics
Captures energy-dependent ionization thresholds
Generalizes to unseen molecular compositions
Typical test error: relative error below 10%

2.5. Best Practices

2.5.1. Data Preparation

Normalization: Scale inputs/outputs to \([0, 1]\) or \([-1, 1]\)
Logarithmic Transform: Use for quantities spanning orders of magnitude
Train/Test Split: Hold out diverse test cases (e.g., different molecules, parameter ranges)

2.5.2. Network Design

Start Simple: Begin with shallow networks (3-4 layers)
Scale Up Gradually: Increase depth/width if needed
Match Dimensions: Ensure the branch and trunk networks have the same output dimension \(p\)

2.5.3. Training Strategy

Learning Rate: Start with \(10^{-4}\) to \(10^{-3}\)
Learning Rate Decay: Apply exponential or step decay
Early Stopping: Monitor validation loss
Checkpointing: Save models periodically for long training runs
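
Early stopping on the validation loss can be sketched with a small helper. The class below is a hypothetical illustration, not an AI4Plasma utility, and the loss values are made up.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        """Record one validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Made-up validation losses: improvement stalls after the third check
losses = [1.0, 0.5, 0.4, 0.41, 0.42, 0.40, 0.45]
stopper = EarlyStopping(patience=3)
stopped_at = None
for i, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = i
        break
```

Saving a checkpoint whenever `stopper.best` improves pairs naturally with this pattern, since the best model is then available once the stop is signaled.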

2.5.4. Validation

Visual Inspection: Plot predictions vs. ground truth
Error Metrics: Compute L2, L∞, and pointwise errors
Extrapolation Tests: Test on parameter values outside training range
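
Different metrics can tell different stories about the same prediction. In the synthetic sketch below, a small localized bump is added to the exact 1D Poisson solution; the helper `error_report` and the perturbation are hypothetical and chosen only to illustrate why the max-norm and pointwise errors are worth checking alongside the L2 error.

```python
import numpy as np

def error_report(pred, true):
    """Return relative L2, relative L-infinity, and pointwise absolute errors."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    pointwise = np.abs(pred - true)
    return {
        "rel_l2": np.linalg.norm(pointwise.ravel()) / np.linalg.norm(true.ravel()),
        "rel_linf": pointwise.max() / np.abs(true).max(),
        "pointwise": pointwise,
    }

# Synthetic example: a localized error near x = 0
x = np.linspace(-1.0, 1.0, 40)
u_true = 5.5 * np.sin(np.pi * x)
u_pred = u_true + 0.05 * np.exp(-50.0 * x**2)
report = error_report(u_pred, u_true)
worst_x = x[np.argmax(report["pointwise"])]   # location of the largest error
```

Here the relative L∞ error exceeds the relative L2 error because the discrepancy is concentrated at a few points, and the pointwise array shows where it sits.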

2.5.5. Debugging

Overfitting: Add dropout, reduce network size, or increase data
Underfitting: Increase network capacity or training epochs
Numerical Issues: Check for NaN/Inf, adjust learning rate or normalization
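
A check for non-finite values can be placed after each preprocessing step. The helper below is a hypothetical NumPy sketch; the example shows how a zero cross section turns into `-inf` under a logarithmic transform.

```python
import numpy as np

def assert_finite(name, array):
    """Raise a ValueError naming the array if it contains NaN or Inf."""
    array = np.asarray(array, dtype=float)
    bad = ~np.isfinite(array)
    if bad.any():
        raise ValueError(f"{name}: {int(bad.sum())} non-finite value(s)")

# log10 of a non-positive value is a typical source of NaN or -Inf
q = np.array([1.0, 0.0, 2.0])
with np.errstate(divide="ignore"):
    log_q = np.log10(q)               # second entry becomes -inf

try:
    assert_finite("log_q", log_q)
    caught = None
except ValueError as exc:
    caught = str(exc)
```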

2.6. Performance Benchmarks

Typical performance on standard hardware:

Example              Training Time  GPU Memory  Test L2 Error
1D Poisson (FNN)     ~15 sec        <500 MB     <10⁻³
2D Poisson (FNN)     ~30 sec        <1 GB       <10⁻²
2D Poisson (CNN)     ~2 min         ~2 GB       <10⁻²
Total Ionization XS  ~45 min        ~3 GB       ~10%

Hardware: Single NVIDIA GPU (e.g., RTX 3090, A100)

2.7. Troubleshooting

2.7.1. Training Not Converging

Reduce learning rate by 10×
Check data normalization
Verify network architecture matches data dimensions

2.7.2. GPU Out of Memory

Reduce batch size
Use smaller networks
Process data in smaller chunks

2.7.3. High Test Error

Increase training data diversity
Try deeper/wider networks
Check for data leakage or poor train/test split

2.8. References

[1] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, “Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators,” Nature Machine Intelligence, vol. 3, no. 3, pp. 218-229, 2021.

[2] Y. Wang and L. Zhong, “DeepCSNet: a deep learning method for predicting electron-impact doubly differential ionization cross sections,” Plasma Sources Science and Technology, vol. 33, no. 10, p. 105012, 2024.

2.9. Further Reading

Operator Learning: Lu et al., “DeepXDE: A deep learning library for solving differential equations” (2021)
Physics-Informed ML: Karniadakis et al., “Physics-informed machine learning” (2021)
Plasma Physics Applications: Wang et al., “Machine learning methods for plasma physics” (2024)