# Operator Learning

This document provides comprehensive examples demonstrating the use of neural operators for learning mappings between infinite-dimensional function spaces. Neural operators are powerful tools for solving parametric PDEs, surrogate modeling, and physics-informed machine learning tasks.

## Overview

**Operator Learning** aims to learn mappings between function spaces rather than finite-dimensional vectors. Given input functions (or fields) and their corresponding output functions, neural operators learn the underlying operator $\mathcal{G}: \mathcal{U} \to \mathcal{V}$ that maps from input function space $\mathcal{U}$ to output function space $\mathcal{V}$.

### Why Operator Learning?

Traditional neural networks learn point-wise mappings. Operator learning provides several advantages:

- **Discretization Invariance**: Train on one mesh, evaluate on another
- **Generalization**: Learn families of solutions parameterized by input functions
- **Efficiency**: Solve parametric PDEs without multiple FEM/FDM simulations
- **Physical Insight**: Capture underlying operator structure

### Available Implementations

AI4Plasma provides two operator learning architectures:

- **DeepONet**: Universal approximator for nonlinear operators
- **DeepCSNet**: Specialized network for electron-impact cross section prediction

## Common Setup

All scripts can be executed from the repository root directory. Most examples share common features:

- **Device Selection**: Automatically choose CPU/GPU via `ai4plasma.config.DEVICE`
- **Reproducibility**: Fix random seeds via `ai4plasma.utils.common.set_seed`
- **Performance Metrics**: Report relative L2 error for validation
- **Timing**: Monitor training and inference time using `ai4plasma.utils.common.Timer`

### Hardware Configuration

```python
from ai4plasma.utils.device import check_gpu
from ai4plasma.config import DEVICE

if check_gpu(print_required=True):
    DEVICE.set_device(0)  # Use first GPU
else:
    DEVICE.set_device(-1)  # Fall back to CPU
```

---

## DeepONet (Deep Operator Network)

### Theory

DeepONet [1] learns operators $\mathcal{G}: u(\cdot) \to \mathcal{G}(u)(\cdot)$ by leveraging the universal approximation theorem for operators. The key insight is to represent the output function as:

$$
\mathcal{G}(u)(y) \approx \sum_{k=1}^{p} b_k(u) \cdot t_k(y) + b_0
$$

where:
- $u$ is the input function (discretized as sensors)
- $y$ is the evaluation location
- $b_k(u)$ are basis functions from the **branch network** (depend on input function)
- $t_k(y)$ are basis functions from the **trunk network** (depend on location)
- $b_0$ is a learnable bias term

### Architecture

```
Input Function u --> [Branch Net] --> b = [b₁, b₂, ..., bₚ]
                                              |
                                              | Inner Product
                                              |
Location y      --> [Trunk Net]  --> t = [t₁, t₂, ..., tₚ]
                                              ↓
                                        G(u)(y) = b·t + b₀
```

**Branch Network** options:
- **FNN**: For 1D functions or feature vectors
- **CNN**: For 2D/3D field inputs (images, spatial distributions)

**Trunk Network**:
- Typically FNN processing spatial/temporal coordinates

### Examples

#### 1. One-Dimensional Poisson Equation

**File**: `app/operator/deeponet/solve_1d_poisson.py`

**Problem**: Learn the solution operator for the 1D Poisson equation family:

$$
-\frac{d^2 u}{dx^2} = f(x) = v \pi^2 \sin(\pi x), \quad x \in [-1, 1]
$$

with analytical solution $u(x) = v \sin(\pi x)$.

**Features**:
- Branch input: scalar parameter $v$ (amplitude)
- Trunk input: spatial coordinate $x$
- Simple FNN networks for both branch and trunk
- Ideal for quick testing and verification

**Network Configuration**:
```python
branch_layers = [1, 10, 10, 10, 10]  # Input: v (1D)
trunk_layers = [1, 10, 10, 10, 10]   # Input: x (1D)
```

**Training Data**:
- 5 training parameters: $v \in \{2, 4, 6, 8, 10\}$
- 40 spatial evaluation points
- Test on unseen $v = 5.5$

**Run**:
```bash
python app/operator/deeponet/solve_1d_poisson.py
```

**Expected Output**:
- Training progress with loss values
- Final L2 relative error: typically < 1e-3
- Total runtime: ~10-30 seconds (depending on hardware)

---

#### 2. One-Dimensional Poisson with Batch Training

**File**: `app/operator/deeponet/solve_1d_poisson_batch.py`

**Purpose**: Demonstrate batch-wise training with DataLoader for larger datasets.

**Key Differences**:
- Uses `batch_size=5` with PyTorch DataLoader
- Training data: 20 parameters uniformly spaced in $[1, 20]$
- Demonstrates scalability to larger datasets

**Features**:
- Efficient memory management for large datasets
- Shuffling and batching capabilities
- Same PDE as `solve_1d_poisson.py` but with more training data

**Run**:
```bash
python app/operator/deeponet/solve_1d_poisson_batch.py
```

---

#### 3. Two-Dimensional Poisson Equation

**File**: `app/operator/deeponet/solve_2d_poisson.py`

**Problem**: Learn the solution operator for 2D Poisson equation:

$$
-\Delta u(x,y) = f(x,y) = 2v\pi^2 \sin(\pi x)\sin(\pi y), \quad (x,y) \in [0,1]^2
$$

with analytical solution $u(x,y) = v \sin(\pi x)\sin(\pi y)$.

**Features**:
- Branch input: scalar parameter $v$
- Trunk input: 2D coordinates $(x, y)$
- Tests generalization to higher-dimensional spaces
- Evaluation on Cartesian grid ($32 \times 32$ points)

**Network Configuration**:
```python
branch_layers = [1, 32, 32, 32]   # Input: v (1D parameter)
trunk_layers = [2, 32, 32, 32]    # Input: (x,y) coordinates
```

**Training Data**:
- 10 training parameters uniformly spaced in $[1, 10]$
- $32 \times 32 = 1024$ spatial points per parameter
- Test on $v = 5.5$

**Run**:
```bash
python app/operator/deeponet/solve_2d_poisson.py
```

**Expected Output**:
- L2 relative error: typically < 1e-2
- Demonstrates operator learning in 2D spatial domains

---

#### 4. Two-Dimensional Poisson with CNN Branch

**File**: `app/operator/deeponet/solve_2d_poisson_cnn.py`

**Purpose**: Use CNN-based branch network for processing 2D field inputs (images).

**Key Innovation**: Instead of a scalar parameter, the branch network processes entire 2D fields:

$$
f(x,y) = 2v\pi^2 \sin(\pi x)\sin(\pi y)
$$

as a $16 \times 16$ grid (image).

**Architecture**:
```python
# CNN Branch Network
conv_layers = [1, 8, 16, 32]      # Channels: 1 → 8 → 16 → 32
fc_layers = [32, 32]               # Flattened features
# Input: (batch, 1, 16, 16)
# Output: (batch, 32)

# FNN Trunk Network
trunk_layers = [2, 32, 32, 32]    # Input: (x,y) coordinates
```

**Features**:
- Automatic CNN branch detection via `network.branch_is_cnn`
- Batch normalization and max pooling
- Kaiming initialization for ReLU activation
- Supports higher-resolution field inputs

**Training Data**:
- 20 RHS fields on $16 \times 16$ grid
- Evaluation on finer $32 \times 32$ grid
- Demonstrates resolution independence

**Benefits of CNN Branch**:
- Handles complex spatial input patterns
- Translation equivariance for physical fields
- Efficient for high-dimensional input functions

**Run**:
```bash
python app/operator/deeponet/solve_2d_poisson_cnn.py
```

**Expected Output**:
- Confirmation of CNN branch detection
- Lower error for complex input patterns
- Runtime: ~1-2 minutes

---

#### 5. Quick Test Driver

**File**: `app/operator/deeponet/solve_1d_poisson_test.py`

**Purpose**: Minimal example for rapid testing and debugging.

**Use Cases**:
- Quick verification of installation
- Testing code modifications
- Minimal computational requirements

---

### DeepONet Usage Guidelines

**When to use DeepONet**:
- Learning solution operators for parametric PDEs
- Surrogate modeling with varying input functions
- Multi-query scenarios (many evaluations with different inputs)

**Branch Network Selection**:
- **FNN**: Scalar/vector parameters, 1D functions
- **CNN**: 2D/3D fields, images, spatial distributions

**Training Tips**:
1. Start with small networks and fewer epochs for prototyping
2. Use batch training for datasets with >100 samples
3. Monitor L2 error on held-out test data
4. Increase network depth/width if underfitting
5. Add regularization if overfitting

**Performance Considerations**:
- Training time scales with: number of trunk points × batch size
- Inference is fast: single forward pass per query
- GPU acceleration recommended for CNN branches

---

## DeepCSNet (Deep Coefficient-Subnet Network)

### Theory

DeepCSNet [2] is a specialized operator network for electron-impact cross section prediction in plasma physics. It employs a modular "coefficient-subnet" architecture that processes different input feature types separately.

### Architecture

DeepCSNet consists of up to three optional sub-networks:

```
Molecular Features  --> [Molecule Net] --> m = [m₁, ..., mₚ]
                                               |
Energy Features     --> [Energy Net]   --> e = [e₁, ..., eₚ]
                                               |
Angles/Coordinates  --> [Trunk Net]    --> t = [t₁, ..., tₚ]
                                               ↓
                                    σ = Combine(m, e, t) + bias
```

**Operation Modes**:

1. **SMC (Single-Molecule Configuration)**:
   - Energy Net + Trunk Net
   - For single molecular species

2. **MMC (Multi-Molecule Configuration)**:
   - Molecule Net + Trunk Net (+ optional Energy Net)
   - For multiple molecular species

### Example: Total Ionization Cross Section Prediction

**File**: `app/operator/deepcsnet/predict_total_ionxsec.py`

**Physical Problem**: Predict total electron-impact ionization cross sections $Q(\text{molecule}, E)$ as a function of molecular composition and incident electron energy.

**Application**: Crucial for plasma modeling, mass spectrometry, and radiation chemistry simulations.

**Data**:
- 88 organic molecules (C, H, O, N, F compounds)
- Cross section measurements at various energies
- Energy range: $E \geq 30$ eV (filtered for reliability)

**Molecular Descriptors** (5 features):
- Number of Carbon atoms (C)
- Number of Hydrogen atoms (H)
- Number of Oxygen atoms (O)
- Number of Nitrogen atoms (N)
- Number of Fluorine atoms (F)

**Network Configuration**:
```python
molecule_layers = [5, 80, 80, 80]   # Molecule Net: C,H,O,N,F → features
trunk_layers = [1, 80, 80, 80]      # Trunk Net: Energy → features
```

**Data Processing Pipeline**:
1. Load CSV files (one per molecule)
2. Parse molecular formulas → extract atom counts
3. Filter energy range ($E \geq 30$ eV)
4. Logarithmic transformation: $\log_{10}(Q)$
5. Normalize to $[0.05, 0.95]$ to prevent saturation
6. Split: 70 molecules (training) + 18 molecules (testing)

**Training Configuration**:
- Optimizer: Adam with learning rate $5 \times 10^{-4}$
- Learning rate schedule: constant for 100k epochs, then $\times 0.5$
- Total epochs: 200,000
- Loss function: Mean Squared Error (MSE)

**Features**:
- TensorBoard logging for real-time monitoring
- Checkpoint saving every 50k epochs
- Resume training capability
- Comprehensive error analysis

**Run**:
```bash
python app/operator/deepcsnet/predict_total_ionxsec.py
```

**Expected Output**:
- Training progress logged to TensorBoard
- Checkpoints saved in `app/operator/deepcsnet/models/`
- Results saved in `app/operator/deepcsnet/results/`
- Final relative L2 error on test set
- Runtime: ~30-60 minutes for 200k epochs (GPU)

**Physical Insights**:
- Learns complex electron-molecule scattering physics
- Captures energy-dependent ionization thresholds
- Generalizes to unseen molecular compositions
- Typical test accuracy: relative error < 10%

---

## Best Practices

### 1. Data Preparation
- **Normalization**: Scale inputs/outputs to $[0, 1]$ or $[-1, 1]$
- **Logarithmic Transform**: Use for quantities spanning orders of magnitude
- **Train/Test Split**: Hold out diverse test cases (e.g., different molecules, parameter ranges)

### 2. Network Design
- **Start Simple**: Begin with shallow networks (3-4 layers)
- **Scale Up Gradually**: Increase depth/width if needed
- **Match Dimensions**: Ensure branch and trunk output same dimension $p$

### 3. Training Strategy
- **Learning Rate**: Start with $10^{-4}$ to $10^{-3}$
- **Learning Rate Decay**: Apply exponential or step decay
- **Early Stopping**: Monitor validation loss
- **Checkpointing**: Save models periodically for long training runs

### 4. Validation
- **Visual Inspection**: Plot predictions vs. ground truth
- **Error Metrics**: Compute L2, L∞, pointwise errors
- **Extrapolation Tests**: Test on parameter values outside training range

### 5. Debugging
- **Overfitting**: Add dropout, reduce network size, or increase data
- **Underfitting**: Increase network capacity or training epochs
- **Numerical Issues**: Check for NaN/Inf, adjust learning rate or normalization

---

## Performance Benchmarks

Typical performance on standard hardware:

| Example | Training Time | GPU Memory | Test L2 Error |
|---------|--------------|------------|---------------|
| 1D Poisson (FNN) | ~15 sec | <500 MB | <10⁻³ |
| 2D Poisson (FNN) | ~30 sec | <1 GB | <10⁻² |
| 2D Poisson (CNN) | ~2 min | ~2 GB | <10⁻² |
| Total Ionization XS | ~45 min | ~3 GB | ~10% |

*Hardware: Single NVIDIA GPU (e.g., RTX 3090, A100)*

---

## Troubleshooting

### Training Not Converging
- Reduce learning rate by 10×
- Check data normalization
- Verify network architecture matches data dimensions

### GPU Out of Memory
- Reduce batch size
- Use smaller networks
- Process data in smaller chunks

### High Test Error
- Increase training data diversity
- Try deeper/wider networks
- Check for data leakage or poor train/test split

---

## References

[1] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, "Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators," *Nature Machine Intelligence*, vol. 3, no. 3, pp. 218-229, 2021.

[2] Y. Wang and L. Zhong, "DeepCSNet: a deep learning method for predicting electron-impact doubly differential ionization cross sections," *Plasma Sources Science and Technology*, vol. 33, no. 10, p. 105012, 2024.

---

## Further Reading

- **Operator Learning**: Lu et al., "DeepXDE: A deep learning library for solving differential equations" (2021)
- **Physics-Informed ML**: Karniadakis et al., "Physics-informed machine learning" (2021)
- **Plasma Physics Applications**: Wang et al., "Machine learning methods for plasma physics" (2024)