
Production Run Cycle

This page explains how to obtain precise simulation data with ALF. The focus is on choosing the simulation control parameters — the knobs that determine whether your data is trustworthy and your compute time well spent. The physics parameters (which model, which coupling, which lattice size) depend on the specific study and are covered elsewhere; here we concentrate on the machinery.

The Iterative Cycle

Production data is rarely obtained in a single shot. The typical workflow is:

  1. Set up and run — Choose initial values for the control parameters, launch the simulation

  2. Analyze — Run the analysis tools, inspect error bars and the QMC time series

  3. Evaluate — Is the data precise enough? Are there signs of thermalization issues or autocorrelation?

  4. Extend or adjust — Add more bins, tune parameters, rerun

This cycle repeats until the data quality is sufficient for the question at hand. The rest of this page explains how to choose the parameters that govern this cycle.

Simulation Control Parameters

Dtau — Imaginary-Time Step

Dtau controls the Trotter discretization. The number of time slices is $L_\text{trot} = \text{nint}(\beta / \Delta\tau)$. A smaller Dtau reduces the Trotter error but increases $L_\text{trot}$, and with it the cost, proportionally.
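The slice count can be checked before launching a run. The sketch below evaluates the nint formula in Python; the helper names `nint` and `trotter_slices` are hypothetical and not part of ALF or pyALF. It also reports how far $L_\text{trot}\,\Delta\tau$ is from β, which is non-zero whenever β is not an integer multiple of Δτ and the rounding kicks in.

```python
import math


def nint(x: float) -> int:
    """Nearest integer with halves rounded away from zero, like Fortran NINT.

    Python's built-in round() uses banker's rounding (round(2.5) == 2),
    so it is not a drop-in replacement here.
    """
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def trotter_slices(beta: float, dtau: float) -> tuple[int, float]:
    """Return (L_trot, |beta - L_trot * dtau|) for a finite-temperature run."""
    l_trot = nint(beta / dtau)
    if l_trot < 1:
        raise ValueError("beta / dtau must round to at least one time slice")
    # A non-zero mismatch means beta is not an integer multiple of dtau.
    mismatch = abs(beta - l_trot * dtau)
    return l_trot, mismatch


print(trotter_slices(10.0, 0.1))   # dtau divides beta
print(trotter_slices(1.0, 0.3))    # 1.0 / 0.3 = 3.33... is rounded to 3 slices
```

Choosing a Dtau that divides β evenly sidesteps the rounding question altogether.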

Nwrap — Stabilization Interval

The Green function is propagated (“wrapped”) from one time slice to the next via rank-1 updates. Every Nwrap slices, it is recomputed from scratch using a numerically stable UDV decomposition. See [[Stabilization Parameters]].

Choose Nwrap as large as possible while keeping the Green function precise: the mean precision reported in the info file (Precision Green Mean) should stay below $10^{-7}$.
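A simple way to apply this rule is to run a few short tests at different Nwrap values and keep the largest one that passes. The sketch below assumes you copy the "Precision Green Mean" number from each test's info file into a dictionary by hand; the function name is hypothetical and the numbers in the example are placeholders, not measurements.

```python
def largest_safe_nwrap(precision_by_nwrap: dict[int, float], tol: float = 1e-7) -> int:
    """Return the largest tested Nwrap whose mean Green-function precision is below tol."""
    safe = [nwrap for nwrap, precision in precision_by_nwrap.items() if precision < tol]
    if not safe:
        raise ValueError("no tested Nwrap meets the precision target; try a smaller Nwrap or Dtau")
    return max(safe)


# Placeholder values, NOT measurements: fill in the "Precision Green Mean"
# reported in the info file of each short test run.
tested = {5: 1e-12, 10: 1e-10, 20: 1e-6}
print(largest_safe_nwrap(tested))
```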

NSweep — Sweeps per Bin

Each bin consists of NSweep full sweeps through all time slices. Observables are averaged within a bin and written to disk at the end.

The right value of NSweep depends on how long a single sweep takes: for production runs, aim for roughly 1 minute to 1 hour of wall-clock time per bin.
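Turning a measured sweep time into an NSweep value is simple arithmetic. The sketch below assumes the time per sweep comes from a short test run (for example, its wall-clock time divided by NSweep × NBin, which also counts setup and I/O and is therefore only approximate); the function name and the 10-minute default are illustrative choices inside the 1 minute to 1 hour window.

```python
def suggest_nsweep(seconds_per_sweep: float, target_bin_seconds: float = 600.0) -> int:
    """NSweep such that one bin takes roughly target_bin_seconds of wall-clock time."""
    if not 60.0 <= target_bin_seconds <= 3600.0:
        raise ValueError("target bin time should lie between 1 minute and 1 hour")
    if seconds_per_sweep <= 0:
        raise ValueError("seconds_per_sweep must be positive")
    # At least one sweep per bin, even if a single sweep exceeds the target.
    return max(1, round(target_bin_seconds / seconds_per_sweep))


print(suggest_nsweep(3.0))         # aim for ~10-minute bins
print(suggest_nsweep(3.0, 3600))   # aim for ~1-hour bins
```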

There is no universal “best” value — it depends on the cost per sweep and the job scheduling constraints.

NBin — Number of Bins

NBin sets how many bins the simulation produces in one invocation. The total number of effective bins used in the analysis is (total_bins - n_skip) / N_rebin, where total_bins may come from multiple job submissions.

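In practice, start with a moderate NBin (around 100) and extend the run by resubmitting until the error bars are small enough: the target precision, not a fixed bin count, is the criterion.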
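The effective-bin formula and the question "how many more bins do I need?" can be sketched as follows. Two assumptions are built in: an incomplete last group of N_rebin bins is dropped, and the error bar scales as $1/\sqrt{N_\text{eff}}$, which only holds once the rebinned bins are uncorrelated. The function names are hypothetical.

```python
import math


def effective_bins(total_bins: int, n_skip: int, n_rebin: int) -> int:
    """(total_bins - n_skip) / N_rebin, dropping an incomplete last group (assumption)."""
    return (total_bins - n_skip) // n_rebin


def extra_bins_needed(total_bins: int, n_skip: int, n_rebin: int,
                      current_error: float, target_error: float) -> int:
    """Raw bins to append so the error bar shrinks to target_error.

    Assumes error ~ 1 / sqrt(effective bins), valid for uncorrelated rebinned bins.
    """
    n_eff = effective_bins(total_bins, n_skip, n_rebin)
    needed_eff = math.ceil(n_eff * (current_error / target_error) ** 2)
    return max(0, needed_eff - n_eff) * n_rebin


# Two submissions of NBin = 100 each, analyzed with n_skip = 5 and N_rebin = 2:
print(effective_bins(200, 5, 2))
# Halving the error bar requires four times as many effective bins:
print(extra_bins_needed(205, 5, 2, current_error=0.5, target_error=0.25))
```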

CPU_MAX — Wall-Clock Safety

CPU_MAX (given in hours in the parameters file, e.g., CPU_MAX = 24.0 for a one-day limit) ensures graceful termination. After each bin, ALF estimates whether there is enough time for another bin. If not, it stops cleanly, writes the field configuration, and exits.

This is primarily a safety measure to prevent data corruption: if the queuing system kills the job during I/O, data files can be left in an inconsistent state. Always set CPU_MAX to your job’s wall-clock limit or slightly below.

Note: CPU_MAX overrides NBin — if wall-clock time runs out, the simulation stops even if fewer than NBin bins have been completed.

&VAR_QMC
NSweep  = 40
NBin    = 100
Nwrap   = 10
CPU_MAX = 24.0   ! hours
/
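The CPU_MAX stopping rule ("after each bin, estimate whether another bin fits") can be simulated to see how many bins a job will actually deliver. ALF's exact estimate, including any safety margin, is not specified here; the sketch assumes the next bin takes as long as the average of the bins completed so far, and that the first bin always starts. The function name is hypothetical.

```python
def bins_before_cpu_max(cpu_max_hours: float, bin_seconds: list[float], nbin: int) -> int:
    """Count bins completed under a 'start the next bin only if it should fit' rule.

    bin_seconds[i] is the wall-clock time the i-th bin would take.
    """
    budget = cpu_max_hours * 3600.0
    elapsed = 0.0
    done = 0
    for duration in bin_seconds[:nbin]:
        if done > 0:
            estimate = elapsed / done         # mean duration of the completed bins
            if elapsed + estimate > budget:   # next bin would likely overrun: stop cleanly
                break
        elapsed += duration
        done += 1
    return done


print(bins_before_cpu_max(1.0, [1000.0] * 100, nbin=100))  # CPU_MAX limits the run
print(bins_before_cpu_max(24.0, [60.0] * 100, nbin=10))    # NBin limits the run
```

Note that no rule of this kind can protect a job whose very first bin is longer than the queue limit, which is another reason to keep bins well below the wall-clock limit.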

Ltau — Time-Displaced Measurements

Set Ltau = 1 to enable time-displaced Green function measurements ($G(\tau)$, spin correlations in $\tau$, etc.). This adds computational cost, so only enable it when you need dynamical correlations, e.g., for spectral functions via [[Analytic Continuation]]. For studies that only require equal-time observables, leave it at the default Ltau = 0.

Global_moves — Ergodicity

Some models benefit from global (space-time) updates in addition to the local sequential or HMC updates. Set Global_moves = .true. and N_Global to the number of global moves per sweep. Whether this is needed depends on the model: for the standard Hubbard model it is not required, but models with discrete symmetries or flat directions in the action may need it. Check the acceptance rates reported in the info file.

Projector and Theta — Ground-State Algorithm

For ground-state (PQMC) calculations, set Projector = .true. and choose Theta large enough that the projection $e^{-\Theta H}$ filters out the excited-state components of the trial wave function, leaving the ground state. Typical values are Theta = 10–40 (in units of $1/t$). Larger Theta is safer but more expensive, since the number of time slices grows with Theta. Convergence in Theta should be checked explicitly.
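Both the cost and the convergence check can be sketched in a few lines. The slice count assumes a projection of length Theta on each side of a measurement window of length β, i.e. $\text{nint}(\beta/\Delta\tau) + 2\,\text{nint}(\Theta/\Delta\tau)$; treat that layout as an assumption. The convergence test simply asks whether energies from runs at two Theta values agree within their combined error bars. All names are hypothetical.

```python
import math


def nint(x: float) -> int:
    """Nearest integer, halves away from zero (Fortran NINT convention)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def pqmc_slices(beta: float, theta: float, dtau: float) -> int:
    """Total time slices, assuming a projection length Theta on each side of beta."""
    return nint(beta / dtau) + 2 * nint(theta / dtau)


def theta_converged(e_small: float, err_small: float,
                    e_large: float, err_large: float, n_sigma: float = 2.0) -> bool:
    """True if energies at two Theta values agree within n_sigma combined error bars."""
    return abs(e_large - e_small) <= n_sigma * math.hypot(err_small, err_large)


for theta in (10.0, 20.0, 40.0):
    print(theta, pqmc_slices(beta=1.0, theta=theta, dtau=0.1))
```

Agreement between two Theta values is necessary but not sufficient; a third, larger Theta gives more confidence that the energy has stopped drifting.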

HMC Parameters

For continuous auxiliary fields, the Hybrid Monte Carlo (HMC) update scheme is available via HMC = .true.. The key control parameters are Delta_t_Langevin_HMC, Leapfrog_steps, and N_HMC_sweeps. See [[HMC Parameters]] for details.

Automatic tuning of these HMC hyperparameters is under active development.
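To make the roles of the step size and the number of leapfrog steps concrete, here is a generic HMC sketch on a toy Gaussian action. It is not ALF's implementation, which evolves the auxiliary field under forces derived from the fermion determinant. Reading `dt` as Delta_t_Langevin_HMC and `n_steps` as Leapfrog_steps is an assumption made for illustration.

```python
import math
import random


def leapfrog(phi, p, grad_s, dt, n_steps):
    """Integrate Hamilton's equations for H = S(phi) + p^2/2 with the leapfrog scheme."""
    p = [pi - 0.5 * dt * g for pi, g in zip(p, grad_s(phi))]      # initial half kick
    for step in range(n_steps):
        phi = [x + dt * pi for x, pi in zip(phi, p)]              # full drift
        if step != n_steps - 1:
            p = [pi - dt * g for pi, g in zip(p, grad_s(phi))]    # full kick
    p = [pi - 0.5 * dt * g for pi, g in zip(p, grad_s(phi))]      # final half kick
    return phi, p


def hmc_step(phi, action, grad_s, dt, n_steps, rng):
    """One HMC trajectory followed by a Metropolis accept/reject test."""
    p0 = [rng.gauss(0.0, 1.0) for _ in phi]                       # fresh Gaussian momenta
    h_old = action(phi) + 0.5 * sum(x * x for x in p0)
    phi_new, p_new = leapfrog(phi, p0, grad_s, dt, n_steps)
    h_new = action(phi_new) + 0.5 * sum(x * x for x in p_new)
    # Accepting with probability min(1, exp(-dH)) removes the integration error.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return phi_new, True
    return phi, False


def run_toy_hmc(n_traj=5000, dim=4, dt=0.1, n_steps=10, seed=1):
    """Sample the toy action S = sum(phi^2)/2, for which <phi^2> = 1 exactly."""
    rng = random.Random(seed)

    def action(phi):
        return 0.5 * sum(x * x for x in phi)

    def grad_s(phi):
        return list(phi)                                          # dS/dphi = phi

    phi, accepted, sum_sq = [0.0] * dim, 0, 0.0
    for _ in range(n_traj):
        phi, ok = hmc_step(phi, action, grad_s, dt, n_steps, rng)
        accepted += ok
        sum_sq += sum(x * x for x in phi) / dim
    return accepted / n_traj, sum_sq / n_traj


for dt in (0.05, 0.2, 0.8):
    acc, phi_sq = run_toy_hmc(dt=dt, n_steps=round(1.0 / dt))
    print(f"dt={dt:4.2f}  acceptance={acc:.3f}  <phi^2>={phi_sq:.3f}")
```

Keeping the trajectory length `dt * n_steps` fixed while growing `dt` lowers the acceptance; very small steps give near-perfect acceptance at a higher cost per trajectory.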

Analysis: Thermalization and Autocorrelation

ALF does not have a separate warmup phase. All bins are identical from the simulation’s perspective. Thermalization is handled entirely at analysis time via the VAR_errors namelist:

&VAR_errors
n_skip  = 5     ! discard first 5 bins (warmup)
N_rebin = 1     ! rebinning factor
/

n_skip — Discarding Warmup Bins

The first n_skip bins are excluded from the jackknife analysis. For a cold start (no confin), the system needs time to reach equilibrium.

How to choose n_skip: Look at the QMC time series of a scalar observable such as the energy (Ener_scal). The initial transient — where the observable drifts from its starting value toward the equilibrium mean — tells you how many bins to discard. pyALF provides interactive widgets for this; it is highly recommended to visually inspect the time series rather than guessing.
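As a cross-check for the visual inspection (not a replacement), a crude heuristic takes the second half of the series as the equilibrium reference and discards every bin up to the last one that lies more than a few standard deviations away from it. The function name and the 3σ cutoff are illustrative assumptions; a single late outlier would inflate the result, which is one more reason to look at the plot.

```python
import statistics


def suggest_n_skip(series: list[float], n_sigma: float = 3.0) -> int:
    """Crude warmup estimate: discard bins up to the last one far from the late-time mean."""
    tail = series[len(series) // 2:]              # second half taken as equilibrium reference
    reference = statistics.fmean(tail)
    tolerance = n_sigma * statistics.pstdev(tail)
    n_skip = 0
    for i, value in enumerate(series):
        if abs(value - reference) > tolerance:
            n_skip = i + 1                        # discard this bin and all before it
    return n_skip


# Synthetic energy series: a short drift followed by fluctuations around 1.0.
ener = [5.0, 3.0, 1.5] + [1.0, 1.1, 0.9] * 10
print(suggest_n_skip(ener))
```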

N_rebin — Rebinning for Autocorrelation

N_rebin groups consecutive bins: N_rebin = 2 averages pairs of bins into one before the jackknife. This reduces the effect of autocorrelation between bins.

How to choose N_rebin: Increase N_rebin step by step (1, 2, 4, ...) and watch the error bar. If it stabilizes (plateaus), the rebinned bins are effectively uncorrelated at that level. If the error keeps growing, the bins are still correlated, and you need either a larger N_rebin or more NSweep per bin to decorrelate them.

pyALF provides widgets to visualize both the time series and the error-bar-vs-rebinning curve, which makes this straightforward.

The number of effective bins after analysis is (total_bins - n_skip) / N_rebin. You need at least 2 effective bins for the jackknife to produce an error estimate.
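The skip, rebin, jackknife pipeline can be sketched for a plain mean as follows. This is an illustration under simplifying assumptions: it ignores the sign weighting that ALF's own analysis applies to observables, and it drops an incomplete last group of bins. The synthetic AR(1) series stands in for correlated bins; its error bar should grow with N_rebin and level off once the groups are much longer than the correlation length.

```python
import math
import random
import statistics


def rebin(bins: list[float], n_rebin: int) -> list[float]:
    """Average consecutive groups of n_rebin bins (an incomplete last group is dropped)."""
    n_groups = len(bins) // n_rebin
    return [statistics.fmean(bins[i * n_rebin:(i + 1) * n_rebin]) for i in range(n_groups)]


def jackknife_error(bins: list[float]) -> float:
    """Jackknife error of the mean of bins."""
    n = len(bins)
    if n < 2:
        raise ValueError("need at least 2 effective bins for a jackknife error")
    total = sum(bins)
    leave_one_out = [(total - b) / (n - 1) for b in bins]
    center = statistics.fmean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((x - center) ** 2 for x in leave_one_out))


def analyze(bins: list[float], n_skip: int, n_rebin: int) -> tuple[float, float]:
    """Discard n_skip warmup bins, rebin, then return (mean, jackknife error)."""
    kept = rebin(bins[n_skip:], n_rebin)
    err = jackknife_error(kept)        # raises if fewer than 2 effective bins remain
    return statistics.fmean(kept), err


# Synthetic correlated bins: an AR(1) process with correlation 0.9 between neighbours.
rng = random.Random(0)
x, series = 0.0, []
for _ in range(4096):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

for n_rebin in (1, 2, 4, 8, 16, 32, 64):
    mean, err = analyze(series, n_skip=0, n_rebin=n_rebin)
    print(f"N_rebin={n_rebin:3d}  effective bins={len(series) // n_rebin:5d}  error={err:.4f}")
```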

Restarts and Extending Runs

ALF writes the field configuration to confout at the end of every bin. To continue a run:

  1. Rename the configuration: bash out_to_in.sh (renames confout_* → confin_*)

  2. Resubmit the job — ALF detects confin, starts from the saved configuration, and appends new bins to the existing data files

  3. The first bins of the extension are already thermalized (warm start), so n_skip only needs to account for the initial cold-start transient, not the restart
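Step 1 is handled by the out_to_in.sh script. The Python function below is only a hypothetical stand-in that shows what the rename does, with an added guard for the RUNNING lock file discussed further down; it assumes the configuration files match the pattern `confout_*` and keep their suffix when renamed.

```python
from pathlib import Path


def out_to_in(run_dir: str = ".") -> list[str]:
    """Rename confout_* to confin_* in run_dir (illustrative stand-in for out_to_in.sh)."""
    run_dir = Path(run_dir)
    if (run_dir / "RUNNING").exists():
        raise RuntimeError("RUNNING lock file present: a run may be active, or a crashed run left it behind")
    renamed = []
    for src in sorted(run_dir.glob("confout_*")):
        dst = src.with_name("confin_" + src.name[len("confout_"):])
        src.replace(dst)               # overwrites an older confin_* file, like mv
        renamed.append(dst.name)
    return renamed
```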

What You Can and Cannot Change Between Runs

When extending a run by appending bins, the new bins are concatenated with the old ones in the data files and analyzed together. This means some parameters must stay fixed, while others can be changed freely:

| Parameter | Can change? | Why |
|---|---|---|
| NBin | Yes | Each run simply adds more bins. |
| Nwrap | Yes (with care) | Safe as long as you are not at the stability boundary: changing Nwrap should not change the observable values, only the numerical precision of the Green function. |
| CPU_MAX | Yes | Only affects when the run stops. |
| n_skip, N_rebin | Yes | These are analysis-time parameters, not simulation parameters. |
| NSweep | No | Bins with different NSweep represent averages over different numbers of sweeps. Mixing them in the analysis produces incorrect error estimates. If you need to change NSweep, delete the existing data and start fresh. |
| Number of MPI ranks | No | The number of MPI workers must remain the same across runs for the same parameter set. Changing it would produce incompatible confout/confin files and data layouts. |

Note: It is perfectly fine to use different NSweep values for different parameter sets (e.g., larger NSweep for bigger lattices). The constraint is that all runs contributing bins to the same data file must use the same NSweep and MPI configuration.
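Since nothing stops you from resubmitting with a changed NSweep or rank count, a small bookkeeping check can catch the mistake before bins are appended. The sketch assumes you record the settings of each submission yourself, for example in a small JSON file next to the data; the key names are hypothetical and are not ALF output fields.

```python
# Hypothetical bookkeeping keys, recorded by you for every submission.
MUST_MATCH = ("NSweep", "n_mpi_ranks")


def incompatible_changes(previous: dict, current: dict) -> list[str]:
    """Names of parameters that differ although they must stay fixed when appending bins."""
    return [key for key in MUST_MATCH if previous.get(key) != current.get(key)]


first_run = {"NSweep": 40, "n_mpi_ranks": 8, "NBin": 100, "Nwrap": 10}
extension = {"NSweep": 40, "n_mpi_ranks": 8, "NBin": 300, "Nwrap": 5}
print(incompatible_changes(first_run, extension))   # NBin and Nwrap may change freely
```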

Note: The RUNNING lock file prevents accidental concurrent runs in the same directory. If a previous run crashed, you must delete RUNNING manually before restarting.

On HPC clusters, a common pattern is to include bash out_to_in.sh in the job script before the srun command, so resubmission automatically continues from the previous state. See [[Running on Clusters]] for examples.

Checking Your Results

After analysis, inspect these key diagnostics:

| What to check | Where | What it tells you |
|---|---|---|
| Green function precision | info: Precision Green Mean, Max | Wrapping/stabilization quality. Mean should be $< 10^{-7}$. |
| Average sign | Sign_scal or HDF5 Sign | An average sign $\ll 1$ means exponentially noisy data. |
| Acceptance rates | info: Acceptance, Acceptance_HMC | Too low → poor sampling. Too high → steps too small. |
| QMC time series | Ener_scal bin-by-bin (use pyALF widgets) | Thermalization transient, stationarity, obvious outliers. |
| Error bar vs. rebinning | Increase N_rebin and re-analyze | Plateau → bins are uncorrelated. Growing → more NSweep needed. |
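The scalar items of this checklist can be turned into automatic warnings. Only the $10^{-7}$ precision target comes from this page; the sign and acceptance cutoffs below are illustrative assumptions, as is the function name, and the time-series and rebinning checks still need a human look.

```python
def diagnose(precision_mean: float, avg_sign: float, acceptance: float,
             sign_warn: float = 0.1, acc_low: float = 0.1, acc_high: float = 0.95) -> list[str]:
    """Return warnings for the scalar checks; sign and acceptance cutoffs are illustrative."""
    warnings = []
    if precision_mean >= 1e-7:
        warnings.append("Green function precision too poor: reduce Nwrap (or Dtau)")
    if abs(avg_sign) < sign_warn:
        warnings.append("small average sign: expect very noisy data")
    if acceptance < acc_low:
        warnings.append("low acceptance: poor sampling")
    elif acceptance > acc_high:
        warnings.append("very high acceptance: update steps may be too small")
    return warnings


print(diagnose(precision_mean=1e-10, avg_sign=0.98, acceptance=0.6))
print(diagnose(precision_mean=1e-5, avg_sign=0.02, acceptance=0.99))
```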

Quick Reference

| Parameter | Namelist | Typical starting value | Key consideration |
|---|---|---|---|
| Dtau | VAR_Model_Generic | 0.1 | Reduce for large energy scales or poor stability |
| Nwrap | VAR_QMC | 10 | As large as possible with Green precision $< 10^{-7}$ |
| NSweep | VAR_QMC | 20–200 | Target 1 min – 1 hour per bin for production |
| NBin | VAR_QMC | 100 | Extend by rerunning; precision is the criterion |
| CPU_MAX | VAR_QMC | Job wall-clock (hours) | Prevents data corruption; overrides NBin |
| Ltau | VAR_QMC | 0 | Set to 1 only when time-displaced data is needed |
| n_skip | VAR_errors | 5–20 (cold start) | Inspect time series to determine |
| N_rebin | VAR_errors | 1 | Increase until error bar plateaus |

See Also