.. _Container:

**********************************
Containerized Land DA Workflow
**********************************

These instructions will help users build and run a basic case for the Unified Forecast System (:term:`UFS`) Land Data Assimilation (DA) System using a `Singularity/Apptainer `_ container. The Land DA :term:`container` packages the Land DA System together with its dependencies (e.g., :term:`spack-stack`, :term:`JEDI`) and provides a uniform environment in which to build and run the Land DA System.

Normally, the details of building and running Earth system models vary by computing platform because there are many possible combinations of operating systems, compilers, :term:`MPIs `, and package versions. Installing Land DA via a Singularity/Apptainer container reduces this variability and allows for a smoother experience building and running Land DA. This approach is recommended for users not running Land DA on a supported :ref:`Level 1 ` system (e.g., Hera, Orion).

This chapter provides instructions for building and running the UFS Land DA System CADRE sample cases using a container.

.. attention::

   This chapter of the User's Guide should **only** be used for container builds. For non-container builds, see :numref:`Chapter %s `, which describes the steps for building and running Land DA on a :ref:`Level 1 System ` **without** a container.
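
The commands in this chapter call ``singularity``. Apptainer installations typically also provide a ``singularity`` command for compatibility, but users on other platforms may want to confirm which runtime is on their ``PATH`` before starting. The function below is a minimal sketch of such a check for a bash shell; the name ``find_container_runtime`` is hypothetical and is not part of the Land DA System.

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): report which container
   # runtime is available on the PATH, preferring Apptainer over Singularity.
   find_container_runtime() {
     local runtime
     for runtime in apptainer singularity; do
       # "command -v" is a shell builtin that succeeds if the command is found.
       if command -v "$runtime" >/dev/null 2>&1; then
         echo "$runtime"
         return 0
       fi
     done
     echo "Neither apptainer nor singularity was found on PATH" >&2
     return 1
   }

Running ``find_container_runtime`` prints ``apptainer`` or ``singularity``, or returns a nonzero status if neither runtime is installed.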

.. _Prereqs:

Prerequisites
*****************

The containerized version of Land DA requires:

* `Installation of Apptainer `_ (or its predecessor, Singularity)
* At least 26 CPU cores (it may be possible to run with 13, but this has not been tested)
* An **Intel** compiler and :term:`MPI` (available for `free here `_)
* The `Rocoto workflow manager `_
* The `Slurm `_ job scheduler

Apptainer is preinstalled for users at the CADRE DA training; users do **not** need to install it unless they are attempting to build and run the containerized Land DA System on a different platform.

.. _GetDataC:

Data
***********

.. attention::

   Data are pre-staged for the CADRE DA training, and users at the training may skip this section.

In order to run the Land DA System, users will need input data in the form of fix files, model forcing files, restart files, and observations for data assimilation. Data for the CADRE DA training are already available on the system used for the training. When attempting to replicate the steps on another system, users can download these files from the `Land DA Data Bucket `_ into the directory of their choice. From that directory, run:

.. code-block:: console

   wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/CADRE-2025/Land-DA_v2.1_inputs.tar.gz
   tar xvfz Land-DA_v2.1_inputs.tar.gz

.. _DownloadContainer:

Download the Container
***********************

.. attention::

   The container is pre-staged for the CADRE DA training, so users at the training may skip this section.

The container for the CADRE DA training is already available on the system used for the training. When trying to replicate the steps on another system, users will need to download the container from the `Land DA Data Bucket `_ into the directory of their choice. In the chosen directory, run:

.. code-block:: console

   wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/CADRE-2025/ubuntu22.04-intel-landda-cadre25.img

This will download a container image named ``ubuntu22.04-intel-landda-cadre25.img``.

.. _SetUpContainer:

Set Up the Container
**********************

Set environment variables that point to the location of the data (``${LANDDA_INPUTS}``) and the container image (``${img}``). On systems other than the training system, adjust these paths to match where the data and container image were downloaded.

.. code-block:: console

   export LANDDA_INPUTS=/home/ubuntu/inputs
   export img=/home/ubuntu/ubuntu22.04-intel-landda-cadre25.img

From your working directory, copy the ``setup_container.sh`` script out of the container:

.. code-block:: console

   singularity exec -H $PWD $img cp -r /opt/land-DA_workflow/setup_container.sh .

The ``setup_container.sh`` script should now appear in your working directory. Run the ``setup_container.sh`` script with the proper arguments:

.. code-block:: console

   ./setup_container.sh -c=intelmpi/2021.13 -m=intelmpi/2021.13 -i=$img

where:

* ``-c`` is the compiler on the user's local machine (e.g., ``intelmpi/2021.13``)

  .. COMMENT previously intel/2022.1.2

* ``-m`` is the :term:`MPI` on the user's local machine (e.g., ``intelmpi/2021.13``)
* ``-i`` is the full path to the container image (e.g., ``/home/ubuntu/ubuntu22.04-intel-landda-cadre25.img``).

Running this script will print the following messages to the console:

.. code-block:: console

   Copying out land-DA_workflow from container
   Checking if LANDDA_INPUTS variable exists and linking to land-DA_workflow
   Land DA data exists, creating links
   Updating scripts files
   Updating singularity modulefiles
   Updating run related scripts
   Setup conda
   Getting the jedi test data from container
   Update experiment variables
   Creating links for exe
   Done

The user should now see the ``land-DA_workflow`` and ``jedi-bundle`` directories in their working directory. The container comes with pre-built executables, so users may continue to the next section to configure the experiment.
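
Before moving on, users may want to confirm that the setup completed as described above. The function below is a minimal sketch for a bash shell, not part of the Land DA System: the name ``check_landda_setup`` is hypothetical, and it assumes only what this section states, namely that ``LANDDA_INPUTS`` points to the input data directory, ``img`` points to the container image file, and ``setup_container.sh`` leaves ``land-DA_workflow`` and ``jedi-bundle`` directories in the working directory.

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): sanity-check the container setup.
   # Usage: check_landda_setup [WORKDIR]   (WORKDIR defaults to the current directory)
   check_landda_setup() {
     local workdir="${1:-$PWD}"
     local status=0
     local d

     # LANDDA_INPUTS should be a directory holding the extracted input data.
     if [[ ! -d "${LANDDA_INPUTS:-}" ]]; then
       echo "LANDDA_INPUTS is unset or not a directory: '${LANDDA_INPUTS:-}'" >&2
       status=1
     fi

     # img should be the downloaded container image file.
     if [[ ! -f "${img:-}" ]]; then
       echo "img is unset or not a file: '${img:-}'" >&2
       status=1
     fi

     # Directories that setup_container.sh places in the working directory.
     for d in land-DA_workflow jedi-bundle; do
       if [[ ! -d "$workdir/$d" ]]; then
         echo "Missing directory: $workdir/$d" >&2
         status=1
       fi
     done

     return "$status"
   }

Running ``check_landda_setup`` from the working directory prints one message per missing item and returns a nonzero status if anything is missing.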

Alternatively, users who want to build their own executables for the experiment can skip to :numref:`Section %s ` for instructions.

.. _ConfigureExptC:

Configure the Experiment
**************************

To configure an experiment, first load the workflow modulefiles for the container:

.. code-block:: console

   cd land-DA_workflow
   module use modulefiles
   module load wflow_singularity

Then navigate to the ``parm`` directory and copy the desired case into ``config.yaml``:

.. code-block:: console

   cd parm
   cp config_samples/samples_cadre/.yaml config.yaml

where ``.yaml`` is the name of one of the sample case files in the `samples_cadre `_ directory. For example, when running the **cadre1** case, run:

.. code-block:: console

   cd parm
   cp config_samples/samples_cadre/cadre1_config.LND.era5.3dvar.ims.warmstart.yaml config.yaml

Modify variables in ``config.yaml`` as needed. For example, in **cadre1**, the Gulf Coast Blizzard hit the Gulf Coast late on January 20, 2025 and left land by January 23, 2025. To shorten the experiment and save computational resources, users can change ``DATE_LAST_CYCLE`` from January 25 to January 22 (``2025012200``):

.. code-block:: yaml

   ACCOUNT: epic
   APP: LND
   ATMOS_FORC: era5
   COLDSTART: 'NO'
   COUPLER_CALENDAR: 2
   DATE_CYCLE_FREQ_HR: 24
   DATE_FIRST_CYCLE: 2025011900
   DATE_LAST_CYCLE: 2025012200
   ...

Users may configure other elements of an experiment in ``config.yaml`` if desired. For example, users may wish to choose a different ``EXP_CASE_NAME`` or DA algorithm (via the ``JEDI_ALGORITHM`` variable). Users who wish to run a more complex experiment may change the values in ``config.yaml`` using information from Section :numref:`%s: Workflow Configuration Parameters `.

Generate the experiment directory by running:

.. code-block:: console

   ./setup_wflow_env.py -p=singularity

If the command runs without issue, this script will print override messages, experiment details, and "0 errors found" messages to the console, similar to the following excerpts:

.. code-block:: console

   ubuntu@ip-10-29-93-226:~/land-DA_workflow/parm$ ./setup_wflow_env.py -p=singularity
   Python Log Level= str: INFO, attr: 20
   INFO::/home/ubuntu/land-DA_workflow/parm/./setup_wflow_env.py::L34:: Current directory (PARMdir): /home/ubuntu/land-DA_workflow/parm
   INFO::/home/ubuntu/land-DA_workflow/parm/./setup_wflow_env.py::L36:: Home directory (HOMEdir): /home/ubuntu/land-DA_workflow
   INFO::/home/ubuntu/land-DA_workflow/parm/./setup_wflow_env.py::L38:: Experimental base directory (exp_basedir): /home/ubuntu
   INFO::/home/ubuntu/land-DA_workflow/parm/./setup_wflow_env.py::L168:: Experimental case directory /home/ubuntu/exp_case/cadre1_lnd_era5_ims has been created.
   INFO::/home/ubuntu/land-DA_workflow/parm/./setup_wflow_env.py::L175:: Rocoto YAML template: /home/ubuntu/land-DA_workflow/parm/templates/template.land_analysis.yaml
   **************************************************
   Overriding ACCOUNT = epic
   Overriding APP = LND
   Overriding ATMOS_FORC = era5
   ...
   Overriding queue_default = batch
   Overriding res_p1 = 97
   **************************************************
   model_ver: v2.1.0
   IMO: 384
   FRAC_GRID: NO
   NPROCS_FCST_IC: 36
   OUTPUT_FH: 1 -1
   DATE_FIRST_CYCLE: 2025012000
   ...
   LND_CALC_SNET: .true.
   ACCOUNT: epic
   KEEPDATA: YES
   INFO::/home/ubuntu/land-DA_workflow/sorc/conda/envs/land_da/lib/python3.12/site-packages/uwtools/config/validator.py::L76::0 schema-validation errors found in Rocoto config
   INFO::/home/ubuntu/land-DA_workflow/sorc/conda/envs/land_da/lib/python3.12/site-packages/uwtools/rocoto.py::L66::0 Rocoto XML validation errors found

ATML Configurations Only
==========================

For :term:`ATML` configurations only (e.g., ``cadre3``), users must modify the ``run_container_executable.sh`` script in the ``parm`` directory using a code editor of their choice. For example:

.. code-block:: console

   vim run_container_executable.sh

Uncomment the second-to-last line of the script. This line exports the ``SINGULARITYENV_PREPEND_PATH`` variable so that the executables in ``/home/ubuntu/land-DA_workflow/sorc/build/bin`` are found first on the ``PATH`` inside the container:

.. code-block:: console

   # Uncomment the line below when running the ATML experiment
   export SINGULARITYENV_PREPEND_PATH=/home/ubuntu/land-DA_workflow/sorc/build/bin:$SINGULARITYENV_PREPEND_PATH
   ${SINGULARITYBIN} exec -B $BINDDIR:$BINDDIR -B $CONTAINERBASE:$CONTAINERBASE $INPUTBIND $img $cmd $arg

.. hint::

   When using ``vim``, press the ``i`` key to enter insert mode and make the required changes. To save and exit, press the ``Esc`` key and type ``:wq``. Users may opt to use their preferred code editor instead.

.. _RunExptC:

Run the Experiment
********************

To run the experiment, users may submit tasks manually via ``rocotorun`` or use a script to automate submission.

.. _WflowOverviewC:

Workflow Overview
==================

.. include:: ../doc-snippets/wflow-task-table.rst

.. _automated-run:

Automated Run
==================

To submit jobs automatically, users should navigate to the experiment directory, download the ``run_expt.sh`` script, make it executable, and run it:

.. code-block:: console

   cd /home/ubuntu/exp_case/
   wget https://raw.githubusercontent.com/NOAA-EPIC/CADRE-DA-training/refs/heads/main/Day2/run_expt.sh
   chmod 755 run_expt.sh
   ./run_expt.sh

where ```` is replaced with the actual name of the experiment directory (e.g., ``cadre1_lnd_era5_ims``). To check the status of the experiment, see :numref:`Section %s ` on tracking experiment progress.

.. _manual-run-c:

Manual Submission
==================

To run the experiment manually, navigate to the experiment directory and issue a ``rocotorun`` command. For example, starting from the ``parm`` directory:

.. code-block:: console

   cd ../../exp_case/cadre1_lnd_era5_ims
   rocotorun -w land_analysis.xml -d land_analysis.db

Users will need to issue the ``rocotorun`` command multiple times. Each ``rocotorun`` call submits only the tasks whose dependencies have completed successfully, so the tasks run in order as the command is repeated. See the :ref:`Workflow Overview ` section to learn more about the steps in the workflow process.

.. _TrackProgressC:

Track Progress
================

To check on the job status, users on a system with a Slurm job scheduler may run:

.. code-block:: console

   squeue -u $USER

To view the experiment status, run:

.. code-block:: console

   rocotostat -w land_analysis.xml -d land_analysis.db

If ``rocotorun`` was successful, the ``rocotostat`` command will print a status report to the console. For example:

.. code-block:: console

   CYCLE          TASK         JOBID                        STATE       EXIT STATUS   TRIES   DURATION
   ===================================================================================================
   202501190000   jcb          1                            SUCCEEDED   0             1       16.0
   202501190000   prep_data    2                            SUCCEEDED   0             1       42.0
   202501190000   pre_anal     3                            SUCCEEDED   0             1       17.0
   202501190000   analysis     7                            SUCCEEDED   0             1       80.0
   202501190000   post_anal    8                            SUCCEEDED   0             1       4.0
   202501190000   forecast     druby://10.29.93.209:38153   SUBMITTING  -             0       0
   202501190000   plot_stats   -                            -           -             -       -
   ===================================================================================================
   202501200000   jcb          4                            SUCCEEDED   0             1       16.0
   202501200000   prep_data    -                            -           -             -       -
   202501200000   pre_anal     -                            -           -             -       -
   202501200000   analysis     -                            -           -             -       -
   202501200000   post_anal    -                            -           -             -       -
   202501200000   forecast     -                            -           -             -       -
   202501200000   plot_stats   -                            -           -             -       -
   ===================================================================================================
   202501210000   jcb          5                            SUCCEEDED   0             1       16.0
   202501210000   prep_data    -                            -           -             -       -
   202501210000   pre_anal     -                            -           -             -       -
   202501210000   analysis     -                            -           -             -       -
   202501210000   post_anal    -                            -           -             -       -
   202501210000   forecast     -                            -           -             -       -
   202501210000   plot_stats   -                            -           -             -       -
   ===================================================================================================
   202501220000   jcb          6                            SUCCEEDED   0             1       16.0
   202501220000   prep_data    -                            -           -             -       -
   202501220000   pre_anal     -                            -           -             -       -
   202501220000   analysis     -                            -           -             -       -
   202501220000   post_anal    -                            -           -             -       -
   202501220000   forecast     -                            -           -             -       -
   202501220000   plot_stats   -                            -           -             -       -

Note that the status table printed by ``rocotostat`` only updates after each ``rocotorun`` command (whether issued manually or automatically).

For each task, a log file is generated. These files are stored in ``/home/ubuntu/ptmp/test_*/com/output/logs``.

The experiment has completed successfully when every task shows ``SUCCEEDED`` in the ``STATE`` column. Other possible states are ``QUEUED``, ``SUBMITTING``, ``RUNNING``, ``DEAD``, and ``UNAVAILABLE``; a ``-`` means the task has not yet been submitted. If a task fails, users can view its log file to determine the cause.
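
Because the ``STATE`` column of the ``rocotostat`` report determines whether the experiment has finished, users may find it convenient to summarize the report from the command line. The functions below are a minimal sketch for a bash shell (the names ``count_state`` and ``all_tasks_succeeded`` are hypothetical and are not part of Land DA or Rocoto). They assume the layout shown in the example report above: each task row begins with a numeric cycle, and the fourth whitespace-separated field is the task state.

.. code-block:: bash

   # Hypothetical helpers (not part of Land DA or Rocoto) that read rocotostat
   # output on stdin. Task rows start with a numeric cycle (e.g., 202501190000);
   # the header and "=====" separator rows are skipped. Field 4 is STATE.

   # Print how many task rows are in the given state (e.g., SUCCEEDED, DEAD).
   count_state() {
     awk -v want="$1" '$1 ~ /^[0-9]+$/ && NF >= 4 && $4 == want { n++ }
                       END { print n + 0 }'
   }

   # Exit with status 0 only if at least one task row exists and every task
   # row is SUCCEEDED. Tasks not yet submitted show "-" and count as unfinished.
   all_tasks_succeeded() {
     awk '$1 ~ /^[0-9]+$/ && NF >= 4 { total++; if ($4 != "SUCCEEDED") pending++ }
          END { exit !(total > 0 && pending == 0) }'
   }

For example, ``rocotostat -w land_analysis.xml -d land_analysis.db | count_state DEAD`` prints the number of tasks in the ``DEAD`` state, and ``rocotostat -w land_analysis.xml -d land_analysis.db | all_tasks_succeeded && echo "Experiment complete"`` prints a message only once every task has succeeded.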

.. _check-output-c:

Check Experiment Output
=========================

.. include:: ../doc-snippets/check-output.rst

.. COMMENT: ref to LANDDAROOT in this snippet - factor out? reword?

.. _plotting-c:

Plotting Results
------------------

Additionally, in the ``plot`` subdirectory, users will find images depicting the results of the ``analysis`` task for each cycle as a scatter plot (``hofx_oma_YYYYMMDD_scatter.png``) and as a histogram (``hofx_oma_YYYYMMDD_histogram.png``).

The scatter plot is named OBS-BKG (i.e., Observation Minus Background [OMB]), and it depicts a map of snow depth results. Blue points indicate locations where the observed values are less than the background values, and red points indicate locations where the observed values are greater than the background values. The title lists the mean and standard deviation of the absolute value of the OMB values.

The histogram plots OMB values on the x-axis and frequency density values on the y-axis. The title of the histogram lists the mean and standard deviation of the signed (not absolute) OMB values.

.. |logo1| image:: https://raw.githubusercontent.com/wiki/ufs-community/land-DA_workflow/images/LandDAScatterPlot.png
   :alt: Map of snow depth in millimeters (observation minus analysis)

.. |logo2| image:: https://raw.githubusercontent.com/wiki/ufs-community/land-DA_workflow/images/LandDAHistogram.png
   :alt: Histogram of snow depth in millimeters (observation minus analysis) on the x-axis and frequency density on the y-axis

.. list-table:: Snow Depth Plots for 2000-01-04

   * - |logo1|
     - |logo2|

Downloading the Plots
^^^^^^^^^^^^^^^^^^^^^^^

.. note::

   There are many options for viewing plots, and instructions for this are highly machine dependent. Users should consult their system's data transfer documentation for instructions on securely copying files from a remote system (such as :term:`RDHPCS`) to their local system.

The instructions provided here apply to the Land DA training platform and may not be relevant on other platforms.

#. Open a new terminal window.
#. Type ``bash`` to start a bash shell.
#. Add your private key (e.g., ``ssh-add ~/.ssh/id_ed25519_student1``).
#. For each directory of plots, run:

   .. code-block:: console

      mkdir -p plots/202501##
      rsync -v --rsh "ssh student#@137.75.93.46 ssh" ubuntu@controller:/home/ubuntu/exp_case/cadre1_lnd_era5_ims/com_dir/landda.202501##/plot/* plots/202501##

   In the commands, replace:

   * ``student#`` with your actual student number,
   * ``landda.202501##`` with the cycle date, and
   * ``plots/202501##`` (in both commands) with the same cycle date.

   These commands create a ``plots`` directory with a subdirectory for the cycle in your current working directory and download the plots into it.

Appendix
**********

.. _build-exe:

Building the Executables
==========================

The executables come pre-built in the Land DA container. However, users who want to build the executables themselves using the ``app_build.sh`` script can follow the instructions below.

#. Shell into the container.

   .. code-block:: console

      singularity shell -B /home:/home /home/ubuntu/ubuntu22.04-intel-landda-cadre25.img

#. Go to the ``sorc`` subdirectory of ``land-DA_workflow``.

   .. code-block:: console

      cd /home/ubuntu/land-DA_workflow/sorc

#. Set up the environment by sourcing the container's spack-stack installation and loading the container modulefiles.

   .. code-block:: console

      source /opt/spack-stack/spack-stack-1.6.0/envs/fms-2024.01/.bashenv-fms
      module use ../modulefiles
      module load build_singularity_intel

#. Build the model using ``app_build.sh``. Users must select either the :term:`ATML` configuration (``-a=ATML``) or the :term:`LND` configuration (selected by omitting ``-a``) when building. Users indicate that the platform (``-p``) is a container using the ``-p=singularity`` argument. Conda was pre-built in previous steps, so users should include the ``--conda=off`` argument to avoid rebuilding it. The ``--build`` option keeps the executables in the ``bin`` subdirectory of the ``build`` directory.

   .. code-block:: console

      # Build ATML configuration (Noah-MP + FV3)
      ./app_build.sh -p=singularity -a=ATML --conda=off --build
      # Build LND configuration (Noah-MP + DATM)
      ./app_build.sh -p=singularity --conda=off --build

   .. note::

      The ``parm/run_container_executable.sh`` script looks for the executables built by the ``app_build.sh`` script. If users do not build the ATML executables with ``app_build.sh``, the ``run_container_executable.sh`` script will need to point to the location of the pre-built executables instead:

      * Pre-built LND executable: ``/opt/land-DA_workflow/install/bin``
      * Pre-built ATML executable: ``/opt/land-DA_workflow/sorc/build-atml/bin/``

After building the executables, type ``exit`` to leave the container shell and continue to :numref:`Section %s: Configure the Experiment `.
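
Users who follow the note above and point ``run_container_executable.sh`` at the pre-built ATML executables could script that change instead of editing the file by hand. The function below is a minimal sketch for a bash shell; the name ``set_prepend_path`` is hypothetical, and it assumes that the ``SINGULARITYENV_PREPEND_PATH`` line looks like the one shown in the ATML configuration section, optionally commented out with ``#``. It relies on GNU ``sed`` and keeps a copy of the original file with a ``.bak`` suffix.

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): uncomment the
   # SINGULARITYENV_PREPEND_PATH export in run_container_executable.sh and make
   # it prepend DIR instead of the directory currently listed.
   # Assumes GNU sed and a line of the form
   #   [#]export SINGULARITYENV_PREPEND_PATH=<dir>:$SINGULARITYENV_PREPEND_PATH
   set_prepend_path() {
     local file="$1" dir="$2"
     if [[ ! -f "$file" || -z "$dir" ]]; then
       echo "usage: set_prepend_path FILE DIR" >&2
       return 1
     fi
     # Replace everything up to the first ":" after "=", dropping any leading "#".
     # "|" is the sed delimiter, so DIR must not contain "|".
     sed -i.bak -E \
       "s|^[[:space:]]*#?[[:space:]]*export SINGULARITYENV_PREPEND_PATH=[^:]*:|export SINGULARITYENV_PREPEND_PATH=${dir}:|" \
       "$file"
   }

For example, running ``set_prepend_path run_container_executable.sh /opt/land-DA_workflow/sorc/build-atml/bin`` from the ``parm`` directory would point the export at the pre-built ATML executables.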