Release Notes: HIRESW v8.1.0 - released to NCO on October 20, 2021. Transition from WCOSS Cray to WCOSS2. The google docs link to the source of this document (better than the .txt version included in the repository): https://docs.google.com/document/d/1lPV6RkstV8lQfET7gzlrsN9QccIAD3aHgukb3ltJBuk/edit?usp=sharing ####################################### Obtaining and installing ####################################### git clone -b v8.1_deliver https://github.com/MatthewPyle-NOAA/hiresw.git target_dir Expectation is that target_dir will be hiresw.v8.1.0 (the clone will create this target_dir to clone into). Assuming it has been cloned into hiresw.v8.1.0, cd into hiresw.v8.1.0. Installation directions: 1. Build the executables cd sorc ./build_all.sh Within this overall “build_all.sh” script there are steps to check out an external code (for the FV3), and a copying of fix files for both ARW and FV3 into this vertical structure. The actual build work is being done by the arw/build_hiresw.sh script and fv3/build_all.sh scripts. The BUILD_* variables near the top of the arw/build_hiresw.sh script allow for selecting which codes to build. The regional_build.cfg file in sorc/fv3 allows for specifying which components are compiled for the FV3 build. 2. Install the executables for both ARW and FV3 ./install_all.sh Again, the actual work is done by model specific scripts, in this case arw/install_hiresw.sh script and fv3/install_all.sh. The arw/install_hiresw.sh script has INSTALL_* variables corresponding to the BUILD_* variables in arw/build_hiresw.sh in case there is a desire to only copy select executables to the final executable space. Executing the script copies the executables to the proper exec/arw/ space. There should be 11 arw executables and 10 fv3 executables. [Matthew.Pyle@clogin04 exec]$ ls arw hiresw_bucket hiresw_smartprecipg2 hiresw_wps_metgrid hiresw_wrfarwfcst_init hiresw_post hiresw_sndp hiresw_wps_ungrib hiresw_wrfbufr hiresw_smartinitg2 hiresw_stnmlist hiresw_wrfarwfcst [Matthew.Pyle@clogin04 exec]$ ls fv3 hireswfv3_bucket hireswfv3_forecast hireswfv3_smartinit hireswfv3_stnmlist hireswfv3_bufr hireswfv3_fv3snowbucket hireswfv3_smartprecip hireswfv3_chgres_cube hireswfv3_post hireswfv3_sndp ####################################### Software utilized from outside of vertical structure ####################################### PrgEnv-intel/8.1.0 intel/19.1.3.304 craype/2.7.6 cray-mpich/8.1.4 ip_ver=3.3.3 bacio_ver=2.4.1 w3nco_ver=2.4.1 w3emc_ver=2.7.3 g2_ver=3.4.5 g2tmpl_ver=1.10.0 jasper_ver=2.0.25 libpng_ver=1.6.37 zlib_ver=1.2.11 nemsio_ver=2.5.2 bufr_ver=11.5.0 sfcio_ver=1.4.1 sp_ver=2.3.3 netcdf_ver=4.7.4 sigio_ver=2.3.2 gfsio_ver=1.4.1 crtm_ver=2.3.0 wrf_io_ver=1.1.1 esmf_ver=8.1.1 cmake_ver=3.18.4 python_ver=3.8.6 hdf5_ver=1.10.6 libxmlparse_ver=2.0.0 gempak_ver=7.14.0 cfp_ver=2.0.4 prod_util_ver=2.0.9 prod_envir_ver=2.0.5 libjpeg_ver=9c wgrib2_ver=2.0.8_wmo obsproc_shared_bufr_cword_ver=v1.1.0 (used the canned v1.0.1 version in testing) ####################################### Description of code/script changes ####################################### The attempt here is to summarize all changes made - most were just to get things to compile and run on WCOSS2, but others to resolve bugs that were causing some intermittent failures on WCOSS2 or clean out unused items. Other changes were made to adapt to the new WCOSS2 structure and rules. jobs/: All J-jobs are modified. Included in the changes are: * COMIN/COMOUT changes adapting to new filesystem organization, and use of compath.py in definitions. * changes from aprun to mpiexec where needed. * elimination of RUN_ENVIR usage. * switches NWROOT to PACKAGEROOT * adapts to elimination of .ecf ending to ex-scripts. * eliminates posting to jlogfile * Elimination of some unused environmental variables, particularly in JHIRESW_MAKE_BC and JHIRESW_MAKE_IC scripts/arw/: Eliminates the .ecf terminations from the scripts. Also changes from aprun to mpiexec where needed, and changes in how “poe” MPMD-type scripts are generated. Eliminates posting to jlogfile. Eliminates or comments out certain old NMMB-related content that is no longer used. Removes the domain-specific task information in exhiresw_fcst.sh. exhiresw_post.sh now looks for the postdone?? file in ../, as the DATA being defined in JHIRESW_POST includes the forecast hour. It doesn’t seem like the current operational version would be able to restart properly in midstream. Eliminates a lot of nmmb model checks and code blocks in exhiresw_prelim_rap.sh scripts/fv3/: Eliminates the .ecf terminations from the scripts. Also changes from aprun to mpiexec where needed, and with changes to how “poe” scripts are generated. Eliminates posting to jlogfile. exhiresw_make_bc.sh is also changed to deal with an overhaul in how the MAKE_BC job is submitted (now being submitted as a separate parallel job for each time being processed - in line with how JHIRESW_POST and JHIRESW_PRDGEN are triggered). Changes to adapt to the COMINgfs passed in (slightly different than in Hiresw v8.0.5), and also is designed so the working directory is the subdirectory for the hour being processed (previously the script created these subdirectories within the script). exhiresw_post.sh now looks for the postdone?? file in ../, as the DATA being defined in JHIRESW_POST includes the forecast hour. ush/arw: Eliminates some jlogfile writes, and also switches from aprun to mpiexec in a few places. ush/fv3: Switches from aprun to mpiexec in a few places. sorc/arw: hiresw_wps.fd included changes to the ungrib source to get it to compile (made certain codes more f90 like with continuation lines). Also a change so it would recognize that it was processing a GRIB2 input file. hiresw_post.fd had many codes changed, both for continuation lines (particularly for !omp lines), but also to adapt to a newer version of the CRTM library - a safe change since no satellite lookalike products are generated for HiresW. hiresw_bucket.fd was updated to resolve an out of array bounds issue that was leading to sporadic failures. hiresw_smartinitg2.fd - eliminates log prints of association and allocation status of a GRIB2 gfld structure, and also use of an undefined variable. hiresw_smartprecipg2.fd - fixes use of undefined variables that caused sporadic problems. sorc/fv3: hireswfv3_bucket.fd was updated to resolve using an out of bound array reference that led to sporadic failures. hireswfv3_post.fd eliminated the jobs, parm, scripts, and ush subdirectories that aren’t being utilized. hireswfv3_smartinit.fd changed some diagnostic prints, and also modified an allocation statement (to only allocate if not previously allocated), and eliminated some undefined variables. hireswfv3_smartprecip.fd - some changes in log output from standard out to standard error, cleans out some legacy GRIB1 stuff that no longer is used, and cleaned up some out of array bounds references and undefined variables that were causing random failures. hireswfv3_forecast.fd - changes for the WCOSS2 build, and also a one-line code modification to handle an updated version of ESMF. parm/arw: Model task geometry changes in the hiresw_arw_namelist*_model files. Removes a blank line at the end of hiresw_awpreg.txt_4 that caused problems on WCOSS2. Brings in a new version of hiresw_params_grib2_tbl_new for the newer g2tmpl version being used on WCOSS2. parm/fv3: Removes a blank line at the end of hiresw_awpreg.txt_4 that caused problems on WCOSS2. Generalizes the write_groups and write_tasks_per_group settings in the model_configure_fv3* files for guam, hi, and pr (previously was hardwired). The run_commands*.config files are changed to shift from aprun to mpiexec. versions/: New directory. Adds build.ver and run.ver files launch_info/: New submission scripts for PBS that also contain resource information for ARW runs. The "launch_cray_offset_retro" script contains some domain-specific information, and passes the domain, cycle, model core, and date information as well as needed resource information into scripts that get submitted. The sub*_in_cray_retro are the PBS template submission jobs that get modified by launch_cray_offset_retro. launch_fv3/: New submission scripts for PBS that also contain resource information for FV3 runs - created to run without rocoto. It is organized a bit differently than launch_info. For the FV3, each job has a unique launch*.sh script that is used to provide the domain, cycle, model core, and date information, as well as needed resource information, into the final scripts that get submitted. The sub*_in_cray_retro are the PBS template submission jobs that get modified by their corresponding launch*.sh script. A crude “combo.sh” script submits all jobs in sequence. ####################################### Product and resource changes ####################################### Aside from the reorganization on disk, there are no changes to product names or details. Also no changes are made to product dissemination or to what is saved to HPSS. The launch_info and launch_fv3 domains described above have definitions for node/task usage. Also, a spreadsheet showing some timing and resource comparisons for the different HIresW domains (primarily model jobs) is available in: https://docs.google.com/spreadsheets/d/1Xz7M0mRKL0FP9HzDOycImHJkZzfgkKxo4kad7415PUA/edit?usp=sharing Task counts for the ARW forecast model were slightly increased to keep run times within +/- 5 minutes of current operations. Disk space needed per day should be unchanged - about 1230 GB/day (for the hiresw.${PDY} contents, including the wmo/ and gempak/ subdirectories). ####################################### Additional information ####################################### The upstream and downstream dependencies - should be the same as for HiresW v8.0: https://docs.google.com/document/d/1PX1mlW1ApIqjJiCu7FRT-o-X8cotesF2yP2PwxMSisM/edit?usp=sharing A copy of the COMOUT directory from final testing: /lfs/h2/emc/lam/noscrub/Matthew.Pyle/test/com/hiresw/v8.1/hiresw.20210824 (on Cactus) Log files (stderr and stdout combined) are under /lfs/h2/emc/lam/noscrub/Matthew.Pyle/test/com/hiresw/v8.1/logs/arw/ and /lfs/h2/emc/lam/noscrub/Matthew.Pyle/test/com/hiresw/v8.1/logs/fv3/ A summary document including some graphical examples comparing operational runs from WCOSS-Cray and runs made on WCOSS2, and some timing summaries: https://docs.google.com/presentation/d/1MwsSlqfmGceHZlL-YXu6h9YV3I2msI7edprscQa7hIs/edit?usp=sharing