#PBS -N hwrf%STORMNUM%_products_%CYC%
#PBS -j oe
#PBS -S /bin/bash
#PBS -q %QUEUE%
#PBS -A %PROJ%-%PROJENVIR%
#PBS -l walltime=03:00:00
#PBS -l select=1:ncpus=48:mpiprocs=48:mem=18G
#PBS -l debug=true
export NODES=1
export TOTAL_TASKS=48
export NHC_PRODUCTS_NTHREADS=24
model=hwrf
%include
%include
export cyc="%CYC%"
export storm_num="%STORMNUM%"
# versions file for hwrf sets $model_ver and $code_ver
module load envvar/${envvar_ver}
module load PrgEnv-intel/${PrgEnv_intel_ver}
module load craype/${craype_ver}
module load intel/${intel_ver}
module load cray-pals/${cray_pals_ver}
module load libjpeg/${libjpeg_ver}
module load grib_util/${grib_util_ver}
module load wgrib2/${wgrib2_ver}
module load bufr/${bufr_ver}
module load hdf5/${hdf5_ver}
module load netcdf/${netcdf_ver}
# module load pnetcdf/${pnetcdf_ver}
module load udunits/${udunits_ver}
module load nco/${nco_ver}
module load python/${python_ver}
module load cfp/${cfp_ver}
module list
${HOMEhwrf}/jobs/JHWRF_PRODUCTS
%include
%manual
TASK products
PURPOSE: Runs in parallel with the rest of the forecast model, converting
native-grid WRF data into products useful to forecasters and the public.
Delivers native files to COM as required. Runs the tracker, sends
dbn_alerts, and delivers products.
Meters:
gribber - last forecast hour that has completed all GRIB2
generation, delivery and alerting.
tracker - last forecast hour the tracker has completed. Forecast
hours past the end of the track are still counted in this bar.
Events:
SentTrackToNHC - set immediately after the track file has been
delivered to NHC areas.
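The meters and event above are updated from inside the running job with
standard ecflow_client child commands. A minimal sketch (the wrapper
function names and the ECFLOW_CLIENT override are hypothetical, added here
only so the calls can be stubbed; the real job's internals are not shown in
this card):

```shell
# Sketch only: updating the ecFlow meters/events documented above.
# ECFLOW_CLIENT lets a stub stand in when no ecFlow server is available.
ECFLOW_CLIENT=${ECFLOW_CLIENT:-ecflow_client}

update_gribber() {   # $1 = last fhr with all GRIB2 generation/delivery/alerting done
    "$ECFLOW_CLIENT" --meter=gribber "$1"
}

update_tracker() {   # $1 = last fhr the tracker has completed
    "$ECFLOW_CLIENT" --meter=tracker "$1"
}

track_sent() {       # set once the track file has been delivered to NHC
    "$ECFLOW_CLIENT" --event=SentTrackToNHC
}
```

In the real job these child commands also require the usual ECF_* environment
(host, port, task path, password), which the ecFlow include headers provide.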
DETAILS: In short, this is the main delivery job for the HWRF system.
This job runs in multiple threads that work together via an sqlite3
database system. It has restart capability, which means it will start
where it left off if it is killed and restarted. That means if you
want it to start from the beginning, you MUST first run the unpost
job. On the other hand, if the job died from a technical problem
(downed node, fire, etc.) then requeueing the job will cause it to
start where it left off. The only exception is the tracker component,
which always starts from the beginning.
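The restart behavior described above can be illustrated with a small
sketch. This is not the actual HWRF code: the database path, table name,
and helper functions are hypothetical, and it only shows the general
pattern of recording per-forecast-hour completion in sqlite3 so a requeued
job skips work it already finished:

```shell
# Sketch: restart capability via an sqlite3 state database (hypothetical schema).
STATEDB=${STATEDB:-products_state.db}

sqlite3 "$STATEDB" 'CREATE TABLE IF NOT EXISTS done (fhr INTEGER PRIMARY KEY);'

mark_done() {  # record that forecast hour $1 finished all product work
    sqlite3 "$STATEDB" "INSERT OR IGNORE INTO done VALUES ($1);"
}

is_done() {    # succeed (exit 0) if forecast hour $1 was already completed
    [ "$(sqlite3 "$STATEDB" "SELECT COUNT(*) FROM done WHERE fhr=$1;")" -eq 1 ]
}

for fhr in 0 3 6; do
    if is_done "$fhr"; then
        echo "fhr $fhr already done; skipping"
        continue
    fi
    # ... generate and deliver products for $fhr here ...
    mark_done "$fhr"
done
```

On a fresh run every hour is processed; after a kill and requeue, hours
already recorded in the database are skipped. This is also why the unpost
job must run first when you want a truly clean rerun: it is what clears
this saved state.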
The last step of this job is to run a special threaded OpenMP program
that reads in several custom input files, and generates the
*.swath.grb2 file, various NHC custom files, and the AFOS file. The
AFOS file is emailed to the SDM at the end of the job.
TROUBLESHOOTING
Most failures of this job fall into one of three categories:
- post job failed
- operator error
- system issues
If this job failed, check the post1 and post2 jobs first. If the
post1/post2 jobs are stuck or failed, that is why the products job failed.
What do I mean by "operator error"?
 * ALWAYS KILL AND REQUEUE THE ENTIRE forecast FAMILY to rerun the
   forecast model. Never, under ANY circumstances, rerun just the model!
 * If you need to rerun the post or products, KILL AND REQUEUE THE
   ENTIRE FAMILY so that the unpost runs first.
Both categories of issues, system and operator, have caused a wide
variety of interesting problems in the post and products jobs.
The 2016 upgrade changed the post and products jobs to exit immediately
at even the smallest sign of error instead of retrying operations.
However, if you forget to requeue the entire post family, the system
ultimately cannot do much to overcome that operator error.
%end