####  UNIX Script Documentation Block
#
# Script name:   ingest_process_onetype_neworbits
#
# JIF contact:  Keyser      org: NP22        date: 2012-06-26
#
# Abstract: Determines the existence of yet-to-be processed files in one family
#   of time-stamped files from a remote unix machine.  Processing is initiated
#   if it is determined that any such files exist.
#
# Script history log:
# 1996-10-03  Bert Katz   Original version for implementation.
# 1996-11-29  Bert Katz   Unified the output file.
# 1997-01-09  Bert Katz   Unified output goes to "stdout", refined neworbit
#     determination, added debug option.
# 1997-03-13  Bert Katz   Added safeguards to keep simultaneous executions from
#     stepping on each other's toes.
# 1997-07-10  Bert Katz   Corrected the code which determines the varying part
#     of the filename for the case in which only one file exists on the CEMSCS.
# 1997-10-14  Bert Katz   Modified to handle multiple input file families.
# 1997-12-17  Bert Katz   Changed return code 199 to 254 so as not to conflict
#     with return code from process_orbits.  Made construction of the "newlist"
#     file more reliable by starting a new "newlist" file with the first file
#     group and then concatenating the remaining file groups (if any) onto the
#     "newlist" file.
# 1998-05-08  Bert Katz   Simplified script by removing array notation.
# 1998-07-20  Bert Katz   Added option to concatenate or correlate multiple
#     input file families.  Simplified sorting procedure by eliminating the
#     determination of the varying portion of the filename: it took more time
#     than it saved.  Added a sort for the newlist file: necessary for multiple
#     input file families.
# 1999-05-11  Bert Katz   Changed criteria for exiting so that non-existence of
#     files for a single query no longer causes processing to terminate.
#     Processing will only terminate if all queries fail to find files to
#     process.  This will allow processing to continue upon concatenated
#     families of files.
# 2003-05-30  L Sager     Added scripting to drop any new orbits which have
#     been migrated from CEMSCS.
# 2006-02-23  D. Keyser   Modified to remove processing which used ftp to check
#     for and remove migrated AIRS files (including center f-o-v AIRS, warmest
#     f-o-v AIRS, and AMSR-E), one at a time.  This is no longer needed because
#     script ingest_cemscsquery now removes migrated files (prior to this) in a
#     more efficient manner.  This change will allow the parent AIRS ingest job
#     to run much faster, especially when there are a lot of migrated files to
#     filter out.  Improved script Docblock.
# 2006-05-12  D. Keyser   Combines/generalizes this script (previously for
#     CEMSCS machine only) and script ingest_unixproc_onetype_neworbits (for
#     unix machines only).  Changed to account for all remote machines now
#     being unix (since MVS CEMSCS machine was replaced with unix DDS machine).
#     Improved documentation and comments, more appropriate messages posted to
#     joblog.
# 2007-05-14  D. Keyser   Now uses imported script variable IFILES_MAX_GET to
#     determine the maximum number of new files on the remote machine for which
#     transferring and processing will occur, anything above this results in no
#     file processing (had been unlimited except for special case of ingest
#     processing executed from RUC2A dump jobs where it was hardwired to "8" -
#     this special case has now been removed)
# 2010-07-06  D. Keyser   Will not expand filenames which include substitution
#     characters "?" and "*" in case files being sftp'd are on same CCS machine
#     as that in which the job is running.
# 2012-06-26  D. Keyser   Always sorts newlist file, not just for multiple
#     input file families as before.  The "ls" command on some new remote
#     machines does not return the listing of files in alphabetical order
#     (e.g., trmmrt.gsfc.nasa.gov).
#
# Usage: ingest_process_onetype_neworbits
#
#   Script parameters: none
#
#   Modules and files referenced:
#     scripts    : $USHbufr/ingest_query
#                  $USHbufr/ingest_process_orbits
#                  $DATA/postmsg
#     data cards : none
#     executables: none
#
# Remarks: Invoked by the model script existore.sh.sms.
#
#   Condition codes:
#       0 - no problem encountered
#     > 0 - some problem encountered
#     Specifically:   1 - Query failed to produce any files to process
#                     2 - The number of new files for this family exceeds the
#                         limit (imported variable IFILES_MAX_GET)
#                   111 - All files to be processed in a particular group
#                         (family) were unprocessable
#                   199 - One or more files were untransferable in last 5 (or
#                         possibly some other number) runs of this job
#                   220 - No action specified for multiple file families
#                   222 - All files to be processed were already processed
#                   230 - No files submitted for processing
#                   254 - Failure in sort of list of files to be processed or
#                         in sort of files in history file
#
# Attributes:
#   Language: aix unix
#   Machine:  NCEP CCS
####

set -au

echo
echo "#######################################################################"
echo " START INGEST_PROCESS_ONETYPE_NEWORBITS "
echo "#######################################################################"
echo
echo
echo "Processing file group $REMOTEDSNGRP"
echo "Start time is $(date)."
echo

if [ $DEBUGSCRIPTS = ON -o $DEBUGSCRIPTS = YES ] ; then
   set -x
fi

if [ -s $ORBITLIST.newlist$$ ] ; then
   rm $ORBITLIST.newlist$$
fi

nfam=0
REMDSNGRP=""
unset CONCATCORREL
REMOTEDSNGRP_save="$REMOTEDSNGRP"

# Loop through each DSN family listed in $REMOTEDSNGRP to get a list of new
#  files to process
# -------------------------------------------------------------------------

echo "$REMOTEDSNGRP_save" | read firstword restofstring
while [[ "$firstword" != "" ]] ; do
   REMOTEDSNGRP_save="$restofstring"
   eval DIRDSNFAM=\$firstword

# See if the words concatenate_families or correlate_families are in the list
#  of groups
# ---------------------------------------------------------------------------

#  (the tr character classes are quoted so the shell cannot glob-expand them)
   CONCORTEST=$(echo "$DIRDSNFAM" | tr '[A-Z]' '[a-z]')
   if [ $CONCORTEST = concatenate_families -o \
        $CONCORTEST = correlate_families ] ; then
      CONCATCORREL=$CONCORTEST
      echo "$REMOTEDSNGRP_save" | read firstword restofstring
      continue
   elif [ $CONCORTEST = concat -o $CONCORTEST = concatenate -o \
          $CONCORTEST = concat_fams -o $CONCORTEST = concat_families -o \
          $CONCORTEST = concatenate_fams ] ; then
      CONCATCORREL=concatenate_families
      echo "$REMOTEDSNGRP_save" | read firstword restofstring
      continue
   elif [ $CONCORTEST = correl -o $CONCORTEST = correlate -o \
          $CONCORTEST = correl_fams -o $CONCORTEST = correl_families -o \
          $CONCORTEST = correlate_fams ] ; then
      CONCATCORREL=correlate_families
      echo "$REMOTEDSNGRP_save" | read firstword restofstring
      continue
   fi
   nfam=$(($nfam+1))
   REMDSNGRP="${REMDSNGRP}$DIRDSNFAM "
   DSNFAM=$(basename "$DIRDSNFAM")

# Get listing of datasets ( beginning with $DSNFAM ) available on remote unix
#  machine $MACHINE - dataset listing is returned in file $DSNFAM.newlist$$
# ---------------------------------------------------------------------------

   sh $USHbufr/ingest_query $MACHINE $DSNFAM.newlist$$ "$DIRDSNFAM"
   ftperror=$?
   set +x
   echo
   echo "Time is now $(date)."
   echo
   [ $DEBUGSCRIPTS = ON -o $DEBUGSCRIPTS = YES ]  &&  set -x
   if [ $ftperror -ne 0 ] || [ ! \
        -s $DSNFAM.newlist$$ ]; then
      set +x
      echo
      echo "No $DIRDSNFAM files located on remote unix machine $MACHINE."
      echo
      [ $DEBUGSCRIPTS = ON -o $DEBUGSCRIPTS = YES ]  &&  set -x
   fi

# Save list of multiple families of files to same file
# ----------------------------------------------------

   if [ -s $ORBITLIST.newlist$$ ] ; then
      cat $DSNFAM.newlist$$ >> $ORBITLIST.newlist$$
      rm $DSNFAM.newlist$$
   else
      mv $DSNFAM.newlist$$ $ORBITLIST.newlist$$
   fi
   echo "$REMOTEDSNGRP_save" | read firstword restofstring
done

if [ ! -s $ORBITLIST.newlist$$ ] ; then
   set +x
   echo
   echo "Exiting - query failed to produce any files to process."
   echo "Ending time for ingest_process_onetype_neworbits is $(date)."
   echo
   exit 1
fi

# Sort list of available files
# ----------------------------

sort -d -o $ORBITLIST.tempsort$$ $ORBITLIST.newlist$$
if [ $? -eq 0 ] ; then
   awk ' { print $1 } ' $ORBITLIST.tempsort$$ > $ORBITLIST.newlist$$
   rm $ORBITLIST.tempsort$$
else
   set +x
   echo
   echo "Exiting - failure in sort of list of files to be processed."
   echo "Ending time for ingest_process_onetype_neworbits is $(date)."
   exit 254
fi

# Sort list of files in history file
# ----------------------------------

if [ -s $ORBITLIST ] ; then
   sort -d -o $ORBITLIST.tempsort$$ $ORBITLIST
   if [ $? -eq 0 ] ; then
      awk ' { print $1 } ' $ORBITLIST.tempsort$$ > $ORBITLIST
      rm $ORBITLIST.tempsort$$
   else
      set +x
      echo
      echo "Exiting - failure in sort of list of files in history file."
      echo "Ending time for ingest_process_onetype_neworbits is $(date)."
      echo
      exit 254
   fi

# Determine if there are any new files to process, and update history file
# ------------------------------------------------------------------------

   comm -13 $ORBITLIST $ORBITLIST.newlist$$ > $ORBITLIST.neworbits$$
   comm -23 $ORBITLIST $ORBITLIST.newlist$$ > $ORBITLIST.oldorbits$$
   if [ -s $ORBITLIST.oldorbits$$ ] ; then
      set +x
      echo
      echo "The following time-stamped files have been deleted on remote unix \
machine $MACHINE :"
      cat $ORBITLIST.oldorbits$$
      echo
      [ $DEBUGSCRIPTS = ON -o $DEBUGSCRIPTS = YES ]  &&  set -x
      comm -12 $ORBITLIST $ORBITLIST.newlist$$ > $ORBITLIST.remaining$$
      mv $ORBITLIST.remaining$$ $ORBITLIST
   fi
   rm $ORBITLIST.newlist$$ $ORBITLIST.oldorbits$$
   if [ ! -s $ORBITLIST.neworbits$$ ] ; then
      msg=" No new time-stamped files to process on remote unix machine \
$MACHINE."
      set +x
      echo
      echo $msg
      echo
      [ $DEBUGSCRIPTS = ON -o $DEBUGSCRIPTS = YES ]  &&  set -x
      $DATA/postmsg "$jlogfile" "$msg"
      rm $ORBITLIST.neworbits$$
      set +x
      echo
      echo "Ending time for ingest_process_onetype_neworbits is $(date)."
      echo
      exit 0
   fi
else
   mv $ORBITLIST.newlist$$ $ORBITLIST.neworbits$$
fi

# Process new files
# -----------------

set +x
echo
echo "Time is now $(date)."
echo
echo "The following time-stamped files have been provided for processing on \
remote unix machine $MACHINE :"
cat $ORBITLIST.neworbits$$
echo
[ $DEBUGSCRIPTS = ON -o $DEBUGSCRIPTS = YES ]  &&  set -x

num_files=$(wc -l < $ORBITLIST.neworbits$$)
if [ $num_files -gt $IFILES_MAX_GET ]; then
###cwd=`pwd`
###cd $DATA
   msg="Exiting w/ rc=2 - The number of new files for this family, $num_files,\
 exceeds the limit of $IFILES_MAX_GET - no processing is done --> non-fatal"
   $DATA/postmsg "$jlogfile" "$msg"
###cd $cwd
   rm $ORBITLIST.neworbits$$
   set +x
   echo
   echo $msg
   echo "Ending time is $(date)."
   echo
   exit 2
fi

if [ $nfam -gt 1 ] ; then
   CONCATCORREL=${CONCATCORREL:-correlate_families}
else
   CONCATCORREL=${CONCATCORREL:-concatenate_families}
fi

if [ $CONCATCORREL = correlate_families ] ; then
   awk -v REMDSNFAMS="$REMDSNGRP" '
      BEGIN {
         nfamily=split(REMDSNFAMS,remdsnfams," ");
         for(i=1;i<=nfamily;i=i+1) {
            lenfam[i]=length(remdsnfams[i]);
            numfam[i]=0
         }
      }
      {
         for(i=1;i<=nfamily;i=i+1)
            if(index($1,remdsnfams[i])!=0) {
               numfam[i]=numfam[i]+1;
               tail[i,numfam[i]]=substr($1,lenfam[i]+1,length($1)-lenfam[i]);
               lines[i,numfam[i]]=$0
            }
      }
      END {
         for(i=1;i<=numfam[1];i=i+1) {
            icount=1;
            prtline=lines[1,i];
            for(j=2;j<=nfamily;j=j+1) {
               for(k=1;k<=numfam[j];k=k+1) {
                  if(tail[j,k]==tail[1,i]) {
                     icount=icount+1;
                     prtline=prtline " " lines[j,k]
                  }
               }
            }
            if(icount==nfamily) print prtline
         }
      }' $ORBITLIST.neworbits$$ | sh $USHbufr/ingest_process_orbits 9>> \
    $ORBITLIST
   orberror=$?
elif [ $CONCATCORREL = concatenate_families ] ; then
   sh $USHbufr/ingest_process_orbits < $ORBITLIST.neworbits$$ 9>> $ORBITLIST
   orberror=$?
else
   set +x
   echo
   echo "Exiting - No action specified for multiple file families."
   echo "Ending time for ingest_process_onetype_neworbits is $(date)."
   echo
   exit 220
fi

rm $ORBITLIST.neworbits$$

if [ $orberror -ne 0 ] ; then
   set +x
   echo
   echo "Ending time for ingest_process_onetype_neworbits is $(date)."
   echo
   exit $orberror
fi

set +x
echo
echo "Ending time for ingest_process_onetype_neworbits is $(date)."
echo
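
# Illustrative sketch (not part of the production script): the step that
# determines new files compares the sorted history file against the sorted
# remote listing with comm.  The standalone example below shows what each of
# the three comm invocations selects.  The file names (orbit.001, ...) and the
# mktemp work directory are hypothetical, and the example sorts in the C
# locale rather than with "sort -d" so that sort and comm agree on ordering;
# that choice is an assumption of this sketch, not of the script above.

```shell
# Scratch directory for the two lists (hypothetical paths).
workdir=$(mktemp -d)
history_list="$workdir/history"
remote_list="$workdir/remote"

# Files recorded as processed in earlier runs (plays the role of $ORBITLIST).
printf '%s\n' orbit.001 orbit.003 orbit.002 > "$history_list.unsorted"
# Files currently listed on the remote machine (plays the role of newlist).
printf '%s\n' orbit.004 orbit.002 orbit.003 > "$remote_list.unsorted"

# comm expects both inputs to be sorted in the same collating order.
LC_ALL=C sort -o "$history_list" "$history_list.unsorted"
LC_ALL=C sort -o "$remote_list" "$remote_list.unsorted"

# -13 keeps lines found only in the second file: files not yet processed.
new_files=$(LC_ALL=C comm -13 "$history_list" "$remote_list")
# -23 keeps lines found only in the first file: files deleted remotely.
deleted_files=$(LC_ALL=C comm -23 "$history_list" "$remote_list")
# -12 keeps lines common to both: history entries that are still valid.
remaining_files=$(LC_ALL=C comm -12 "$history_list" "$remote_list")

echo "new:       $new_files"
echo "deleted:   $deleted_files"
echo "remaining:" $remaining_files

rm -rf "$workdir"
```

# In this sketch orbit.004 is the only new file, orbit.001 is reported as
# deleted, and orbit.002 and orbit.003 remain in the history.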