!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!                                                                   !!
!!                    GNU General Public License                     !!
!!                                                                   !!
!! This file is part of the Flexible Modeling System (FMS).          !!
!!                                                                   !!
!! FMS is free software; you can redistribute it and/or modify       !!
!! it and are expected to follow the terms of the GNU General Public !!
!! License as published by the Free Software Foundation.             !!
!!                                                                   !!
!! FMS is distributed in the hope that it will be useful,            !!
!! but WITHOUT ANY WARRANTY; without even the implied warranty of    !!
!! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the      !!
!! GNU General Public License for more details.                      !!
!!                                                                   !!
!! You should have received a copy of the GNU General Public License !!
!! along with FMS; if not, write to:                                 !!
!!           Free Software Foundation, Inc.                          !!
!!           59 Temple Place, Suite 330                              !!
!!           Boston, MA 02111-1307 USA                               !!
!! or see:                                                           !!
!!           http://www.gnu.org/licenses/gpl.txt                     !!
!!                                                                   !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!-----------------------------------------------------------------------
! Communication for message-passing codes
!
! AUTHOR: V. Balaji (V.Balaji@noaa.gov)
! SGI/GFDL Princeton University
!
! This program is free software; you can redistribute it and/or modify
! it under the terms of the GNU General Public License as published by
! the Free Software Foundation; either version 2 of the License, or
! (at your option) any later version.
!
! This program is distributed in the hope that it will be useful,
! but WITHOUT ANY WARRANTY; without even the implied warranty of
! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
! GNU General Public License for more details.
!
! For the full text of the GNU General Public License,
! write to: Free Software Foundation, Inc.,
! 675 Mass Ave, Cambridge, MA 02139, USA.
!-----------------------------------------------------------------------
module mpp_mod
!a generalized communication package for use with shmem and MPI
!will add: co_array_fortran, MPI2
!Balaji (V.Balaji@noaa.gov) 11 May 1998
!
! V. Balaji
!
! mpp_mod is a set of simple calls to provide a uniform interface
! to different message-passing libraries. It can currently be
! implemented on top of either the SGI/Cray native SHMEM library or the
! MPI standard. Other libraries (e.g. MPI-2, Co-Array Fortran) can be
! incorporated as the need arises.
!
!
! The data transfer between a processor and its own memory is based
! on load and store operations upon
! memory. Shared-memory systems (including distributed shared memory
! systems) have a single address space and any processor can acquire any
! data within the memory by load and
! store. The situation is different for distributed
! parallel systems. Specialized MPP systems such as the T3E can simulate
! shared-memory by direct data acquisition from remote memory. But if
! the parallel code is distributed across a cluster, or across the Net,
! messages must be sent and received using the protocols for
! long-distance communication, such as TCP/IP. This requires a
! ``handshaking'' between nodes of the distributed system. One can think
! of the two different methods as involving puts or
! gets (e.g. the SHMEM library), or, in the case of
! negotiated communication (e.g. MPI), sends and
! recvs.
!
! The difference between SHMEM and MPI is that SHMEM uses one-sided
! communication, which can have very low-latency high-bandwidth
! implementations on tightly coupled systems. MPI is a standard
! developed for distributed computing across loosely-coupled systems,
! and therefore incurs a software penalty for negotiating the
! communication. It is however an open industry standard whereas SHMEM
! is a proprietary interface. Moreover, the puts or
! gets on which SHMEM is based cannot currently be implemented in
! a cluster environment (though there are recent announcements from
! Compaq that occasion hope).
!
! The message-passing requirements of climate and weather codes can be
! reduced to a fairly simple minimal set, which is easily implemented in
! any message-passing API. mpp_mod provides this API.
!
! Features of mpp_mod include:
!
! 1) A simple, minimal API, with free access to the underlying API for
! more complicated operations.
! 2) A design oriented toward typical use in climate/weather CFD codes.
! 3) Performance not significantly lower than that of any native API.
!
! This module is used to develop higher-level calls for domain decomposition and parallel I/O.
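!
! As a minimal sketch of typical use (the program name is purely
! illustrative; argument lists should be checked against the actual
! interfaces exported from mpp_comm_mod and mpp_util_mod):
!
! program hello
!   use mpp_mod, only: mpp_init, mpp_exit, mpp_pe, mpp_npes, mpp_sum
!   implicit none
!   real :: x
!   call mpp_init()  !must be called by all PEs before any other mpp call
!   x = real( mpp_pe() + 1 )
!   call mpp_sum(x)  !collective over the current pelist (here, all PEs)
!   print *, 'pe, npes, sum(pe+1)=', mpp_pe(), mpp_npes(), x
!   call mpp_exit()  !must be called by all PEs at the end of the run
! end program hello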
!
! Parallel computing is initially daunting, but it soon becomes
! second nature, much the way many of us can now write vector code
! without much effort. The key insight required while reading and
! writing parallel code is in arriving at a mental grasp of several
! independent parallel execution streams through the same code (the SPMD
! model). Each variable you examine may have different values for each
! stream, the processor ID being an obvious example. Subroutines and
! function calls are particularly subtle, since it is not always obvious
! from looking at a call what synchronization between execution streams
! it implies. An example of erroneous code would be a global barrier
! call (see mpp_sync below) placed
! within a code block that not all PEs will execute, e.g:
!
!
! if( pe.EQ.0 )call mpp_sync()
!
!
! Here only PE 0 reaches the barrier, where it will wait
! indefinitely. While this is a particularly egregious example to
! illustrate the coding flaw, more subtle versions of the same are
! among the most common errors in parallel code.
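!
! A corrected version of the example above (a sketch; the point is only
! that every PE in the current pelist must reach the barrier) hoists the
! call out of the conditional:
!
! if( pe.EQ.0 )then
!   !...work done only on PE 0...
! end if
! call mpp_sync() !reached by all PEs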
!
! It is therefore important to be conscious of the context of a
! subroutine or function call, and the implied synchronization. There
! are certain calls here (e.g. mpp_declare_pelist, mpp_init,
! mpp_malloc, mpp_set_stack_size) which must be called by all
! PEs. There are others which operate on a subset of PEs (here
! called a pelist) and must be called by all the PEs in that
! pelist (e.g. mpp_max, mpp_sum, mpp_sync). Still
! others imply no synchronization at all. I will make every effort to
! highlight the context of each call in the MPP modules, so that the
! implicit synchronization is spelt out.
!
! For performance it is necessary to keep synchronization as limited
! as the algorithm being implemented will allow. For instance, a single
! message between two PEs should only imply synchronization across the
! PEs in question. A global synchronization (or barrier)
! is likely to be slow, and is best avoided. But codes first
! parallelized on a Cray T3E tend to have many global syncs, as very
! fast barriers were implemented there in hardware.
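!
! A sketch of such a pairwise exchange, using the keyword form of
! mpp_transmit that also appears in the test program below (the array
! names and lengths are illustrative): PEs 0 and 1 swap data without
! involving, or synchronizing with, any other PE:
!
! if( pe.EQ.0 )call mpp_transmit( put_data=a(1), plen=n, to_pe=1, &
!                                 get_data=b(1), glen=n, from_pe=1 )
! if( pe.EQ.1 )call mpp_transmit( put_data=a(1), plen=n, to_pe=0, &
!                                 get_data=b(1), glen=n, from_pe=0 )
! call mpp_sync_self() !completes outstanding puts without a global barrier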
!
! Another reason to use pelists is to run a single program in MPMD
! mode, where different PE subsets work on different portions of the
! code. A typical example is to assign an ocean model and atmosphere
! model to different PE subsets, and couple them concurrently instead of
! running them serially. The MPP module provides the notion of a
! current pelist, which is set when a group of PEs branch off
! into a subset. Subsequent calls that omit the pelist optional
! argument (seen below in many of the individual calls) assume that the
! implied synchronization is across the current pelist. The calls
! mpp_root_pe and mpp_npes also return the values
! appropriate to the current pelist. The mpp_set_current_pelist
! call is provided to set the current pelist.
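!
! A sketch of the MPMD usage described above (the pelist contents and
! the even split are purely illustrative): the first half of the PEs
! runs an ocean model and the second half an atmosphere model, with
! collectives in each branch spanning only that subset:
!
! call mpp_declare_pelist( (/(i,i=0,npes/2-1)/) )     !all PEs declare
! call mpp_declare_pelist( (/(i,i=npes/2,npes-1)/) )  !both pelists
! if( mpp_pe().LT.npes/2 )then
!   call mpp_set_current_pelist( (/(i,i=0,npes/2-1)/) )
!   !...ocean code...
! else
!   call mpp_set_current_pelist( (/(i,i=npes/2,npes-1)/) )
!   !...atmosphere code...
! end if
! call mpp_set_current_pelist() !revert to the global pelist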
!
!
! F90 is a strictly-typed language, and the syntax pass of the
! compiler requires matching of type, kind and rank (TKR). Most calls
! listed here use a generic type, shown here as MPP_TYPE_. This
! is resolved in the pre-processor stage to any of a variety of
! types. In general the MPP operations work on 4-byte and 8-byte
! variants of integer, real, complex, logical variables, of
! rank 0 to 5, leading to 48 specific module procedures under the same
! generic interface. Any of the variables below shown as
! MPP_TYPE_ is treated in this way.
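!
! For example, the generic mpp_sum may be invoked on a real scalar or
! on a 1D real array under the same name, and the compiler resolves
! each call to the appropriate specific routine (a sketch; the full set
! of specifics lives in mpp_comm_mod):
!
! real :: a, b(1000)
! call mpp_sum(a)        !resolved to the real scalar specific
! call mpp_sum(b, 1000)  !resolved to the real 1D-array specific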
!
use mpp_parameter_mod, only : MPP_VERBOSE, MPP_DEBUG, ALL_PES, ANY_PE, NULL_PE
use mpp_parameter_mod, only : NOTE, WARNING, FATAL, MPP_CLOCK_DETAILED,MPP_CLOCK_SYNC
use mpp_parameter_mod, only : CLOCK_COMPONENT, CLOCK_SUBCOMPONENT, CLOCK_MODULE_DRIVER
use mpp_parameter_mod, only : CLOCK_MODULE, CLOCK_ROUTINE, CLOCK_LOOP, CLOCK_INFRA
use mpp_data_mod, only : request, mpp_record_timing_data
use mpp_comm_mod, only : mpp_chksum, mpp_max, mpp_min, mpp_sum, mpp_transmit
use mpp_comm_mod, only : mpp_send, mpp_recv, mpp_broadcast, mpp_malloc
use mpp_comm_mod, only : mpp_init, mpp_exit, mpp_set_stack_size
#ifdef use_MPI_GSM
use mpp_comm_mod, only : mpp_gsm_malloc, mpp_gsm_free
#endif
use mpp_util_mod, only : stdin, stdout, stderr, stdlog, lowercase, uppercase
use mpp_util_mod, only : mpp_error, mpp_error_state, mpp_set_warn_level, mpp_sync
use mpp_util_mod, only : mpp_sync_self, mpp_pe, mpp_node, mpp_npes, mpp_root_pe
use mpp_util_mod, only : mpp_set_root_pe, mpp_declare_pelist, mpp_get_current_pelist
use mpp_util_mod, only : mpp_set_current_pelist, mpp_clock_begin, mpp_clock_end
use mpp_util_mod, only : mpp_clock_id, mpp_clock_set_grain
character(len=128), public :: version= &
'$Id mpp.F90 $'
character(len=128), public :: tagname= &
'$Name$'
!--- public parameters -----------------------------------------------
public :: MPP_VERBOSE, MPP_DEBUG, ALL_PES, ANY_PE, NULL_PE, NOTE, WARNING, FATAL
public :: MPP_CLOCK_SYNC, MPP_CLOCK_DETAILED, CLOCK_COMPONENT, CLOCK_SUBCOMPONENT
public :: CLOCK_MODULE_DRIVER, CLOCK_MODULE, CLOCK_ROUTINE, CLOCK_LOOP, CLOCK_INFRA
!--- public data from mpp_data_mod ------------------------------
public :: request
!--- public interface from mpp_util_mod ------------------------------
public :: stdin, stdout, stderr, stdlog, lowercase, uppercase, mpp_error, mpp_error_state
public :: mpp_set_warn_level, mpp_sync, mpp_sync_self, mpp_set_stack_size, mpp_pe
public :: mpp_node, mpp_npes, mpp_root_pe, mpp_set_root_pe, mpp_declare_pelist
public :: mpp_get_current_pelist, mpp_set_current_pelist, mpp_clock_begin, mpp_clock_end
public :: mpp_clock_id, mpp_clock_set_grain, mpp_record_timing_data
!--- public interface from mpp_comm_mod ------------------------------
public :: mpp_chksum, mpp_max, mpp_min, mpp_sum, mpp_transmit, mpp_send, mpp_recv
public :: mpp_broadcast, mpp_malloc, mpp_init, mpp_exit
#ifdef use_MPI_GSM
public :: mpp_gsm_malloc, mpp_gsm_free
#endif
end module mpp_mod
#ifdef test_mpp
#ifdef SYSTEM_CLOCK
#undef SYSTEM_CLOCK
#endif
program test !test various aspects of mpp_mod
#include <fms_platform.h>
#ifdef sgi_mipspro
use shmem_interface
#endif
use mpp_mod, only : mpp_init, mpp_exit, mpp_pe, mpp_npes, mpp_root_pe, stdout
use mpp_mod, only : mpp_clock_id, mpp_clock_begin, mpp_clock_end, mpp_sync, mpp_malloc
use mpp_mod, only : mpp_declare_pelist, mpp_set_current_pelist, mpp_set_stack_size
use mpp_mod, only : mpp_broadcast, mpp_transmit, mpp_sum, mpp_max, mpp_chksum, ALL_PES
#ifdef use_MPI_GSM
use mpp_mod, only : mpp_gsm_malloc, mpp_gsm_free
#endif
implicit none
integer, parameter :: n=1048576
real, allocatable, dimension(:) :: a, b, c
#ifdef use_MPI_GSM
real :: d(n)
pointer (locd, d)
#else
real, allocatable, dimension(:) :: d
integer(LONG_KIND) :: locd
#endif
integer :: tick, tick0, ticks_per_sec, id
integer :: pe, npes, root, i, j, k, l, m, n2, istat
real :: dt
call mpp_init()
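  !mpp_set_stack_size allocates the internal mpp stack used by the
  !communication routines; like mpp_init, it must be called by all PEs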
call mpp_set_stack_size(3145746)
pe = mpp_pe()
npes = mpp_npes()
root = mpp_root_pe()
call SYSTEM_CLOCK( count_rate=ticks_per_sec )
allocate( a(n), b(n) )
id = mpp_clock_id( 'Random number' )
call mpp_clock_begin(id)
call random_number(a)
call mpp_clock_end (id)
!---------------------------------------------------------------------!
! time transmit, compare against shmem_put and get !
!---------------------------------------------------------------------!
if( pe.EQ.root )then
print *, 'Time mpp_transmit for various lengths...'
#ifdef SGICRAY
print *, 'For comparison, times for shmem_get and shmem_put are also provided.'
#endif
print *
end if
id = mpp_clock_id( 'mpp_transmit' )
call mpp_clock_begin(id)
!timing is done for cyclical pass (more useful than ping-pong etc)
l = n
do while( l.GT.0 )
!--- mpp_transmit -------------------------------------------------
call mpp_sync()
call SYSTEM_CLOCK(tick0)
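  !each PE puts to PE (pe-i) and gets from PE (pe+i), modulo npes, so the
  !messages circulate around the ring of PEs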
do i = 1,npes
call mpp_transmit( put_data=a(1), plen=l, to_pe=modulo(pe+npes-i,npes), &
get_data=b(1), glen=l, from_pe=modulo(pe+i,npes) )
! call mpp_sync_self( (/modulo(pe+npes-i,npes)/) )
end do
call mpp_sync()
call SYSTEM_CLOCK(tick)
dt = real(tick-tick0)/(npes*ticks_per_sec)
dt = max( dt, epsilon(dt) )
if( pe.EQ.root )write( stdout(),'(/a,i8,f13.6,f8.2)' )'MPP_TRANSMIT length, time, bw(Mb/s)=', l, dt, l*8e-6/dt
!#ifdef SGICRAY
! !--- shmem_put ----------------------------------------------------
! call mpp_sync()
! call SYSTEM_CLOCK(tick0)
! do i = 1,npes
! call shmem_real_put( b, a, l, modulo(pe+1,npes) )
! end do
! call mpp_sync()
! call SYSTEM_CLOCK(tick)
! dt = real(tick-tick0)/(npes*ticks_per_sec)
! dt = max( dt, epsilon(dt) )
! if( pe.EQ.root )write( stdout(),'( a,i8,f13.6,f8.2)' )'SHMEM_PUT length, time, bw(Mb/s)=', l, dt, l*8e-6/dt
! !--- shmem_get ----------------------------------------------------
! call mpp_sync()
! call SYSTEM_CLOCK(tick0)
! do i = 1,npes
! call shmem_real_get( b, a, l, modulo(pe+1,npes) )
! end do
! call SYSTEM_CLOCK(tick)
! dt = real(tick-tick0)/(npes*ticks_per_sec)
! dt = max( dt, epsilon(dt) )
! if( pe.EQ.root )write( stdout(),'( a,i8,f13.6,f8.2)' )'SHMEM_GET length, time, bw(Mb/s)=', l, dt, l*8e-6/dt
!#endif
l = l/2
end do
!---------------------------------------------------------------------!
! test mpp_sum !
!---------------------------------------------------------------------!
if( pe.EQ.root )then
print '(/a)', 'Time mpp_sum...'
end if
a = real(pe+1)
call mpp_sync()
call SYSTEM_CLOCK(tick0)
call mpp_sum(a(1:1000),1000)
call SYSTEM_CLOCK(tick)
dt = real(tick-tick0)/ticks_per_sec
dt = max( dt, epsilon(dt) )
if( pe.EQ.root )write( stdout(),'(a,2i4,f9.1,i8,f13.6,f8.2/)' ) &
'mpp_sum: pe, npes, sum(pe+1), length, time, bw(Mb/s)=', pe, npes, a(1), n, dt, n*8e-6/dt
call mpp_clock_end(id)
!---------------------------------------------------------------------!
! test mpp_max !
!---------------------------------------------------------------------!
if( pe.EQ.root )then
print *
print *, 'Test mpp_max...'
end if
a = real(pe+1)
print *, 'pe, pe+1 =', pe, a(1)
call mpp_max( a(1) )
print *, 'pe, max(pe+1)=', pe, a(1)
!pelist check
call mpp_sync()
call flush(stdout(),istat)
if( npes.GE.2 )then
if( pe.EQ.root )print *, 'Test of pelists: bcast, sum and max using PEs 0...npes-2 (excluding last PE)'
call mpp_declare_pelist( (/(i,i=0,npes-2)/) )
a = real(pe+1)
if( pe.NE.npes-1 )call mpp_broadcast( a, n, npes-2, (/(i,i=0,npes-2)/) )
print *, 'bcast(npes-1) from 0 to npes-2=', pe, a(1)
a = real(pe+1)
if( pe.NE.npes-1 )then
call mpp_set_current_pelist( (/(i,i=0,npes-2)/) )
id = mpp_clock_id( 'Partial mpp_sum' )
call mpp_clock_begin(id)
call mpp_sum( a(1:1000), 1000, (/(i,i=0,npes-2)/) )
call mpp_clock_end (id)
end if
if( pe.EQ.root )print *, 'sum(pe+1) from 0 to npes-2=', a(1)
a = real(pe+1)
if( pe.NE.npes-1 )call mpp_max( a(1), (/(i,i=0,npes-2)/) )
if( pe.EQ.root )print *, 'max(pe+1) from 0 to npes-2=', a(1)
end if
call mpp_set_current_pelist()
#ifdef use_CRI_pointers
!---------------------------------------------------------------------!
! test mpp_chksum !
!---------------------------------------------------------------------!
if( modulo(n,npes).EQ.0 )then !only set up for even division
n2 = 1024
a = 0.d0
if( pe.EQ.root )call random_number(a(1:n2))
! if( pe.EQ.root )call random_number(a)
call mpp_sync()
call mpp_transmit( put_data=a(1), plen=n2, to_pe=ALL_PES, &
get_data=a(1), glen=n2, from_pe=root )
! call mpp_transmit( put_data=a(1), plen=n, to_pe=ALL_PES, &
! get_data=a(1), glen=n, from_pe=root )
m= n2/npes
! m= n/npes
allocate( c(m) )
c = a(pe*m+1:pe*m+m)
if( pe.EQ.root )then
print *
print *, 'Test mpp_chksum...'
print *, 'This test shows that a whole array and a distributed array give identical checksums.'
end if
print *, 'chksum(a(1:1024))=', mpp_chksum(a(1:n2),(/pe/))
print *, 'chksum(c(1:1024))=', mpp_chksum(c)
! print *, 'chksum(a)=', mpp_chksum(a,(/pe/))
! print *, 'chksum(c)=', mpp_chksum(c)
end if
!test of pointer sharing
#ifdef use_MPI_GSM
call mpp_gsm_malloc( locd, sizeof(d) )
#else
if( pe.EQ.root )then
allocate( d(n) )
locd = LOC(d)
end if
call mpp_broadcast(locd,root)
#endif
if( pe.EQ.root )then
call random_number(d)
end if
call mpp_sync()
call test_shared_pointers(locd,n)
#ifdef use_MPI_GSM
call mpp_gsm_free( locd )
#else
if( pe.EQ.root )then
deallocate( d )
end if
#endif
#endif
call mpp_exit()
contains
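  !test_shared_pointers: associate a Cray pointer with the address locd
  !(broadcast or GSM-allocated above) and checksum the array it points to;
  !identical checksums on all PEs indicate the data is genuinely shared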
subroutine test_shared_pointers(locd,n)
integer(LONG_KIND), intent(in) :: locd
integer, intent(in) :: n
real :: dd(n)
pointer( p, dd )
p = locd
print *, 'TEST_SHARED_POINTERS: pe, locd=', pe, locd
print *, 'TEST_SHARED_POINTERS: pe, chksum(d)=', pe, mpp_chksum(dd,(/pe/))
return
end subroutine test_shared_pointers
end program test
#endif test_mpp