Research:ChaNGa

From Astronomy Facility Wiki


ChaNGa (Charm N-body GrAvity solver) is a code to perform collisionless N-body simulations. It can perform cosmological simulations with periodic boundary conditions in comoving coordinates, or simulations of isolated stellar systems. It can also include hydrodynamics using the Smoothed Particle Hydrodynamics (SPH) technique. It uses a Barnes-Hut tree to calculate gravity, with hexadecapole expansion of nodes and Ewald summation for periodic forces. Timestepping is done with a leapfrog integrator with individual timesteps for each particle.
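
The leapfrog idea can be illustrated with a short, generic kick-drift-kick integrator. This is a minimal sketch, not ChaNGa's implementation. It assumes one global, fixed timestep instead of individual particle timesteps, and a single fixed point mass (with G*M = 1) in place of the Barnes-Hut tree force.

```python
import math

def accel(x, y, gm=1.0):
    """Acceleration of a test particle around a fixed point mass at the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -gm * x / r3, -gm * y / r3

def leapfrog_kdk(pos, vel, dt, nsteps, gm=1.0):
    """Kick-drift-kick leapfrog with one global, fixed timestep dt."""
    x, y = pos
    vx, vy = vel
    ax, ay = accel(x, y, gm)
    for _ in range(nsteps):
        # Half kick with the current acceleration.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        # Full drift with the half-kicked velocity.
        x += dt * vx
        y += dt * vy
        # Recompute the force at the new position, then the second half kick.
        ax, ay = accel(x, y, gm)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return (x, y), (vx, vy)

if __name__ == "__main__":
    # Circular orbit of radius 1 (speed 1 when G*M = 1); one period is 2*pi.
    (x, y), _ = leapfrog_kdk((1.0, 0.0), (0.0, 1.0), 2 * math.pi / 1000, 1000)
    print("radius after one orbit:", math.hypot(x, y))
```

Because leapfrog is symplectic, the energy error of this orbit stays bounded rather than drifting, which is the main reason the scheme is popular for N-body work.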

ChaNGa's novel feature is its use of the dynamic load balancing scheme of the CHARM runtime system to obtain good performance on massively parallel systems. See our Supercomputing '06 poster for scaling results up to 20,000 processors.

ChaNGa uses the Tipsy file format for I/O.
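
As a quick way to inspect a Tipsy file, the sketch below decodes its header. The layout is an assumption to check against the Tipsy documentation: a big-endian ("standard"/XDR) header holding a double time followed by integer counts of total, dimensions, gas, dark, and star particles, padded to 32 bytes.

```python
import struct
import sys

# Assumed header layout of a standard (big-endian/XDR) Tipsy file:
#   double time; int nbodies, ndim, nsph, ndark, nstar; int pad
TIPSY_HEADER = struct.Struct(">diiiiii")  # 8 + 6*4 = 32 bytes

def read_tipsy_header(data):
    """Decode the first 32 bytes of a Tipsy file into a dict."""
    if len(data) < TIPSY_HEADER.size:
        raise ValueError("data too short for a Tipsy header")
    time, nbodies, ndim, nsph, ndark, nstar, _pad = TIPSY_HEADER.unpack_from(data)
    # A mismatch here usually means the byte-order assumption is wrong.
    if nbodies != nsph + ndark + nstar:
        raise ValueError("inconsistent particle counts")
    return {"time": time, "nbodies": nbodies, "ndim": ndim,
            "nsph": nsph, "ndark": ndark, "nstar": nstar}

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(read_tipsy_header(f.read(TIPSY_HEADER.size)))
```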


Obtaining ChaNGa

First you need to install Charm. Version 6.5.0 or later is required for the version 3.0 release of ChaNGa. The latest development version of Charm is needed for the latest development version of ChaNGa.

ChaNGa itself can be downloaded from the software server. In addition to the released version, nightly updates of the most recent development version are available. The most recent version of ChaNGa can also be obtained from the git archive. If you install the development version, you will also need to install the development version of Charm.

You will need the changa module and the utility module in the git repository:

git clone http://charm.cs.illinois.edu/gerrit/cosmo/changa.git
git clone http://charm.cs.illinois.edu/gerrit/cosmo/utility.git

License

The distribution of ChaNGa is licensed under the GPL.

Building ChaNGa

Build Charm++ with ChaNGa libraries

Charm needs to be built first. If it is not already built, build it with the build script in the top level of the Charm distribution. Typically, a command like build charm++ net-linux will build the Charm++ compiler. However, ChaNGa uses a couple of Charm libraries that may not be built by default. To make sure these are built, the command

build ChaNGa net-linux-x86_64

(for example, on a 64-bit Linux cluster) should be used instead, since it builds all of these libraries.

Build ChaNGa itself

Now go into the changa directory, set the environment variable CHARM_DIR to point at the root of your Charm distribution, and run configure. (If the copy of Charm is placed in the "ChaNGa" directory or the directory above it, CHARM_DIR need not be set.) This configures the Makefiles in both the changa and structures directories. Running make should then produce the charmrun and ChaNGa executables.

Gas Cooling

Configure ChaNGa with ./configure --enable-cooling=cosmo (for cooling with cosmological primordial abundances).
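
The build steps from the two sections above can be collected into a small dry-run helper. This is a sketch under assumptions: the charm_dir path, the changa_dir name, and the helper functions are hypothetical, and nothing runs until run_steps is called.

```python
import os
import shlex
import subprocess

def changa_build_steps(charm_dir, arch="net-linux-x86_64", cooling=None,
                       changa_dir="changa"):
    """Return the build as a list of (cwd, extra_env, argv) steps.

    Nothing is executed here, so the plan can be inspected first.
    """
    configure = ["./configure"]
    if cooling:
        # e.g. cooling="cosmo" for cosmological primordial abundances
        configure.append(f"--enable-cooling={cooling}")
    return [
        # Build Charm++ plus the extra libraries ChaNGa links against.
        (charm_dir, {}, ["./build", "ChaNGa", arch]),
        # Configure ChaNGa against that Charm++ tree, then compile it.
        (changa_dir, {"CHARM_DIR": charm_dir}, configure),
        (changa_dir, {"CHARM_DIR": charm_dir}, ["make"]),
    ]

def run_steps(steps):
    """Execute the steps in order, stopping at the first failure."""
    for cwd, extra_env, argv in steps:
        print(f"[{cwd}] {shlex.join(argv)}")
        subprocess.run(argv, cwd=cwd, env={**os.environ, **extra_env}, check=True)

if __name__ == "__main__":
    # Hypothetical Charm location; print the plan instead of running it.
    for cwd, env, argv in changa_build_steps("/path/to/charm", cooling="cosmo"):
        print(cwd, env, shlex.join(argv))
```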

Machine specific instructions

For instructions on particular parallel architectures see Research:ChaNGaBuildInstructions.

Problems with Building

One common problem with MPI builds is that at the link stage, many messages of the form

machine.c:(.text+0xa5): undefined reference to `ompi_mpi_comm_world'

are printed.

This indicates that the MPI libraries are not being found. Try adding a "-lmpi" to the LDLIBS variable in the Makefile:

LDLIBS += $(STRUCTURES_PATH)/libTipsy.a -lmpi
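
To check a captured make log for this problem, a small scan of the linker output is enough. This is a sketch: the "ompi_" prefix matches the Open MPI message above, while treating "MPI_" and "PMPI_" as signs of other MPI implementations is an assumption.

```python
import re

# Unresolved symbols with these prefixes point at a missing MPI library.
# "ompi_" is Open MPI (as in the message above); "MPI_"/"PMPI_" are assumed
# to cover other MPI implementations.
_UNDEFINED_MPI = re.compile(r"undefined reference to `((?:ompi_|MPI_|PMPI_)\w*)'")

def missing_mpi_symbols(link_log):
    """Return the sorted, de-duplicated MPI symbols the linker could not resolve."""
    return sorted(set(_UNDEFINED_MPI.findall(link_log)))
```

A non-empty result suggests trying the -lmpi addition to LDLIBS shown above.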

Testing and Performance

ChaNGa has been tested and benchmarked on a number of platforms. Benchmarks are posted under Research:ChaNGaBenchmarks.

Running ChaNGa

Initial Conditions and Parameters

ChaNGa accepts Tipsy files as initial conditions. The run is controlled by a parameter file and/or command-line switches, in the style of PKDGRAV. See the testcosmo or teststep subdirectories for example parameter files. ChaNGa --help will give a list of all available options; their meaning is described in Research:ChaNGaOptions. ChaNGa can be run in parallel or in serial. Running in parallel generally (depending on the architecture) requires starting ChaNGa with the charmrun program. For example

charmrun +p4 ./ChaNGa cube300.param

will start ChaNGa on four processors using the cube300.param parameter file.

Here is a more complicated example:

charmrun +p 4 ++local ./ChaNGa -wall 60 +balancer MultistepLB_notopo +LBPeriod 0.0 cube300.param

Here, ++local means run all processes locally and ignore the network; -wall 60 means run for 60 minutes before checkpointing and stopping; +balancer MultistepLB_notopo selects the load balancer; and +LBPeriod 0.0 specifies no wait time between successive load-balancing steps.
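
When launching many runs, it can help to assemble these command lines programmatically. The sketch below only builds the argument list for the options explained above; the function and its keyword names are hypothetical, and it writes the processor count as +p4 (the form used in the first example).

```python
def charmrun_argv(param_file, nprocs, local=False, wall_minutes=None,
                  balancer=None, lb_period=None,
                  charmrun="charmrun", changa="./ChaNGa"):
    """Build a charmrun command line for ChaNGa (nothing is executed)."""
    argv = [charmrun, f"+p{nprocs}"]
    if local:
        argv.append("++local")            # run all processes on this machine
    argv.append(changa)
    if wall_minutes is not None:
        argv += ["-wall", str(wall_minutes)]   # checkpoint and stop after this
    if balancer:
        argv += ["+balancer", balancer]
    if lb_period is not None:
        argv += ["+LBPeriod", str(lb_period)]
    argv.append(param_file)
    return argv

if __name__ == "__main__":
    print(" ".join(charmrun_argv("cube300.param", 4, local=True, wall_minutes=60,
                                 balancer="MultistepLB_notopo", lb_period=0.0)))
```

The list can be passed directly to subprocess.run, which avoids shell quoting problems.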

net-linux Architectures

The "net" version of Charm starts multiple processes by invoking ssh, so an ssh server must be installed on the target machine. For example, on Red Hat/Fedora machines the openssh-server package needs to be installed; yum install openssh-server will accomplish this. The ssh server is needed even if you are running multiple cores on a single node. Also, by default ssh requires you to enter your password. This can be avoided by setting up your ssh keys correctly. See the SSH with keys HOWTO for information on how to do this.
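
Once keys are set up, a non-interactive login can be verified before starting charmrun. This sketch relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail rather than prompt for a password; the helper names are hypothetical.

```python
import subprocess

def ssh_check_argv(host="localhost"):
    """Command that succeeds only if key-based (passwordless) login works."""
    # BatchMode=yes disables password prompts, so a zero exit status
    # means ssh authenticated without user interaction.
    return ["ssh", "-o", "BatchMode=yes", host, "true"]

def passwordless_ssh_ok(host="localhost", timeout=30):
    """Run the check; False if ssh is missing, times out, or needs a password."""
    try:
        result = subprocess.run(ssh_check_argv(host), capture_output=True,
                                timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```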

net-linux-x86_64-cuda

The GPU version of ChaNGa offloads computation to the GPU in chunks called work requests (WR). The interaction of one bucket of particles with a node or another bucket of particles constitutes one unit of computation. Each WR can hold a certain, specified, number of force computations. An appropriate value for the WR size can be specified by the user.

There are several kinds of WR in ChaNGa. WRs that represent the computation between local buckets and local data (either nodes or other buckets) are referred to as 'local'. Similarly, WRs that specify computation of local buckets with remote prefetched data are termed 'remote'. Finally, WRs that specify interaction between local buckets and remote data that haven't been prefetched are termed 'remote-resume'.

ChaNGa provides the following parameters to assign a value for each type of WR:

Local WRs:

  • -localnodes: number of bucket-to-local-node computations to offload per WR
  • -localparts: number of bucket-to-local-bucket computations to offload per WR

Remote WRs:

  • -remotenodes: number of bucket-to-remote-node computations to offload per WR
  • -remoteparts: number of bucket-to-remote-bucket computations to offload per WR

Remote-Resume WRs:

  • -remoteresumenodes: number of bucket-to-remote-resume-node computations to offload per WR
  • -remoteresumeparts: number of bucket-to-remote-resume-bucket computations to offload per WR

Values for these parameters affect the efficiency of kernel execution and the total execution time. For instance, if a WR size is set too high, there is less overlap between work done on the CPU and work done on the GPU. On the other hand, values that are too small increase the transfer and kernel-invocation overheads associated with each WR.

Appropriate values can be obtained by the following mechanism:

  • Recompile the ChaNGa CUDA version with -DCUDA_STATS in addition to the other CUDA-specific flags.
  • This gives the per-iteration count of each type of interaction (localnodes, localparts, remotenodes, remoteparts, remoteresumenodes, remoteresumeparts).
  • These values can be used to split the total number of interactions into as many pieces (WRs) as deemed appropriate. Some effort might be required to determine appropriate values in this fashion.

The default value for every parameter is 0.1 million.
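
The splitting step above is simple arithmetic: divide each per-iteration count by the number of WRs you want and round up. The sketch below assumes the parameters take a raw interaction count (so the 0.1 million default is 100000). The example counts and the target number of WRs are hypothetical.

```python
import math

# The six WR-size parameters listed above, in the order shown there.
WR_PARAMS = ("localnodes", "localparts", "remotenodes", "remoteparts",
             "remoteresumenodes", "remoteresumeparts")
DEFAULT_WR_SIZE = 100_000  # the documented default of 0.1 million

def wr_sizes(counts, n_requests):
    """Split each per-iteration interaction count into about n_requests WRs.

    counts maps a parameter name to the count reported by a -DCUDA_STATS
    build; types with no reported count keep the default size.
    """
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    sizes = {}
    for name in WR_PARAMS:
        total = counts.get(name, 0)
        sizes[name] = math.ceil(total / n_requests) if total > 0 else DEFAULT_WR_SIZE
    return sizes

def wr_flags(sizes):
    """Turn the sizes into ChaNGa command-line switches."""
    flags = []
    for name in WR_PARAMS:
        flags += [f"-{name}", str(sizes[name])]
    return flags

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    stats = {"localnodes": 2_400_000, "remoteparts": 350_000}
    print(" ".join(wr_flags(wr_sizes(stats, 8))))
```

Rounding up guarantees that n_requests WRs cover the whole count; the right n_requests still has to be found by experiment, since it trades CPU/GPU overlap against per-WR overhead.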

MPI Architectures

On MPI architectures, you have the option of building the MPI version of Charm; charmrun is then just a shell script wrapper around whatever command is used to start MPI jobs (e.g. poe on IBM, mpirun with MPICH). A typical launch command for an MPI job would be

mpiexec ./ChaNGa -wall 600 +balancer MultistepLB_notopo simulation.param

where 600 is the wallclock time, in minutes, requested from the queuing system, and MultistepLB_notopo is the selected load balancer.
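
Since -wall takes minutes while batch systems usually express time requests as HH:MM:SS, a small conversion keeps the two consistent. This is a sketch: the helper names are hypothetical, and the optional margin is an assumption on our part, not a documented ChaNGa requirement.

```python
def wall_minutes(walltime, margin_minutes=0):
    """Convert a queue walltime of the form 'HH:MM:SS' into whole minutes.

    margin_minutes is an optional safety margin (an assumption, not a
    documented ChaNGa requirement) subtracted from the requested time.
    """
    parts = walltime.split(":")
    if len(parts) != 3:
        raise ValueError("expected HH:MM:SS")
    hours, minutes, seconds = (int(p) for p in parts)
    total = hours * 60 + minutes + seconds // 60 - margin_minutes
    if total <= 0:
        raise ValueError("walltime too short after margin")
    return total

def mpiexec_argv(param_file, walltime, balancer="MultistepLB_notopo",
                 launcher="mpiexec", margin_minutes=0):
    """Build an MPI launch line like the example above (nothing is executed)."""
    return [launcher, "./ChaNGa", "-wall", str(wall_minutes(walltime, margin_minutes)),
            "+balancer", balancer, param_file]
```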

Another option on many infiniband clusters is to use the native infiniband support. See Research:ChaNGaBuildInstructions#Infiniband Linux cluster (lonestar, stampede at TACC; gordon at SDSC; Plieades at NAS) instructions for details.

See appendix C of the CHARM language manual for more information on parallel execution. Also see Research:ChaNGaPerformanceAnalysis to evaluate how these options affect the parallel performance.

ChaNGa Output

Outputs are also in TIPSY format, written to files whose names end with the timestep number. For example, to visualize the final output of the testcosmo simulation, start tipsy and type

openbinary cube300.000128
loadstandard 1.0
zall

This should display the clustering of galaxies on a 300 Mpc scale.
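
To locate the most recent output automatically, the numeric suffix can be parsed from the file names. This sketch assumes output names follow the pattern of the example (prefix, a dot, then a zero-padded six-digit step such as cube300.000128) and ignores files with extra suffixes; the helper names are hypothetical.

```python
import re

def output_name(prefix, step):
    """Name of the output written at a given step, e.g. cube300.000128."""
    return f"{prefix}.{step:06d}"

def latest_output(filenames, prefix):
    """Return the highest-numbered output file for prefix, or None."""
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)")
    best, best_step = None, -1
    for name in filenames:
        m = pattern.fullmatch(name)
        if m and int(m.group(1)) > best_step:
            best, best_step = name, int(m.group(1))
    return best

if __name__ == "__main__":
    import os
    print(latest_output(os.listdir("."), "cube300"))
```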

Restarts

A simulation can be restarted from a checkpoint using the syntax:

charmrun +p4 ./ChaNGa +restart cube300.chk0

All parameters will be restored from the checkpoint directory. Only a small subset of the run parameters can be changed in a restart, and only by specifying the changes via command line arguments. These include the base timestep (-dt), the number of timesteps (-n), the wall clock time limit (-wall), the particles/bucket (-b), the output interval (-oi), and the checkpoint interval (-oc).
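
Because only a few parameters may change on restart, a helper can reject anything else before the job is submitted. This is a sketch: the helper is hypothetical, and placing the overrides after the checkpoint directory is an assumption, so check the order against your own runs.

```python
# Parameters that, per the text above, may be changed on restart.
RESTART_OVERRIDES = ("-dt", "-n", "-wall", "-b", "-oi", "-oc")

def restart_argv(checkpoint_dir, nprocs, overrides=None,
                 charmrun="charmrun", changa="./ChaNGa"):
    """Build a restart command line, refusing parameters that cannot change."""
    argv = [charmrun, f"+p{nprocs}", changa, "+restart", checkpoint_dir]
    for flag, value in (overrides or {}).items():
        if flag not in RESTART_OVERRIDES:
            raise ValueError(f"{flag} cannot be changed when restarting")
        # Assumed placement: overrides follow the checkpoint directory.
        argv += [flag, str(value)]
    return argv
```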

Visualization

ChaNGa now (as of 7/2009) has on demand visualization capabilities via the liveViz module of CHARM++. To use it, set bLiveViz = 1 in the parameter file, and start ChaNGa with

charmrun +p4 ++server ++server-port NNNNN ./ChaNGa run.param

where NNNNN is an unused TCP port number. Images of the running simulation can be obtained with the liveViz Java client from the CHARM++ distribution, in java/bin/liveViz. The syntax is liveViz hostname NNNNN, where hostname is the machine on which charmrun is running and NNNNN is the port number given above. A window will pop up with an image that is continually refreshed from the running program. The image view is controlled by the .director file. See Research:ChaNGaOptions#Movie_Making_options.
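
One way to pick the unused port NNNNN is to ask the operating system for one, then build the matching server and client command lines. This is a sketch under assumptions: the helpers are hypothetical, client defaults to a liveViz on your PATH (otherwise pass the path to java/bin/liveViz), and bLiveViz = 1 must still be set in the parameter file.

```python
import socket

def find_free_port():
    """Ask the OS for a currently unused TCP port.

    The port is released before returning, so another program could grab it
    before charmrun starts; pick again if charmrun fails to bind.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

def liveviz_commands(param_file, nprocs, port, host="localhost",
                     client="liveViz"):
    """Server (charmrun) and client (liveViz) command lines for one port."""
    server = ["charmrun", f"+p{nprocs}", "++server", "++server-port", str(port),
              "./ChaNGa", param_file]
    viewer = [client, host, str(port)]
    return server, viewer

if __name__ == "__main__":
    port = find_free_port()
    for argv in liveviz_commands("run.param", 4, port):
        print(" ".join(argv))
```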

Improving Performance

See Research:ChaNGaPerformanceAnalysis for tools to measure and improve the performance of ChaNGa.

Getting Help

An email list has been set up at changa-users at u.washington.edu. Please subscribe to the list before posting to it.

Bugs and feature requests can be submitted to the NChilada product of our bugzilla server. Because of spammers, a password is needed; ask for it on the mailing list.

Also check out our list of Research:ChaNGa Issues for common errors when running ChaNGa.

Documentation

Internal code documentation using doxygen is partially done.

While there is no comprehensive body of documentation for the ChaNGa code, the recent refactoring efforts are outlined and discussed here. The refactoring process also clarified some nuances of the existing code, so these articles are worth reading.

Acknowledgements

The development of ChaNGa was supported by a National Science Foundation ITR grant PHY-0205413 to the University of Washington, and NSF ITR grant NSF-0205611 to the University of Illinois. Contributors to the program include Graeme Lufkin, Tom Quinn, Rok Roskar, Filippo Gioachin, Sayantan Chakravorty, Amit Sharma, Pritish Jetley, Lukasz Wesolowski, Edgar Solomonik, Celso Mendes, Joachim Stadel, and James Wadsley.
