ChaNGa (Charm N-body GrAvity solver) is a code to perform collisionless N-body simulations. It can perform cosmological simulations with periodic boundary conditions in comoving coordinates or simulations of isolated stellar systems. It can also include hydrodynamics using the Smoothed Particle Hydrodynamics (SPH) technique. It uses a Barnes-Hut tree to calculate gravity, with hexadecapole expansion of nodes and Ewald summation for periodic forces. Timestepping is done with a leapfrog integrator with individual timesteps for each particle.
ChaNGa's novel feature is to use the dynamic load balancing scheme of the CHARM runtime system in order to obtain good performance on massively parallel systems. See our Supercomputing '06 poster for scaling results up to 20,000 processors.
If you choose to use ChaNGa for scientific work, please reference the code papers:
P. Jetley, F. Gioachin, C. Mendes, L. V. Kale, and T. R. Quinn. Massively parallel cosmological simulations with ChaNGa. In Proceedings of IEEE International Parallel and Distributed Processing Symposium 2008, 2008.
P. Jetley, L. Wesolowski, F. Gioachin, L. V. Kale, and T. R. Quinn. Scaling hierarchical n-body simulations on gpu clusters. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’10, Washington, DC, USA, 2010. IEEE Computer Society.
ChaNGa uses the Tipsy file format for I/O.
First you need to install Charm. Version 6.5.0 or later is required for the version 3.0 release of ChaNGa. The latest development version of Charm is needed for the latest development version of ChaNGa.
ChaNGa itself can be downloaded from the software server. In addition to the released version, nightly updates of the most recent development version are available. The most recent version can also be obtained from the git archive. If you install the development version, you will also need to install the development version of Charm.
You will need the changa module and the utility module in the git repository:
git clone http://charm.cs.illinois.edu/gerrit/cosmo/changa.git
git clone http://charm.cs.illinois.edu/gerrit/cosmo/utility.git
The distribution of ChaNGa is licensed under the GPL.
Build Charm++ with ChaNGa libraries
Charm needs to be built first. If it is not, build it with the
build script in the top level of the charm distribution. Typically, a command like
build charm++ net-linux
will build the charm++ compiler. However, ChaNGa uses a couple of Charm libraries that may not be built by default. To be sure that these are built, use the command
build ChaNGa net-linux-x86_64
(for example, on a 64-bit Linux cluster), which builds all of these libraries.
Build ChaNGa itself
Now go into the
changa directory, set the environment variable
CHARM_DIR to point at the root of your charm distribution and run
configure. (Note that if the copy of charm is put into the "ChaNGa" directory, or the directory above, then the CHARM_DIR variable need not be set.) This will configure the Makefile in both the
make should then produce the ChaNGa executable.
Configure ChaNGa with
./configure --enable-cooling=cosmo (for cooling with cosmological primordial abundances).
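The build steps in this section can be collected into one short script. The following is only a sketch: the function name build_steps and the example paths under $HOME are hypothetical, and the script merely prints the commands rather than running them, so the sequence can be inspected first. It assumes absolute paths without spaces.

```shell
#!/bin/sh
# Print the build steps from this section for a given directory layout.
# Nothing is executed here; review the output, then pipe it to `sh` to run it.
build_steps() {
    charm_dir=$1   # root of the charm distribution (assumed path)
    changa_dir=$2  # changa checkout (assumed path)
    arch=$3        # charm target, e.g. net-linux-x86_64 as in the text above
    cat <<EOF
cd $charm_dir && ./build ChaNGa $arch
cd $changa_dir
export CHARM_DIR=$charm_dir
./configure
make
EOF
}

# Example layout; both paths are assumptions, not defaults of either package.
build_steps "$HOME/charm" "$HOME/changa" net-linux-x86_64
```

Printing instead of executing keeps the sketch safe to try on a machine where charm is not yet unpacked. Add configure options such as --enable-cooling=cosmo to the ./configure line as needed.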
Machine specific instructions
For instructions on particular parallel architectures see Research:ChaNGaBuildInstructions.
Problems with Building
One common problem with MPI builds is that the link stage produces many messages of the form
machine.c:(.text+0xa5): undefined reference to `ompi_mpi_comm_world'
This indicates that the MPI libraries are not being found. Try adding a "-lmpi" to the LDLIBS variable in the Makefile:
LDLIBS += $(STRUCTURES_PATH)/libTipsy.a -lmpi
Testing and Performance
ChaNGa has been tested and benchmarked on a number of platforms. Benchmarks are posted under Research:ChaNGaBenchmarks.
Initial Conditions and Parameters
ChaNGa accepts Tipsy files as initial conditions. The running of the program is controlled by either a parameter file or command line switches, in the style of
PKDGRAV. See the
testcosmo or the
teststep subdirectories for example parameter files.
ChaNGa --help will give a list of all available options. Their meaning is described in Research:ChaNGaOptions. ChaNGa can be run in parallel or in serial. Generally (depending on the architecture), running in parallel requires starting ChaNGa with the
charmrun program. For example,
charmrun +p4 ./ChaNGa cube300.param
runs ChaNGa on four processors using the
cube300.param parameter file.
Here is a more complicated example:
charmrun +p 4 ++local ./ChaNGa -wall 60 +balancer MultistepLB_notopo +LBPeriod 0.0 cube300.param
++local means run all processes locally and ignore the network.
-wall 60 means run for 60 minutes before checkpointing and stopping.
+balancer MultistepLB_notopo is specifying a load balancer, and
+LBPeriod 0.0 is specifying no wait time between successive load balancings.
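For single-node runs, the launch line above can be generated so that it uses every online core. This is a minimal sketch: the function name launch_cmd is hypothetical, the core count comes from the standard getconf utility, and the command is only printed, not executed. The flags are the ones explained above.

```shell
#!/bin/sh
# Compose a single-node launch line that uses every online core.
launch_cmd() {
    param=$1        # parameter file, e.g. cube300.param
    wall=${2:-60}   # minutes to run before checkpointing and stopping
    # Fall back to one core if getconf cannot report the count.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "charmrun +p$cores ++local ./ChaNGa -wall $wall +balancer MultistepLB_notopo $param"
}

launch_cmd cube300.param 60
```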
The "net" version of charm starts multiple processes by invoking
ssh; therefore an ssh server needs to be installed on the target machine. For example, on Redhat/Fedora machines the
openssh-server package needs to be installed.
yum install openssh-server will accomplish this.
Ssh needs to be installed even if you are running multiple cores on a single node. Also by default,
ssh requires you to enter your password. This can be avoided by setting up your ssh keys correctly. See the SSH with keys HOWTO for information on how to do this.
The GPU version of ChaNGa offloads computation to the GPU in chunks called work requests (WR). The interaction of one bucket of particles with a node or another bucket of particles constitutes one unit of computation. Each WR can hold a certain, specified, number of force computations. An appropriate value for the WR size can be specified by the user.
There are several kinds of WR in ChaNGa. WRs that represent the computation between local buckets and local data (either nodes or other buckets) are referred to as 'local'. Similarly, WRs that specify computation of local buckets with remote prefetched data are termed 'remote'. Finally, WRs that specify interaction between local buckets and remote data that haven't been prefetched are termed 'remote-resume'.
ChaNGa provides the following parameters to assign a value for each type of WR:
- -localnodes: bucket - local node computations to offload per WR
- -localparts: bucket - local bucket computations to offload per WR
- -remotenodes: bucket - remote node computations to offload per WR
- -remoteparts: bucket - remote bucket computations to offload per WR
- -remoteresumenodes: bucket - remote-resume node computations to offload per WR
- -remoteresumeparts: bucket - remote-resume bucket computations to offload per WR
Values for these parameters affect the efficiency of kernel execution and the total execution time. For instance, if a WR size is set too high, there is less overlap between work done on the CPU and work done on the GPU. On the other hand, values that are too small increase the transfer and kernel invocation overheads associated with each WR.
Appropriate values can be obtained by the following mechanism:
- Recompile the ChaNGa CUDA version with -DCUDA_STATS in addition to the other CUDA-specific flags.
- Running this build gives the per-iteration count of each type of interaction (localnodes, localparts, remotenodes, remoteparts, remoteresumenodes, remoteresumeparts).
- These values can be used to split the total number of interactions into as many pieces (WRs) as deemed appropriate. Some effort might be required to determine appropriate values in this fashion.
The default value for every parameter is 0.1 million.
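Splitting a per-iteration interaction count into a chosen number of WRs is a ceiling division. The sketch below illustrates the arithmetic only: the helper name wr_size is hypothetical, the counts are made-up numbers rather than measurements, and passing each value as "-flag value" on the command line is an assumption about the option syntax.

```shell
#!/bin/sh
# Split a per-iteration interaction count into a chosen number of work
# requests (WRs), rounding up so that no interactions are left over.
wr_size() {
    total=$1    # per-iteration count reported by a -DCUDA_STATS build
    pieces=$2   # number of WRs wanted per iteration (your choice)
    echo $(( (total + pieces - 1) / pieces ))
}

# Illustrative numbers only, not taken from a real run.
localnodes=$(wr_size 2500000 20)
localparts=$(wr_size 900000 20)
echo "-localnodes $localnodes -localparts $localparts"
```

With these example inputs the script prints "-localnodes 125000 -localparts 45000". The same helper applies to the remote and remote-resume counts.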
On MPI architectures, you have the option of building the MPI version of charm, and then
charmrun is just a shell script wrapper around whatever command is used to start MPI jobs (e.g.,
poe on IBM,
mpirun on MPICH). A typical launch command for an MPI job would be
mpiexec ./ChaNGa -wall 600 +balancer MultistepLB_notopo simulation.param
where 600 refers to the minutes of wallclock time requested from the queuing system and MultistepLB_notopo is the specified load balancer.
Another option on many InfiniBand clusters is to use the native InfiniBand support. See Research:ChaNGaBuildInstructions#Infiniband Linux cluster (lonestar, stampede at TACC; gordon at SDSC; Pleiades at NAS) instructions for details.
Outputs are also in TIPSY format and are in files that end with the timestep. For example to visualize the final output of the testcosmo simulation, fire up
tipsy, and type
openbinary cube300.000128
loadstandard 1.0
zall
This should display the clustering of galaxies on a 300 Mpc scale.
A simulation can be restarted from a checkpoint using the syntax:
charmrun +p4 ./ChaNGa +restart cube300.chk0
All parameters will be restored from the checkpoint directory. Only a small subset of the run parameters can be changed in a restart, and only by specifying the changes via command line arguments. These include the base timestep (-dt), the number of timesteps (-n), the wall clock time limit (-wall), the particles/bucket (-b), the output interval (-oi), and the checkpoint interval (-oc).
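Because only the options listed above can be changed on restart, a small wrapper can reject anything else before the job is submitted. This is a sketch under stated assumptions: the function name restart_cmd is hypothetical, each override is assumed to take exactly one value, and placing the overrides after the checkpoint directory is an assumption about argument order. The command is printed rather than run.

```shell
#!/bin/sh
# Build a restart command line, rejecting overrides that the text above
# does not list as changeable on restart.
restart_cmd() {
    nproc=$1   # number of processors for charmrun
    chk=$2     # checkpoint directory, e.g. cube300.chk0
    shift 2
    cmd="charmrun +p$nproc ./ChaNGa +restart $chk"
    while [ $# -gt 0 ]; do
        case $1 in
            -dt|-n|-wall|-b|-oi|-oc)
                if [ $# -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 1
                fi
                cmd="$cmd $1 $2"
                shift 2
                ;;
            *)
                echo "not changeable on restart: $1" >&2
                return 1
                ;;
        esac
    done
    echo "$cmd"
}

restart_cmd 4 cube300.chk0 -wall 120 -n 256
```

The example call prints "charmrun +p4 ./ChaNGa +restart cube300.chk0 -wall 120 -n 256". An unlisted option makes the function return a non-zero status with a message on standard error.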
ChaNGa now (as of 7/2009) has on-demand visualization capabilities via the liveViz module of CHARM++. To use it, set
bLiveViz = 1 in the parameter file, and start ChaNGa with
charmrun +p4 ++server ++server-port NNNNN ./ChaNGa run.param
where NNNNN is an unused TCP port number. Images of the running simulation can be obtained by using the liveViz Java client from the CHARM++ distribution in java/bin/liveViz. The syntax is
liveViz hostname NNNNN where
hostname is the machine on which charmrun is running, and NNNNN is the port number given above. A window will pop up with an image that will continually be refreshed from the running program. The image view is controlled by the .director file. See Research:ChaNGaOptions#Movie_Making_options.
See Research:ChaNGaPerformanceAnalysis for tools to measure and improve the performance of ChaNGa.
Bugs and feature requests can be submitted to the NChilada product of our bugzilla server. Because of spammers, a password is needed; ask for it on the mailing list.
Also check out our list of Research:ChaNGa Issues for common errors when running ChaNGa.
Internal code documentation using doxygen is partially done.
While there is no comprehensive body of documentation detailing the ChaNGa code, the recent refactoring efforts are outlined and discussed here. The refactoring process unearthed the answers to some nuances of the existing code as well, so one would do well to look through these articles.
The development of ChaNGa was supported by a National Science Foundation ITR grant PHY-0205413 to the University of Washington, and NSF ITR grant NSF-0205611 to the University of Illinois. Contributors to the program include Graeme Lufkin, Tom Quinn, Rok Roskar, Filippo Gioachin, Sayantan Chakravorty, Amit Sharma, Pritish Jetley, Lukasz Wesolowski, Edgar Solomonik, Celso Mendes, Joachim Stadel, and James Wadsley.