AstroBEAR Virtual Memory

AstroBEAR uses a huge amount of virtual memory (compared with its data and text memory) when running on multiple processors:

One Processor: http://www.pas.rochester.edu/~bliu/Jan_24_2012/AstroBEAR/bear_1n1p1t.png

Four Processors: http://www.pas.rochester.edu/~bliu/Jan_24_2012/AstroBEAR/bear_2n8p4t.png

To understand the problem, I tried a very simple Hello World program. Here are the results from TotalView:

One Processor: http://www.pas.rochester.edu/~bliu/Jan_24_2012/HelloWorld/1n1p1t.png

Four Processors: http://www.pas.rochester.edu/~bliu/Jan_24_2012/HelloWorld/1n4p4t.png
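For reference, the test program was just a plain MPI Hello World. A minimal sketch of such a program (in C; not the exact source used for the screenshots above) looks like this:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* Start up MPI and find out who we are and how many ranks there are */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello World from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

It is compiled with mpicc hello.c -o hello and launched with mpirun -n 1 ./hello or mpirun -n 4 ./hello; the program itself allocates essentially no data.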

It's fair to say that the large virtual memory usage is not specific to the AstroBEAR code; it is more related to Open MPI and the system. Online resources argue that virtual memory includes the memory mapped for shared libraries, which depends on the other processes running, and that makes sense to me. In particular, I ran the Hello World program with the same setup at different times and found that it uses different amounts of virtual memory:

http://www.pas.rochester.edu/~bliu/Jan_24_2012/HelloWorld/1n1p1t_2ndRun.png http://www.pas.rochester.edu/~bliu/Jan_24_2012/HelloWorld/1n1p1t_3run.png

I'm reading more on virtual memory and shared libraries.
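One way to see where the virtual size comes from on Linux is to read /proc/self/status from inside the program (the same information top reports as VIRT and RES). A minimal sketch, assuming a Linux system:

/* Print the Vm* lines from /proc/self/status: VmSize (total virtual),
 * VmRSS (resident), VmData (data + heap), VmLib (shared libraries),
 * VmExe (text), VmStk (stack), ...  Linux-specific. */
#include <stdio.h>
#include <string.h>

void print_memory_status(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];

    if (!f)
        return;
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "Vm", 2) == 0)   /* keep only the memory lines */
            fputs(line, stdout);
    fclose(f);
}

Calling something like this right after MPI_Init should show how much of VmSize is VmLib (shared libraries) versus VmData, which is the breakdown the online discussions refer to.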

Comments

1. Jonathan -- 13 years ago

Nothing running ⇒ 1.4 GB; scratch space ⇒ 2 GB. 3D runs on 4 processors with 2 additional levels of AMR:

resolution   virtual    resident   %mem   total_mem_diff   file_size   ratio
128³         1140 MB    946 MB     6.0%   3.8 GB           273 MB      14
64³          363 MB     195 MB     1.2%   0.7 GB           38 MB       18
32³          245 MB     70 MB      0.4%   0.2 GB           6 MB        33
2. Jonathan -- 13 years ago

The virtual, resident, and percent values are per processor, and the swap space is 2 GB (not the scratch space).

3. Jonathan -- 13 years ago

The ratio of used memory to file size tends to decrease with increasing data size, with a rough linear fit of

Y = 14X + 116, with X the file size and Y the total memory used, both in MB. (Check against the 128³ run: 14 × 273 MB + 116 MB ≈ 3.9 GB, close to the 3.8 GB measured above.)

Given that the code itself is 25 MB or so, 4 processors would use 100 MB for the code, which could explain the offset. The same run on 8 processors would likely use 100 MB more.

But the factor of 14 is higher than might be expected…

4. Jonathan -- 13 years ago

Also - submitted a job without specifying vmem or pvmem and got 2 GB of virtual memory for each process… So when one of the processors on the two-processor run hit 2 GB, the run died with 'insufficient virtual memory'.

Total memory used (GB) by number of processors and resolution:

resolution →   32³+2    64³+2    128³+2
1 processor    0.13     0.63     3.0
2              0.21     0.7      3.2
4              0.31     0.78     3.8
8              0.38     0.84     4.6
file size      7.7 MB   38 MB    273 MB
  • x = number of processors
  • y = file size
  • z = total memory used

Fitting z = A·x + B·y gives

A = 75.1 MB
B = 12.3

so there is a factor of 12.3 between file size and memory use, as well as a 75 MB overhead per processor, probably from the code itself (25 MB+), message buffers, or overhead with grid fractionation, etc…
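For reference, the least-squares solve for this two-parameter model is easy to reproduce. Below is a hypothetical C sketch (not the original fitting script) that assumes the table values are total memory in GB, converted with 1 GB = 1000 MB:

/* Least-squares fit of z = A*x + B*y (no intercept) over the table above,
 * where x = number of processors, y = file size (MB), z = total memory (MB). */
#include <stdio.h>

int main(void)
{
    const double x[12] = { 1, 2, 4, 8,   1, 2, 4, 8,   1, 2, 4, 8 };
    const double y[12] = { 7.7, 7.7, 7.7, 7.7,  38, 38, 38, 38,
                           273, 273, 273, 273 };
    const double z[12] = { 130, 210, 310, 380,  630, 700, 780, 840,
                           3000, 3200, 3800, 4600 };

    /* Normal equations:  A*Sxx + B*Sxy = Sxz,   A*Sxy + B*Syy = Syz */
    double Sxx = 0, Sxy = 0, Syy = 0, Sxz = 0, Syz = 0;
    for (int i = 0; i < 12; i++) {
        Sxx += x[i]*x[i];  Sxy += x[i]*y[i];  Syy += y[i]*y[i];
        Sxz += x[i]*z[i];  Syz += y[i]*z[i];
    }
    double det = Sxx*Syy - Sxy*Sxy;
    double A = (Sxz*Syy - Syz*Sxy) / det;   /* per-processor overhead (MB) */
    double B = (Sxx*Syz - Sxy*Sxz) / det;   /* memory-to-file-size ratio   */
    printf("A = %.1f MB per processor, B = %.1f\n", A, B);
    return 0;
}

With these numbers the solve gives roughly A ≈ 75 MB and B ≈ 12.3, consistent with the values quoted above.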

5. Jonathan -- 13 years ago

Also ran a 512³+2 simulation on 8 processors to try and break the memory… The job was terminated with vmem being exceeded and a nice error message… Not sure what happens if I request two nodes, 32 GB of RAM, and pvmem=-1… The monitor might only step in if the combined memory usage exceeds 32 GB, while a single node would likely first go over the 16 GB limit?

6. Jonathan -- 13 years ago

So I requested 2 nodes with 32 GB of virtual memory and set pvmem=-1. I then ran an 8-processor job on one of the nodes that used just under 16 GB, and it seemed to be OK. I then increased the memory used to 16+ GB so that it would start using swap space, and noticed the CPU usage drop to 3% from the 99% seen when not using swap. I then increased the size to 18+ GB, and the run basically seized up (as did top, the terminal, etc…) once the swap was used up. After 5 minutes or so, the terminal came back…

Also tried running a single-processor version, but that quickly died with 'insufficient virtual memory' even though vlimit was unlimited. Running a two-processor version, however, did not immediately die with 'insufficient virtual memory', even though both processors were on the same node.

7. Jonathan -- 13 years ago

Once the physical and swap memory are used up, the code locks up and is eventually killed by the operating system with signal 9…

-bash-3.2$ mpirun -n 8 ./astrobear
========================== MPI ID     6 ================================
========================== MPI ID     2 ================================
========================== MPI ID     5 ================================
========================== MPI ID     3 ================================
========================== MPI ID     4 ================================
========================== MPI ID     1 ================================
========================== MPI ID     7 ================================
========================== MPI ID     0 ================================
   level-2   ambc= 0 0   gmbc= 4 0  egmbc= 4 0   ombc= 4 0   nmbc= 4   pmbc= 0
   level-1   ambc= 0 0   gmbc= 4 0  egmbc= 4 0   ombc= 4 0   nmbc= 4   pmbc= 4
   level 0   ambc= 0 0   gmbc= 4 0  egmbc= 4 0   ombc= 4 0   nmbc= 4   pmbc= 4
   level 1   ambc= 4 0   gmbc= 8 4  egmbc= 8 4   ombc= 8 4   nmbc= 4   pmbc= 4
   level 2   ambc= 4 0   gmbc= 8 4  egmbc= 8 4   ombc= 8 4   nmbc= 4   pmbc= 8
 Found existing profile.data file.
 Initializing Grids on level           0
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 2003 on node bh002 exited on signal 9 (Killed).
--------------------------------------------------------------------------
-bash-3.2$

Also noticed that for a 4-processor run each processor was using 4.0 GB of resident and 7 GB of virtual memory. (Obviously 7 × 4 = 28 GB > 16 + 2, so virtual memory must be double counted somehow?)

8. Baowei Liu -- 13 years ago

I wrote a script to monitor the memory usage of a job. A few things:

  1. The size of the virtual address space is the maximum the machine can address: for 32-bit, it's 4 GB.
  2. The virtual memory of each process includes 6 sections: environment, stack, heap, uninitialized data (static/global variables), initialized data, and text (code).
  3. The minimum virtual memory for an MPI program (the simplest MPI program, like Hello World) is about 100 MB for 1 processor and 163 MB for multiple processors, which is basically the environment part. The VM is the same for each of the multiple processes, so the dependence on x (the number of processes) in Jonathan's fit comes from the AstroBEAR code itself.
  4. For a simple program with a very big data array (a global/static array or on the heap), the virtual memory usage is about the size of the data. NO BIG RATIOS! I think the big ratio I saw for "Hello World" is because the minimum virtual memory of an MPI program is BIG, about 100 MB.
  5. When the data array is local, no matter how big it is, the virtual memory used is only the minimum virtual memory. This suggests that if we use more local variables, we will REDUCE the virtual memory usage! (See the sketch after this list.)
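As a rough illustration of points 4 and 5, here is a minimal C sketch (hypothetical, Linux-specific, not part of the monitoring script) showing that virtual memory grows as soon as address space is reserved, while resident memory only grows once the pages are actually touched:

/* Prints VmSize (virtual) and VmRSS (resident) from /proc/self/status
 * at three points: startup, after reserving 1 GB, and after touching it. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void report(const char *label)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];

    printf("--- %s ---\n", label);
    if (!f)
        return;
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "VmSize", 6) == 0 || strncmp(line, "VmRSS", 5) == 0)
            fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    const size_t n = (size_t)1 << 30;        /* 1 GB */
    char *buf;

    report("at startup");

    buf = malloc(n);                         /* reserves address space */
    if (!buf)
        return 1;
    report("after malloc (untouched)");      /* VmSize grows, VmRSS does not */

    memset(buf, 1, n);                       /* touches every page */
    report("after memset (touched)");        /* now VmRSS grows as well */

    free(buf);
    return 0;
}

On a typical 64-bit Linux system with glibc, VmSize jumps by about 1 GB right after the malloc, while VmRSS only catches up after the memset, which is one reason the virtual size of a process can be much larger than the memory it actually uses.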
9. Baowei Liu -- 13 years ago

For local data, I tried an 8 GB array. The program runs fine and the virtual memory usage is about 100 MB for 1 processor and 163 MB for multiple processors.