Posts in category data-management

Update with 1024cores

I ran the Beta 10 No Shear case on Stampede with 1024 cores. Here is the result (see Table below):

So if we're at frame 246, we have 154 frames left. Dividing 154 by our rate of roughly 16.4 frames per day (1,440 minutes / 87.67 minutes per frame), we have 9.4 days (225.6 hrs) to run this simulation out. Thus, 225.6 * 1024 = 231,014.4 cpu hrs. Multiplying this by 4, as we have 4 runs, yields approximately 924,057.6 cpu hrs total. This is not much different than the total from last week. It does not seem economical to run these on 1024 cores; in my opinion we might as well run them on 2048 cores, as they'll finish faster with little to no change in cpu hrs.

Perhaps we should choose just a few cases on 2048 cores?

If on 2048 cores we estimate 34.85 frames a day (the average of the rates from the last blog post), with approximately 164 frames left (also an average from the last blog post), that implies approximately 5 days per simulation, or 113 hours. This is approximately 231,304 cpu hrs per run. With 3 runs that is 693,911 cpu hrs; with 2 runs, 462,607 cpu hrs.
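As a sanity check on the arithmetic above, here is a minimal sketch in Python (the helper name and the rounding in the comments are mine, not part of our tooling):

    def cpu_hours(frames_left, frames_per_day, cores, runs=1):
        # (frames left / frames per day) gives days; * 24 gives wall-clock hours;
        # * cores gives cpu hrs; * runs totals it over all runs.
        return frames_left / frames_per_day * 24.0 * cores * runs

    # 1024-core case: 154 frames left at 1440 / 87.67 ~ 16.4 frames/day, 4 runs.
    print(cpu_hours(154, 1440.0 / 87.67, cores=1024, runs=4))  # ~921,700 cpu hrs (924,058 above comes from rounding to 9.4 days first)

    # 2048-core case: ~164 frames left at 34.85 frames/day, 3 runs.
    print(cpu_hours(164, 34.85, cores=2048, runs=3))           # ~693,900 cpu hrs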

Perhaps we could split the runs between machines? However, we aimed to use Stampede because it is so fast compared to the likes of BlueStreak.

Run (Current Frame)   Hours Sampled   Time (mins)   Avg. time (mins)
b10s0 (246)           23:30 - 00:50   80
                      10:27 - 11:58   91
                      19:44 - 21:16   92            87.67

Table. The Beta 10 Shear 0 run, showing its current frame, the hours sampled for each frame, and the average frame time used in the brief calculations above.

cpu hrs for CF runs on Stampede

We are running the CollidingFlows problem out from frame 200 to 400 to double the simulated time and see if we can observe any more sink formation. Given that this run is really computationally intensive, I've done a quick calculation of cpu hrs based on some current runs I am doing on Stampede. All runs are in the normal queue for 24 hrs on 2048 cores. The table below gives the current frame number at which I collected this data. From it, the average time for our code to spit out a frame is approximately 44.3 mins for b10s0, 46 mins for b10s15, 47 mins for b10s30, and 32 mins for b10s60.

Given that we have 1,440 minutes in a day, this implies we'd spit out approximately 32.5 (b10s0), 31.3 (b10s15), 30.6 (b10s30), and 45 (b10s60) frames per day.

Considering that the differences between the current frame and the last frame (400) for beta 10 shear 0, 15, 30, and 60 are 179, 182, 159, and 136 respectively, we're looking at running these out for roughly 3-6 days each on 2048 cores. Specifically, b10s0: 5.5 days, b10s15: 5.8 days, b10s30: 5.2 days, and b10s60: 3 days. Using these run times, 24 hours per day, and 2048 cores, this puts us at a total of approximately 957,973 cpu hrs (see the sketch below the table). THAT IS INSANE.

After a quick discussion with Erica and Baowei, I've come up with the following short-term plan: once these jobs stop later today, I'll submit 1 job to the normal queue on 1,000 cores. For this run I'll make the same calculation and see if it is more economical when multiplied by 4. Baowei has also suggested throwing runs on Gordon, another XSEDE machine. We have a lot of SUs there, so he is currently setting me up. We currently have only 1,551,296 SUs available on Stampede, so running our jobs for this problem there could be quite precarious.

Run (Current Frame)   Hours Sampled   Time (mins)   Avg. time (mins)
b10s0 (221)           16:07 - 16:55   48
                      02:58 - 03:38   40
                      07:13 - 07:58   45            44.3
b10s15 (218)          18:03 - 18:48   45
                      02:19 - 03:03   44
                      11:05 - 11:54   49            46
b10s30 (241)          17:57 - 18:40   43
                      00:26 - 01:23   57
                      07:03 - 07:44   41            47
b10s60 (264)          17:40 - 18:07   27
                      00:04 - 00:38   34
                      07:43 - 08:18   35            32

Table. Each run, showing its current frame, the hours sampled for each frame, and the average frame time used in the brief calculations above.
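To show where the averages and frames-per-day figures come from, here is a minimal sketch in Python that parses the "Hours Sampled" intervals; the helper names are mine, and the sample strings are just the b10s0 rows of the table:

    from datetime import datetime

    def interval_minutes(sample):
        # Length of an "HH:MM - HH:MM" interval in minutes. timedelta.seconds is
        # always non-negative, so samples that wrap past midnight
        # (e.g. 23:30 - 00:50) come out right as well.
        start, end = (datetime.strptime(t.strip(), "%H:%M") for t in sample.split("-"))
        return (end - start).seconds // 60

    def frames_per_day(samples):
        # Average minutes per frame over the samples, then frames per day.
        avg = sum(interval_minutes(s) for s in samples) / len(samples)
        return avg, 1440.0 / avg

    print(frames_per_day(["16:07 - 16:55", "02:58 - 03:38", "07:13 - 07:58"]))
    # -> roughly (44.3, 32.5): ~44.3 mins per frame and ~32.5 frames per day for b10s0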

Meeting Update 09/08/2014

Over the past few weeks I've spent my time shuffling around data and organizing things to make visualization not… impossible? So now we have all of the data together in their respective directories, shared between bamboodata and grassdata. One can use Erica's directories in grassdata as a "home base", since I made symlinks there to the bamboodata directories. So everything appears to be stored with its applicable files on grassdata.

Current Disk Usage:

/bamboodata/ 86%
/grassdata/ 89%

Now we're doing post processing on the data, along with resolution testing. We have updated our wikipage (see: CFRunStatsHighRes).

I am also studying for the physics GRE, and given my study/work schedule I might not show up for group meetings. I'll still post updates.

Running Jobs

  • Currently:
    1. Bluestreak (BS):
      • Post processing (PP) on our chombos (reservation for B1S60 and B10S15, 256 nodes). Once those two runs are done, we'll do the other two. Eventually we'll need to throw the NoShear case into the mix (see 2.).
      • Convergence testing at AMR level 3 (pending in Standard queue for both shear 60 cases). Once these jobs are done, we'll do the other two if necessary.
    2. Stampede:
      • Running the NoShear case there. I have done approximately 3 restarts; the current restart began at frame 187, so this should be the last restart there. After each restart completes and I start the next one, I transfer files back to our local directories. Once we have all the files, I'll transfer them to BS to do PP.
  • Future:
    1. PP on BS of NoShear data.
    2. PP for B10S30 and B10S60 on BS.
    3. Convergence Tests for the rest of the data sets.

Transferring Files

  • Once all PP on BS is done, transfer everything back to local directories.
  • Transferring NoShear data to BS and local machines.

Visualization

  • Erica submitted the proposal to the Vista Collaboratory Contest this past Friday.
  • Visualization on BS using x2go.

Computing Resources & Local Machines

While scp-ing data from the CollidingFlows runs, I noticed that some of the machines I typically use are starting to fill up. I also do visualizations on Clover or my own machine; however, they can be quite hard on my graphics card. This has led me to talk to Baowei and others about potentially developing a new machine quote (see: https://astrobear.pas.rochester.edu/trac/blog/bliu02072012).

Given that the quote is 6 months old, the price is probably not the same anymore. It has been suggested that someone in the group e-mail Sean Kesavan <sean@asacomputers.com> and ask him to give us an updated quote or change it to meet our requirements (referencing the quote number on the quotation). I also think these are the specs for Bamboo (https://www.pas.rochester.edu/~bliu/ComputingResources/ASA_Computers.pdf).

In my opinion it might be a good idea to look into more disk space/storage and also to get a new machine for visualization. In the meantime, I took a look at our local machines and who is using them.

Commands I am using:

  • du -sh * 2>&1 | grep -v "du: cannot" prints the disk space used for each user/directory; redirecting standard error to standard output lets grep -v filter out the "du: cannot ..." permission errors.

  • df -h inside the directory shows the disk space used (along with the use %) and the space available.
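If we end up doing these checks regularly, the same thing could be scripted. Here is a rough Python sketch (the mount point is just an example and the helper name is mine), which shells out to du as above but discards the permission errors instead of grepping them away:

    import subprocess

    def usage_summary(mount):
        # Run "du -sk <mount>/*", drop errors, and return (size in kB, path)
        # pairs sorted largest first.
        out = subprocess.run("du -sk %s/* 2>/dev/null" % mount, shell=True,
                             capture_output=True, text=True).stdout
        rows = [line.split("\t") for line in out.splitlines() if "\t" in line]
        return sorted(((int(kb), path) for kb, path in rows), reverse=True)

    for kb, path in usage_summary("/grassdata"):
        print("%8.1f GB  %s" % (kb / 1048576.0, path))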

alfalfadata

The Disk Space Used

DS Used User / Directory
63G bliu
13G ckrau
4.6G ehansen
227G erica
422G johannjc
16K lost+found
1.1T martinhe
4.0K modulecalls
20M nordhaus
1.3T shuleli
300M test.dd
3.5G trac_backup

The Disk Space Available

Filesystem Size Used Avail Use% Mounted on
aufsroot 11T 4.7T 6.3T 43% /
udev 7.9G 4.0K 7.9G 1% /dev
tmpfs 1.6G 1.3M 1.6G 1% /run
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 7.9G 108K 7.9G 1% /run/shm
none 100M 160K 100M 1% /run/user
/dev/md127 5.5T 3.3T 1.9T 64% /media/grassdata
deptdisk.pas.rochester.edu:/home/ehansen 11T 4.7T 6.3T 43% /home/ehansen
deptdisk.pas.rochester.edu:/home/shuleli 11T 4.7T 6.3T 43% /home/shuleli
alfalfa.pas.rochester.edu:/alfalfadata 5.3T 4.3T 802G 85% /mnt/net/alfalfadata

bamboodata

The Disk Space Used

DS Used User / Directory
1.2T bliu
4.0K brookskj
6.0G chaig
1.6G Defense.zip
8.0K ehansen
580G erica
1.1T johannjc
16K lost+found
2.6T madams
1.2T martinhe
21M rsarkis
4.9G ScalingTest
3.6T shuleli

The Disk Space Available

Filesystem Size Used Avail Use% Mounted on
aufsroot 11T 4.7T 6.3T 43% /
udev 7.9G 4.0K 7.9G 1% /dev
tmpfs 1.6G 1.3M 1.6G 1% /run
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 7.9G 108K 7.9G 1% /run/shm
none 100M 160K 100M 1% /run/user
/dev/md127 5.5T 3.3T 1.9T 64% /media/grassdata
deptdisk.pas.rochester.edu:/home/ehansen 11T 4.7T 6.3T 43% /home/ehansen
deptdisk.pas.rochester.edu:/home/shuleli 11T 4.7T 6.3T 43% /home/shuleli
alfalfa.pas.rochester.edu:/alfalfadata 5.3T 4.3T 802G 85% /mnt/net/alfalfadata
bamboo.pas.rochester.edu:/bamboodata 13T 11T 1.2T 91% /mnt/net/bamboodata

botwindata

The Disk Space Used

DS Used User / Directory
4.0K bliu
673G johannjc
904K orda
3.4G repositories
4.0K repositories_backup.s
11G scrambler_tests
2.6G tests
65G TO_CLEAN
75G trac
4.0K trac_backup.s
7.2G trac_dev
4.0K trac.wsgi
1.8M www

The Disk Space Available

Filesystem Size Used Avail Use% Mounted on
aufsroot 11T 4.7T 6.3T 43% /
udev 7.9G 4.0K 7.9G 1% /dev
tmpfs 1.6G 1.3M 1.6G 1% /run
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 7.9G 108K 7.9G 1% /run/shm
none 100M 160K 100M 1% /run/user
/dev/md127 5.5T 3.3T 1.9T 64% /media/grassdata
deptdisk.pas.rochester.edu:/home/ehansen 11T 4.7T 6.3T 43% /home/ehansen
deptdisk.pas.rochester.edu:/home/shuleli 11T 4.7T 6.3T 43% /home/shuleli
alfalfa.pas.rochester.edu:/alfalfadata 5.3T 4.3T 802G 85% /mnt/net/alfalfadata
bamboo.pas.rochester.edu:/bamboodata 13T 11T 1.2T 91% /mnt/net/bamboodata
botwin.pas.rochester.edu:/botwindata 890G 838G 6.3G 100% /mnt/net/botwindata

cloverdata

The Disk Space Used

DS Used User / Directory
28M afrank
368K ameer
72K aquillen
4.0K array.f90
4.0K awhitbe2
20K BasicDisk
137G bcc
31M blin
9.7M bliu
1.7G bobbylc
3.3G chaig
2.6M clover
37M cruggier
0 data
851M DB
4.0K devsrc
107M edmonpp
72G ehansen
301G erica
1019M iminchev
448K jnp1
3.0T johannjc
14M langao
154M laskimr
2.9M lijoimc
3.9G local
16K lost+found
221M madams
13G martinhe
8.0K MegaSAS.log
124K mendygral
37M MHDWaves
13M mitran
648M moortgat
852M munson
3.0G noyesma
20K odat1
4.0K old_accounts
45G pvarni
292K raid
3.3G repositories
5.8G repositories_backup
140M rge21
2.9G rsarkis
192K ryan
126G scrambler_tests
2.1T shuleli
16M spearssj
0 test
54M test3
0 test.me
840M tests
355M tests130
231M tests_old
452K tkneen
14G trac
27G trac_backup
60K wma
1.8M www

The Disk Space Available

Filesystem Size Used Avail Use% Mounted on
aufsroot 11T 4.7T 6.3T 43% /
udev 7.9G 4.0K 7.9G 1% /dev
tmpfs 1.6G 1.3M 1.6G 1% /run
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 7.9G 108K 7.9G 1% /run/shm
none 100M 160K 100M 1% /run/user
/dev/md127 5.5T 3.3T 1.9T 64% /media/grassdata
deptdisk.pas.rochester.edu:/home/ehansen 11T 4.7T 6.3T 43% /home/ehansen
deptdisk.pas.rochester.edu:/home/shuleli 11T 4.7T 6.3T 43% /home/shuleli
alfalfa.pas.rochester.edu:/alfalfadata 5.3T 4.3T 802G 85% /mnt/net/alfalfadata
clover.pas.rochester.edu:/cloverdata 11T 5.8T 4.5T 57% /mnt/net/cloverdata

grassdata

The Disk Space Used

DS Used User / Directory
184M afrank
4.0K bshroyer
711M bshroyerdata
8.5M chaig
4.0K cruggier
27M data4out.out
127M data5out.out
4.0K ehansen
1.4T erica
4.0K eschroe3
2.1G ferreira
3.2G fnauman
33G grass_swapfile
29G jnp1
119G johannjc
4.0K johannjc:~
4.0K langao
4.0K laskimr
16K lost+found
310G madams
656G martinhe
12M munson
1.4G noyesma
112M root
762G shuleli
1.4G slucchin
4.0K tdennis
1001M test.dd
4.0K testlog
403M tkneen
46M xguan

The Disk Space Available

Filesystem Size Used Avail Use% Mounted on
aufsroot 11T 4.7T 6.3T 43% /
udev 7.9G 4.0K 7.9G 1% /dev
tmpfs 1.6G 1.3M 1.6G 1% /run
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 7.9G 108K 7.9G 1% /run/shm
none 100M 160K 100M 1% /run/user
/dev/md127 5.5T 3.3T 1.9T 64% /media/grassdata
deptdisk.pas.rochester.edu:/home/ehansen 11T 4.7T 6.3T 43% /home/ehansen