Development Procedure
I just finished altering astrobear and adding my module files. Since the procedure I went through is pretty standard, I thought it would be useful for others (specifically new users) to know the correct way to do this. Here is the new wiki page: Development Procedure. It is linked under the Tutorials page and also the Code page. Please let me know if there are any errors or if you think more detail should be added.
Meeting Update 09.27
Mainly working on the multiphysics solver. Some changes to the scheme I talked about last week:
(1) the viscosity solver now uses an 8-cell stencil instead of a 12-cell one in 2D.
(2) both solvers now use allocated arrays to store the source terms (viscous force and current) instead of recalculating the source on each cell, which saves repeated computations (see the sketch after this list).
(3) the source term flux from the multiphysics solver now goes into the fixup fluxes to give better momentum and divergence conservation between levels.
(4) we are able to switch between the super-stepping approach ESHSSHSE (2 additional ghost zones) and the more symmetric approach ESHSEESHSE (4 ghost zones).
(5) both viscosity and resistivity use the same ghost zones; the solver swaps their order depending on which AMR step it is in.
(6) working on a 1-D interface problem to make sure the AMR works correctly. After that, we will do an instability problem that can go into the future paper.
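A minimal sketch of the idea in item (2) above (not AstroBEAR's actual routine; the array layout and the stencil are placeholders):

```fortran
! Illustrative only: compute the source term once per cell into an allocated
! array and reuse it, rather than recomputing it inside every flux/update loop.
module viscous_source_sketch
  implicit none
contains
  subroutine fill_viscous_source(q, visc_src)
    real(8), intent(in) :: q(:, :, :)                 ! conserved vars (made-up layout)
    real(8), allocatable, intent(out) :: visc_src(:, :, :)
    integer :: i, j, nx, ny

    nx = size(q, 1);  ny = size(q, 2)
    allocate(visc_src(nx, ny, 2))                     ! x- and y-momentum sources
    visc_src = 0d0
    do j = 2, ny - 1
       do i = 2, nx - 1
          ! placeholder second-difference stencil; the real solver would apply
          ! its 8-cell viscous stencil here, using already-filled ghost zones
          visc_src(i, j, 1) = q(i+1, j, 2) - 2d0*q(i, j, 2) + q(i-1, j, 2)
          visc_src(i, j, 2) = q(i, j+1, 3) - 2d0*q(i, j, 3) + q(i, j-1, 3)
       end do
    end do
  end subroutine fill_viscous_source
end module viscous_source_sketch
```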
Cleaning up portable thread implementation
I have a reservation for an 8-hour interactive session on 32 of the afrank nodes tomorrow, during which I will be doing some final testing of the new dynamic load balancing scheme and making some minor tweaks. Today I am cleaning up the code changes made for the portable threads and checking the implementation, so I will be ready for tomorrow and so that the changes can be checked into the main repo. I am also copying and pasting the paper into the proceedings format for the AstroNUM 2011 conference.
Encouraging results
While the reservation was for 4 nodes with 8 procs per node, the afrank nodes had hyperthreading enabled, which caused mpirun to put 16 processes on a single node, or two processes per processor. The idea behind hyperthreading is that you overtask the processors so that they can switch back and forth between various jobs. By giving each processor two instances of the code, when one process is stuck waiting for a message the other process can still work. The processor time per cell update would appear to double, although the actual core time per cell update should remain constant, or drop slightly if hyperthreading gives an improvement. Going from 8 to 16 processes, the cpu time per cell update more than doubles, indicating that hyperthreading actually hurts performance. It also makes it more difficult to schedule tasks at the application level, since the processor is switching back and forth between two instances of the code. The takeaway is that hyperthreading should not be used with AstroBEAR 2.0. Notice that the dynamic load balancing with the portable threads still gives better performance whether or not hyperthreading is enabled.
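As a quick check on the accounting (assuming the numbers in the table below are wall time multiplied by the number of MPI processes, divided by the number of cell updates; that definition is an assumption, not something stated above):

$$\frac{t_{\rm cpu}}{\text{cell update}} = \frac{T_{\rm wall} \times N_{\rm processes}}{N_{\rm cell\ updates}}$$

Doubling $N_{\rm processes}$ from 8 to 16 on the same 8 physical cores doubles this ratio even at unchanged throughput, so the fact that the 16-process entries are more than double the 8-process entries points to a genuine loss of throughput.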
CPU time per cell update for the different threading modes (T/F indicates whether every level was balanced):

| nProcs | No Threading (T) | Scheduling (T) | Scheduling (F) | Portable Threads (T) | Portable Threads (F) |
|---|---|---|---|---|---|
| 8 | 3.0360618888 | 3.5354251260 | 3.9989763675 | 2.4490875460 | 2.3749391790 |
| 16 (HyperThreading) | 7.1959148653 | 7.1250127389 | | 5.3294605455 | |
| 32 (HyperThreading?) | 12.1093259489 | 14.4666908237 | | 9.2395890449 | |
(Not sure if the 32 core run used both nodes or not…)
I was also somewhat surprised to see that global load balancing finally improved performance. Global load balancing is the opposite of having every level balanced. Previously, performance was better when each level was balanced via splitting: while this created more, smaller grids and therefore increased the overall workload, the improved distribution outweighed the cost. Perhaps that was because the old load balancing scheme was not correctly compensating for imbalances.
Here is an image of the distribution showing the additional fragmentation on the right when Level Balanced is true. The top panels show every patch, while the bottom panels show the mpi_id of the patches.
Finished looking at results from the 90² cells per proc per level test… Tomorrow I have a reservation and will generate similar data for 16², 32², 64², 128², and 256² if there is time. As the number of cells per proc per level increases, all of the lines will flatten out, but I should have all the necessary data for the paper.
Erica's meeting update
I didn't find much literature on a rotating Bonnor Ebert sphere, per se, but some pioneering works on rotating clouds may be helpful:
http://adsabs.harvard.edu/abs/1982ApJ...255..240B
http://adsabs.harvard.edu/abs/1976ApJ...206..138B
http://adsabs.harvard.edu/abs/1972MNRAS.156..437L
As well as some presentations that talk about rotating BE spheres but with additional physics:
Sink tests we talked about on Friday are in the queue. I am only asking for 32 cores for these jobs, but maybe there is a backup in the queue due to the inaccessibility of Bluehive yesterday?
Meeting Update 09/27/2011 - Eddie
Mostly just been reading this past week. From Toro I have finished Ch. 2 (Notions on Hyperbolic Partial Differential Equations) and Ch. 3 (Some Properties of the Euler Equations). I am currently on Ch. 4 (The Riemann Problem for the Euler Equations). I see that Toro will start getting into some Fortran examples in this chapter, so pretty soon we should decide on what kind of 1D hydro code I should write.
Also, I made some changes to astrobear when writing my RT module to make uniform gravity work. I plan on committing these changes to the repo, but first need to run tests on bluehive. However, mercurial is currently not working on bluehive, so I am waiting for CRC to fix that.
Update 9/26
CRL 618
The jet with rings simulation is coming along nicely. It is still running on Bluehive and is about 30% done along the x-direction. http://www.pas.rochester.edu/~blin/sum11/400jet/400jet.gif
The jet does not seem to interact with the ambient rings as much as the clump did.
Testing
Updated the Field Loop 2D test page to be relevant to AstroBEAR 2.
Had to rerun all of the tests for the radiative shock test (for reference, each run takes around 6 minutes on 2 procs) due to an error in my global.data. I originally had
Gmx=16,2,1 (so the x dimension is 8 times the y dimension), BUT GxBounds=0,0,0,32,2,0 (so the x length is 16 times the y length).
This was changed to GxBounds=0,0,0,32,4,0. Otherwise, bear2fix wouldn't generate the density plots I needed. Still trying to get familiar with bear2fix.
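For reference, a sketch of the corrected, consistent lines in global.data (only Gmx and GxBounds are taken from the file excerpt above; the namelist name and comments are assumed):

```
&GlobalData
   Gmx      = 16, 2, 1                          ! base grid cells in x, y, z
   GxBounds = 0d0, 0d0, 0d0, 32d0, 4d0, 0d0     ! physical extent; 32/16 = 4/2, so cells stay square
/
```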
This week, I will do more work with bear2fix and try to get the density plots done and update the radiative shock testing page.
viscosity and resistivity refinement limitations
The explicit solver should meet the requirement:

$$\Delta t_{\rm visc} \sim \frac{\Delta x^2}{\nu} \gtrsim \Delta t_{\rm CFL} \sim \frac{\Delta x}{c_s}$$

where $\nu$ and $c_s$ are the viscosity and sound speed. This leads to the relation:

$$\Delta x \gtrsim \frac{\nu}{c_s} \sim \lambda_{ii}$$

where $\lambda_{ii}$ is the ion-ion mean free path.

This makes sense, since viscosity is a collective behavior of ion-ion collisions, so refining below one ion-ion mean free path does not make sense.

The resistivity solver has a similar limitation, with the resistivity in place of the viscosity.
Meeting Update 09.16
I was mainly working on the array ghosting scheme Jonathan and I talked about last week, which stores the ghosting info needed by each separate field variable in an array and tracks this array during the advance. It is more general than our current method, so it can be easily extended to algorithms with multiple physical processes. But when I was testing the multiphysics functions, which now have resistivity and viscosity built in, it appeared that for conservative variables we always need all of the hydro vars to go into the multiphysics calculation, which makes it inefficient to keep tracking the number of ghost zones required by each individual hydrodynamic field. Another idea is a simplified, small mbc array whose first dimension takes values from 1 to 4 and essentially tracks the hydro vars, aux fields, elliptic vars and tracers as classes. The other approach is to return to our current method but add a new smbc to define the ghost zones required by the source term solver, while folding in the functionality of egmbc.
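A rough sketch of the simplified bookkeeping described above; the name mbc comes from the post, but the shape of the array and the meaning of its indices are assumptions, not AstroBEAR's actual data structure:

```fortran
module ghost_zone_tracking
  implicit none
  ! First index: which class of variables the ghost zones are for.
  integer, parameter :: iHYDRO = 1, iAUX = 2, iELLIPTIC = 3, iTRACER = 4
  ! Second index: (hypothetically) zones needed by the hyperbolic step vs. the source step.
  integer :: mbc(4, 2)
contains
  subroutine reset_ghost_zones()
    mbc = 0
  end subroutine reset_ghost_zones

  subroutine request_ghost_zones(class, step, nzones)
    integer, intent(in) :: class, step, nzones
    ! Each solver posts its requirement and we keep the maximum
    ! (e.g. viscosity posts 2 for the hydro vars, resistivity posts 1).
    mbc(class, step) = max(mbc(class, step), nzones)
  end subroutine request_ghost_zones
end module ghost_zone_tracking
```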
For the integration of the solver into the code, it seems we cannot mechanically calculate the dqdt for each multiphysics process and lump them into SrcDeriv for the RK update, since that update requires time subcycling. Subcycling is not a problem for cooling and self gravity, which do not consume additional ghost zones on each advance, but it is a problem for diffusion, resistivity and viscosity. These three processes require additional peripheral zones to update the interior data: diffusion and resistivity need 1, viscosity needs 2. This means that subcycling 10 times in the source term advance would require 20 more ghost zones for the hydro vars when doing viscosity, which is not acceptable. Therefore we are looking at a totally independent explicit update module, similar to the implicit thermal conduction solver, that resides outside the current source term update. Then we would do the update in the order E-S1-S2-H-S2,
where E is the elliptic update, S1 is the one-step explicit source term update, and S2 is the current explicit source term update that does subcycling. In this way the physics solved in S1 is not integrated into the flux calculation or the subcycling process, which may introduce inaccuracy, but that is the price we have to pay if we want to keep the code simple and restrict communication. Alternatively, we can calculate a slope in S1 and then use it to update the hydro vars during each subcycling step in S2.
Load balancing algorithm
I made some minor adjustments to the load balancing algorithm to match the description in the paper, and the initial results are encouraging… The threading / dynamic load balancing is now 10% faster on 8 cores… I need to do some scaling runs on 32 / 64 cores to confirm that this trend persists. I am also trying to figure out the right functional form to fit the scaling data.
Weak Scaling
Weak scaling results for various levels of AMR with 128² cells per proc per level
And the weak scaling results for 3 levels of AMR with 64², 128², and 512² cells per proc per level
If we let $N$ represent the total number of cells to update, $c$ the computational cost per cell, $W(P)$ the additional computational work that is incurred with $P$ processors, and $T$ the wall time, then $T$ follows from dividing the total work $cN + W(P)$ among the $P$ processors. Furthermore, if $W(P)$ grows slowly with $P$, then for large $P$ we can Taylor expand the result, which, combined with the scaling results, gives one candidate fit. If we instead assume a faster-growing $W(P)$, we get a different one. The problem is we don't have enough data points to determine which functional fit is accurate, so it is difficult to project.

So I went back and looked at different possible fit functions and plotted the results…

The best fit is the first one, and if it holds, it implies performance drops to 50% at 1.74 kProcs and to 25% at 5.3 GProcs! Granted, this is weak scaling, and your resolution at 5.3 GProcs would be about 36 MCells on a side, and simulating one crossing time would take 12 yrs instead of 3 yrs… but that's why strong scaling is more important… or being able to weakly scale when the workload per processor is only a few cells.
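As an illustration of how such projections work (assuming, purely for this example, a logarithmic fit form rather than whatever the actual best-fit function is), the two quoted numbers are mutually consistent:

$$T(P) = T_1\,(1 + a \ln P) \quad\Rightarrow\quad E(P) \equiv \frac{T_1}{T(P)} = \frac{1}{1 + a \ln P}$$

so $E = 1/2$ when $a\ln P = 1$ and $E = 1/4$ when $a\ln P = 3$; the 25% point is then the cube of the 50% point, and indeed $(1.74\times10^3)^3 \approx 5.3\times10^9$.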
Why would the wall time pick up this extra dependence on $P$? If we consider the distribution of grids at each time step and assume that the workload per processor follows a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the time to complete each step is set not by the mean of the workload but by its maximum. The expected maximum is given by the last order statistic of the $P$ samples, where $P$ is the number of processors. I couldn't find a closed-form expression for the last order statistic of a normal distribution, but Monte Carlo experiments give a fitting function for it, and combining that fitting function with the best-fit scaling above constrains how the workload imbalance grows with $P$.

Meeting Update 09/20/2011 - Eddie
Progress made last week:
- New RT page that summarizes everything I did - ehansen09142011
- More detail on AMR control - ehansen09162011
With RT finished, I have begun reading more. So far I have read Dopita Chs. 1 (What is the Diffuse Universe?) and 2 (Line Emission Processes), Toro Ch. 1 (The Equations of Fluid Dynamics), and Hartigan's review article on the Measurement of Physical Conditions in Stellar Jets.
9/19 Update
The 400 jet with rings is still running in BH with 32 procs, but here is movie of it so far: http://www.pas.rochester.edu/~blin/sum11/400jet/400jet.gif
The 400 clump with no rings is done. The refinement was reduced about halfway to compensate for the large file size. http://www.pas.rochester.edu/~blin/sum11/400clumpnoring/400clumpnorings.gif
This meeting's update
I ran the following fixed grid tests on the sinks to learn more about why the formed particle is wandering:
- Fixed grid, same setup as before, to see if this problem happens when AMR is shut off.
- Originally, my simulation was a 20³ grid with 2 levels of AMR (this formed a sink). I then ran a fixed grid at 80³, and this did NOT form a sink. The system was reported as not bound, which I had previously seen fixed by increasing the resolution of the grid. I ran an 85³ simulation, and this DID form a sink. Further, this sink did NOT move out of its 'parent' cell. See the page here - http://www.pas.rochester.edu/~erica/collapse9192011.html. I wondered whether this was a result of the fixed grid alone, or of the higher effective resolution. When I tried to run an AMR simulation with ~85³ effective cells, using 21³ + 2 levels, no sink was formed. Given that this is a higher resolution run than the first one quoted here, this is curious. The differences between the simulations, as far as I can see, are 1) the refinement criteria in the first called for only the innermost region of the sphere to be resolved, and 2) the global cell numbers were off: in the first, mxGlobal ran to 25 instead of 20. It doesn't seem like this should make a difference; as long as the effective resolution is ≥ that of the old simulations, a sink should form.
I am wondering if I should still do the next test, since a sink formed in the fixed grid run with an even number of cells: fixed grid with an odd number of cells. Since a sink forms at the center of a cell, this would check whether restricting that cell to the true center of the grid resolves the issue.
Enlarging the accretion radius from 4 to 8 and 16. This may help with smoothing out the potential.
Checking if increasing the resolution stops the particle from having a kick will also be useful.
Additionally, I added my radial velocity plots to the BE page under the results section and compared them to the Foster and Chevalier results in the discussion section. I think this page is more organized and nicer since the last edits.
—Also, I noticed a sink formed in the fixed grid simulation earlier, at frame 13, as opposed to frame 29 in the AMR simulation. I am running a simulation now that has the same geometric refinement criteria I originally formed a sink with, to see if this helps in forming a sink in the 21³ + 2 levels case above.
—Question: I see coordinates in my sink file, but there are no values in the bottom array… What does this mean? Did the cell at that position violate the Truelove criterion (density threshold) but not officially form a sink?
Playing with AMR
For much of the latter part of this week, I have been using my RT module to learn how to control mesh refinement. I've added a few things to the ControllingRefinement page that should help new users. There's a new subsection on how changing qTolerance and DesiredFillRatios affects the mesh with some snapshot examples. There's also more in the ProblemSetErrFlag section now with a sample code. Since a lot of this information is useful on several wiki pages, I've placed links where appropriate.
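To go along with the sample code mentioned above, here is a bare-bones sketch of the kind of routine the ProblemSetErrFlag section describes. The module, type and field names (GlobalDeclarations, InfoDef, Info%q, Info%ErrFlag, Info%mX, the density index) are assumptions about the interface rather than the exact AstroBEAR API, so check the wiki page for the real one:

```fortran
subroutine ProblemSetErrFlag(Info)
  use GlobalDeclarations, only: InfoDef      ! assumed to provide the grid type
  implicit none
  type(InfoDef) :: Info
  real(8), parameter :: rho_threshold = 10d0 ! problem-specific choice
  integer, parameter :: irho = 1             ! assumed density slot in q
  integer :: i, j

  ! Flag any cell whose density exceeds the threshold so the AMR engine
  ! refines there on the next regrid.
  do j = 1, Info%mX(2)
     do i = 1, Info%mX(1)
        if (Info%q(i, j, 1, irho) > rho_threshold) Info%ErrFlag(i, j, 1) = 1
     end do
  end do
end subroutine ProblemSetErrFlag
```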
RT finished, see wiki page
I've conquered 2D hydro RT. All necessary simulations and calculations are complete. See my new wiki page for details on the process that I went through: u/ehansen/RT. Is this an appropriate/good spot for a page like this?
Martin, let me know which files you want for the test suite and I'll get them organized/ready for you.
Meeting Update 9/13/11 - Erica
I have put together a summary of the Foster and Chevalier paper on my summaries page - https://clover.pas.rochester.edu/trac/astrobear/wiki/u/EricasLibrary#no2, and bulleted the results I'd like to compare in my BE tests.
I have made some line plots of the radial velocity. The low resolution in the outer region of the sphere creates a jagged appearance, which will make a fine comparison to Foster and Chevalier difficult. I will work on the tests we mentioned last week in the coming days, which may lead to better plots. Here is a page of the plots so far: http://www.pas.rochester.edu/~erica/new_plots_BE.html.
Tracking the RT Interface
At first, I was skeptical that my bear2fix routine was really tracking the interface of the RT instability, so I plotted the bear2fix values over the VisIt density plot to check. It appears that it tracks the interface fairly well, certainly well enough to get a decent value for the growth rate. Interface Movie
Meeting Update 09.12
Implemented viscosity in full tensor form for an isotropic fluid, which should be enough for our applications. Also implemented realistic viscosity, which depends on the ion mean free path and the averaged thermal velocity. The scheme now takes full account of the gradient of the viscous tensor induced by different temperatures, which result from different ion thermal velocities between cells. This gives us the ability to solve problems with viscosity at temperature fronts. We may want to implement a table that looks up the cross section for different types of ions; currently the ion mean free path defaults to the case of hydrogen. One thing worth considering is multiple ion species: the code currently defaults to one type of ion, and when multiple species are present at the same time I am not yet sure how to get the averaged cross section and thus the viscosity tensor. For a detailed description, see the updated resistive MHD project page. On the same page, there is also a comparison test between hydro RT, viscous RT and diffusive RT. The viscosity and diffusivity are both linear constants, purposely set to be comparable in value.
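For reference, the standard kinetic-theory estimate behind this dependence (order of magnitude only; the module's exact coefficients may differ):

$$\nu \sim v_{\rm th}\,\lambda_{ii}, \qquad v_{\rm th} \sim \sqrt{\frac{k_B T}{m_i}}, \qquad \lambda_{ii} = \frac{1}{n_i \sigma_{ii}}$$

Because $v_{\rm th}$ depends on the temperature, cells at different temperatures carry different viscosities, which is why the tensor gradient terms matter at temperature fronts.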
Meeting Update 09/13/2011
I have been working on the RT growth rate calculation. I am currently trying to track the interface between the heavy and light fluids by tracking the position of the density gradient. The simulation growth rate is calculated through a line-of-best-fit method. This method is very sensitive to which data points you use. There is one specific range of the data where the simulation growth rate comes out with less than 0.25% relative error with respect to the analytic growth rate. This region is from t=0.5 to t=1.5 (i.e. a short time after the perturbation and before the instability becomes nonlinear).
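For reference, the classic analytic rate for the linear phase (incompressible, inviscid, sharp interface; whether the module uses exactly this expression is not stated above):

$$\gamma = \sqrt{A\,g\,k}, \qquad A = \frac{\rho_{\rm heavy} - \rho_{\rm light}}{\rho_{\rm heavy} + \rho_{\rm light}}$$

where $g$ is the imposed gravity and $k$ the wavenumber of the seeded perturbation; the fit window from t=0.5 to t=1.5 is where the log of the interface amplitude should be closest to a straight line of slope $\gamma$.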
Also, my simulation has some improper nesting which may be responsible for the asymmetry that we saw last week. This probably doesn't affect the growth rate calculation, but nonetheless I'm playing around with the qtolerance and fillratios to get rid of that.
Update
Shocks testing: learning bear2fix to make density plots. What does it mean when bear2fix asks for duplex 0/1?
CRL 618: running the jets with ambient rings in BH while Martin has clumps and jets without ambient rings in BG.
Extended Disk simulations to 3D
Did an analysis of the disks looking at how the gravitational softening length affects the presence of a jet and the loss of angular momentum. Also confirmed that 2.5D produces results similar to the 3D axisymmetric runs. See the results here.
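For context, a common (Plummer-type) way to soften point-mass gravity with softening length $\varepsilon$ (whether the disk runs use exactly this form is an assumption):

$$\Phi(r) = -\frac{GM}{\sqrt{r^2 + \varepsilon^2}}$$

Inside $r \lesssim \varepsilon$ the force flattens off, so a larger softening length weakens the central pull that both the jet launching and the angular momentum transport respond to.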
Meeting Update September 6th
Scaling tests on infiniband completed successfully. I just need to parse the data to produce figures for the paper. Also ran a 2D disk simulation to test angular momentum conservation and posted the results to the wind disk interaction project page.
Meeting Update 09.06
Both viscosity and resistivity have been hooked up in the code using an operator splitting approach. The Hartmann flow module is finished, but it still requires functionality to calculate the proper analytic solution at any given time and do the comparison. The R-B module is also in the code now. For the resistivity module, I am trying to find a resistivity + hydro only test problem with an analytic solution that does not require the rotating cylinder. The Hartmann flow requires viscosity and the R-B instability requires thermal flux, so neither is resistivity-only.
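In case it helps with that comparison functionality, the steady-state Hartmann profile for flow between plates at $y = \pm a$ with a uniform transverse field (Ha is the Hartmann number) is, up to normalization:

$$u(y) = u_0\,\frac{\cosh\mathrm{Ha} - \cosh(\mathrm{Ha}\,y/a)}{\cosh\mathrm{Ha} - 1}$$

The time-dependent solution relaxes toward this profile, so it at least provides the late-time reference for the module.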
Meeting Update 09/06/2011
Uniform gravity is now working in my RT module. Here is a movie of the latest simulation. It is a 50 x 150 grid with 2 levels of AMR.
RT Movie (Test Suite Simulation)
The only thing left to do is run bear2fix to calculate the growth rate, and check it with the analytic value. If the values reasonably agree, this will be all set to go into the test suite.
MfuThree
Summary of the Magnetic fields in the Universe III conference (http://www.oa.uj.edu.pl/mfu2011/):
Reconnection:
-what triggers it?
-what determines its rate?
-what's the condition for fast reconnection?
-turbulence? fractal reconnection?, plasmoid-induced reconnection?
-what is its role in: MRI, star formation, dynamo, acceleration of jets?
-can reconnection jets be observed in the ISM?
-do reconnection jets lead to shocks and possibly trigger SF (Tanuma & Shibata 2007, PASJ, 59)?
-turbulent FAST reconnection
Particle acceleration:
-what are the mechanisms?
-what fraction of magnetic energy goes to non-thermal particle acceleration in reconnection?
Star formation (SF) cores:
-partially ionized phase with strong fields?
-b-fields strength and morphology in extended clouds & cores?
-diffusion mechanisms leading to SF cores?
-cloud and star formation schemes with turbulence and b-fields … the path to form clouds may be important
Milky way:
-NGC 6744 is the most similar galaxy to ours
-large scale reversals in the disk
-odd-symmetry halo fields?
-strong vertical fields in the central region
-existing fields models seem insufficient
-pulsar RM data: the local field is clockwise, but the field in the Sagittarius arm is counter-clockwise; one reversal.
-RM from background sources possibly dominated by HI structures
-field models should accept that spiral arms of the Galaxy are not regular
-the regular fields may not be located in the spiral arms
-model should include halo fields
-simple quadrupole models are wrong.
-MRI-driven dynamos. Consistent with obs?, cooling?, ionization fraction?
Galaxies
-there are halo fields (Marita Krause). Indication of galactic winds
-there are intergalactic fields and turbulent fields. Polarized radio intensity is a signature of ISM turbulence.
-no dynamo without wind
-Caution: distinguish between anisotropic fields (compression or shear; stronger in density-wave galaxies) from truly regular ones.
-butterfly diagram of fields patterns from MRI models (Mami Machida). Statistics of even/odd-symmetry fields patterns are needed.
More questions:
-the golden age of polarimetry is coming.
-Key observations to be done before MFU4?
-Do we expect answers or more questions from the SKA and LOFAR?
-ALMA will zoom in to SF cores
-SFR-MF correlation in galaxies: evidence of a turbulent dynamo?
-origin of B in the diffuse IGM (TeV - GeV obs. > 10⁻¹⁶ G). Seed fields of cosmological origin / phase transition, or a Biermann battery?
-diagnostic with GENUS/TSALLIS statistics in turbulence
-reliable foreground maps based on turbulence to remove from CMB maps - PLANCK
-mean field dynamo models. realistic or not?
-stability of jets at all scales
-combine different obs. in a better way
-what theoretical work is necessary?
-how to "bridge" the gaps between theory, simulations and observations?
Meeting tonight at 6 pm
Agenda
Plan meeting schedule for semester
Questions for senior programmer
Research review
Update week September 5th
- Sink particle formed in my BE test:
http://www.pas.rochester.edu/~erica/bonner_5Sept11.html
- Updated New Users page with some info I learned about sink implementation
https://clover.pas.rochester.edu/trac/astrobear/wiki/u/NewUsers
- Added summaries to my library
https://clover.pas.rochester.edu/trac/astrobear/wiki/u/EricasLibrary
- Added commented problem.f90 to BE project page.
https://clover.pas.rochester.edu/trac/astrobear/wiki/BonnerEbert
Results from latest radiative shocks testing
It seems like we're getting the expected results in comparison to Blondin, Imamura, et al. The alpha = 0 case does not damp in oscillation, alpha = ½ shows rapid damping, and alpha = 1 shows little oscillation. We also observe secondary smaller shocks.
meetings for next week?
Hi all — I am finally feeling better and would like to catch up on things. Can someone please post the meeting schedule for this upcoming week? Are we moving the locals meeting to another day since Monday is Labor Day? Since I missed the New Users meeting, could someone kindly review the AMR talk at the next one for me, please?
Erica
Working on scaling runs on Bluehive's infiniband nodes and an example of load balancing across threads
Currently running two sets of scaling tests. All simulations are 2D
Weak scaling
filling fraction | .25 |
MaxLevel | 0,1,2,3,4 |
Threaded/Every Level Balanced | N/Y, Y/N, Y/Y |
nProcs | 1 2 4 8 16 32 48 64 80 96 104 112 |
Base Res¹ (per processor) | 64 128 512² |
- ¹ Domains are kept square with the number of cells per processor approximately constant
- ² Only with a MaxLevel of 3
Strong Scaling
filling fraction | .1 .2 .3 .5 .7 1 |
MaxLevel | 4 |
Threaded/Every Level Balanced | N/Y, Y/N, Y/Y |
nProcs | 1 2 4 8 16 32 48 64 80 96 104 112 |
Base Res | 32 64 |