CRL 618

We are trying to finish these simulations. Bin and I have been experiencing problems on bluehive and bluegene. These seem to be independent issues, and I've opened tickets 162 and 163 about them. We already have a reservation on bluegene for this Wednesday to solve the problem on that cluster.

As a plan B, I've submitted 4amr versions of two of the runs to bluehive's queue.

5 Dec 11. The 4amr-level simulations, clump-stratifiedAMBIENT and jet-stratifiedAMBIENT, aborted after completing ~40% of the run. They were running on 24 afrank nodes at bluehive (vmem=19000mb). The Chombo files at that point are ~700 MB. The only output reported is:

(A)

./launch.s: line 3: 5538 Killed ./astrobear

mpirun has exited due to process rank 18 with PID 5530 on node phy089 exiting without calling "finalize". This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).

I tried restarting, but they die shortly afterward with the same error message. I will now recompile with check flags on, which will make the runs about 10 times slower but should give more information when they abort.

As for the 5amr runs on bluegene, we have a reservation on that cluster for this Wednesday to do debugging.

Comments

1. martinhe -- 13 years ago

The 4amr restart with check flags on did not give any more information than (A) above.

2. Jonathan -- 13 years ago

I think I may have found a bug that would cause restarts with particles to fail…