Louhi is located at CSC in Espoo, Finland (near Helsinki), and takes part in the DEISA grid project.
Here is a user guide: http://www.csc.fi/english/pages/louhi_guide/index_html
My make rule is:
Some hints for optimization are given.
I hit a nasty bug in all of my runs (40K+, 128 processors): after some time (approx. 15 wall-clock hours) the code crashes while writing the comm.2 file. This is fatal because comm.2 is then corrupted and the code cannot be restarted.
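One common way to keep a crash during a dump from destroying the only restart file is to never overwrite it in place. Instead, write the new data to a temporary file, flush it to disk, and only then rename it over the old file. The sketch below assumes the dump can be routed through a small C helper (for example, called from mydump.F). The function name `write_restart_atomic` and the `.tmp` suffix are hypothetical. It also assumes `rename()` replaces the target atomically on the file system in use, as POSIX specifies; this has not been checked for Louhi's Lustre setup. The sketch does not fix the segfault itself. It only protects the previous restart file.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write a restart file so that a crash mid-write never
 * leaves a half-written file under its final name. Data goes to
 * "<path>.tmp" first, is flushed to storage, and is then renamed over the
 * old file. Returns 0 on success, -1 on any error (the old file is kept). */
int write_restart_atomic(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                          /* path too long */

    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;

    if (fwrite(buf, 1, len, f) != len       /* short write: disk full, I/O error */
        || fflush(f) != 0                   /* push stdio buffer to the OS */
        || fsync(fileno(f)) != 0) {         /* push OS buffers to storage */
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }
    /* Only now replace the old restart file; until this point it is intact. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

The trade-off is that two copies of the restart file briefly exist on disk while the dump runs. If the job dies before the `rename()`, the job can restart from the previous comm.2. It loses only the progress since the last successful dump.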
The last line of output is:
This might be because not enough memory can be allocated; there is (only?) 1 GB per processor.
It seems to happen in mydump.F.
PROCESSOR 0
log_nid = 0 phys_nid = 0x51f host_id = 8 host_pid = 2715
group_id = 52048 num_procs = 128 rank = 0 local_pid = 3
base_node_index = 0 last_node_index = 63
text_base = 0x00000000200000 text_len = 0x00000000400000
data_base = 0x00000000600000 data_len = 0x00000015e00000
stack_base = 0x0000007ec00000 stack_len = 0x00000001000000
heap_base = 0x00000016600000 heap_len = 0x00000025e00000
ss = 0x000000000000001f fs = 000000000000000000 gs = 0x0000000000000017
rip = 0x00000000002d6b90
rdi = 0x0000000000000002 rsi = 0xfffffffffffffff4 rbp = 0x000000007fbfa610
rsp = 0x000000007fbfa540 rbx = 0x000000001be8bdf0 rdx = 0x000000000c2fa85c
rcx = 0x0000000004400000 rax = 0xfffffffffffffff4 cs = 0x000000000000001f
R8 = 0x0000000003e00000 R9 = 0xfffffffffffffff4 R10 = 0x000000001be8bd20
R11 = 0x000000003b9f2fb0 R12 = 0x000000001be8bdf0 R13 = 0x0000000004400000
R14 = 0x000000003b98f630 R15 = 0x0000000004400000
rflg = 0x0000000000010202 prev_sp = 0x000000007fbfa540
error_code = 4
SIGNAL #11Segmentation fault fault_address = 0xfffffffffffffffc
Stack Trace: ------------------------------
#0 0x00000000002d6b90 in llu_queue_pio()
#1 0x00000000002d809a in llu_file_prwv()
#2 0x00000000004388cc in _sysio_enumerate_extents()
#3 0x00000000002d8d01 in llu_file_rwx()
#4 0x00000000002d8e80 in llu_iop_write()
#5 0x0000000000436391 in _sysio_iiox()
#6 0x00000000004365af in _sysio_iiov()
#7 0x0000000000437a39 in __write()
#8 0x0000000000454ae0 in _IO_new_file_write()
could not find symbol for addr 0x00008618000085bc
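The hex values in the dump can be decoded to check the memory hypothesis. Summing text_len, data_len, stack_len and heap_len gives the size of the process image in MiB. Reading rax (also in rsi and R9) as a signed 64-bit integer gives -12. That matches ENOMEM on Linux, which is an assumption for this system's I/O library. The fault address decodes to -4, 8 bytes past that -12. This would fit, but does not prove, a field access through a pointer that actually held an error code. The sketch below only does this arithmetic. The function names are hypothetical, and the constants are copied from the PROCESSOR 0 dump.

```c
#include <stdint.h>

/* Region sizes copied from the PROCESSOR 0 dump (bytes). */
#define TEXT_LEN  UINT64_C(0x00000000400000)
#define DATA_LEN  UINT64_C(0x00000015e00000)
#define STACK_LEN UINT64_C(0x00000001000000)
#define HEAP_LEN  UINT64_C(0x00000025e00000)

/* Value held in rax, rsi and R9 at the time of the crash. */
#define RAX_RAW   UINT64_C(0xfffffffffffffff4)

/* Total size of the reported regions in MiB (1 MiB = 2^20 bytes). */
uint64_t image_mib(void)
{
    return (TEXT_LEN + DATA_LEN + STACK_LEN + HEAP_LEN) >> 20;
}

/* Reinterpret a raw 64-bit register value as two's-complement signed,
 * without relying on implementation-defined unsigned->signed conversion. */
int64_t as_signed(uint64_t v)
{
    if (v <= (uint64_t)INT64_MAX)
        return (int64_t)v;
    return -(int64_t)(~v) - 1;
}
```

The regions add up to 4 + 350 + 16 + 606 = 976 MiB, about 95% of 1 GiB. These are the region sizes reported in the dump and may include reserved address space rather than memory actually in use. The sum is therefore only a rough consistency check, not proof of an out-of-memory condition.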