[DFTB-Plus-User] DFTB+ mpi running issue
Pei Wang
n8413274 at qut.edu.au
Thu Oct 21 04:08:35 CEST 2021
Dear DFTB+ users,
I'm calculating the DOS of a carbon system with 10,000 atoms, using 96
CPU cores and 380 GB of memory. The program reported an error after
finishing one geometry step, before generating charge.in. I asked the
HPC technical support, and they suggested it might be a bug or a
limitation of the software. Could you help identify what is causing
the problem? Thanks.
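For context, the plan was to restart later runs from the stored charges,
which is why the input below carries a commented-out ReadInitialCharges
line. A restart run would enable it along these lines (just a sketch of
the relevant fragment; the rest of the Hamiltonian block would stay as in
the full input further down):

Hamiltonian = DFTB {
  SCC = Yes
  # Read the charges written by the previous run instead of starting
  # the SCC cycle from neutral atoms; needs the charge file to exist.
  ReadInitialCharges = Yes
}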
Best Regards,
Pei Wang
-------------------------------------------------------------------------------------------------
input file
Geometry = GenFormat {
  <<< geo.gen
}
Driver = ConjugateGradient {
  MaxSteps = 2
  LatticeOpt = Yes
  MaxLatticeStep = 0.005
}
Hamiltonian = DFTB {
  SCC = Yes
  # ReadInitialCharges = Yes
  SCCTolerance = 1e-6
  Solver = DivideAndConquer{}
  MaxAngularMomentum = {
    C = "p"
  }
  Filling = Fermi {
    Temperature [Kelvin] = 300
  }
  SlaterKosterFiles = Type2FileNames {
    Prefix = "../../slako/"
    Separator = "-"
    Suffix = ".skf"
  }
  KPointsAndWeights = {
    0.0 0.0 0.0 1.0
  }
}
Parallel{
  # UseOmpThreads = Yes
}
Analysis {
  ProjectStates {
    Region {
      Atoms = C
      ShellResolved = Yes
      Label = "pdos.C"
    }
  }
}
ParserOptions {
  ParserVersion = 8
}
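Since the backtrace below runs through the MKL BLACS layer, I also
wondered whether explicit parallel settings would make a difference. If
I read the manual correctly, the Parallel block above can take BLACS
options as well; a variant I could test might look like this (the
values are guesses on my part, not tested recommendations):

Parallel{
  # Only a single k-point here, so no processor groups to split over.
  Groups = 1
  # Explicit block size for the block-cyclic (ScaLAPACK) distribution
  # of the dense matrices; 32 is a common value, not a tuned choice.
  Blacs{
    BlockSize = 32
  }
}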
---------------------------------------------------------------- error
Loading dftbplus/20.1
Loading requirement: openmpi/4.0.2
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
dftb+.mpi          0000000001332DEB  for__signal_handl  Unknown     Unknown
libpthread-2.28.s  000014ABBE82AB30  Unknown            Unknown     Unknown
libuct.so.0.0.0    000014ABA95F36DE  uct_mm_iface_prog  Unknown     Unknown
libucp.so.0.0.0    000014ABA98270EA  ucp_worker_progre  Unknown     Unknown
mca_pml_ucx.so     000014ABA9C88317  mca_pml_ucx_progr  Unknown     Unknown
libopen-pal.so.40  000014ABBDAA9E8B  opal_progress      Unknown     Unknown
libmpi.so.40.20.2  000014ABBEE081AB  ompi_request_defa  Unknown     Unknown
libmpi.so.40.20.2  000014ABBEE3162D  MPI_Testall        Unknown     Unknown
libmkl_blacs_open  000014ABBF7FE818  MKLMPI_Testall     Unknown     Unknown
libmkl_blacs_open  000014ABBF802C42  BI_BuffIsFree      Unknown     Unknown
libmkl_blacs_open  000014ABBF802816  BI_UpdateBuffs     Unknown     Unknown
libmkl_blacs_open  000014ABBF7DCFED  dgesd2d_           Unknown     Unknown
dftb+.mpi          000000000130A227  Unknown            Unknown     Unknown
dftb+.mpi          0000000001232901  Unknown            Unknown     Unknown
dftb+.mpi          0000000001232ACB  Unknown            Unknown     Unknown
dftb+.mpi          000000000053FFFC  Unknown            Unknown     Unknown
dftb+.mpi          00000000004B41F3  Unknown            Unknown     Unknown
dftb+.mpi          000000000049CB88  Unknown            Unknown     Unknown
dftb+.mpi          0000000000419608  Unknown            Unknown     Unknown
dftb+.mpi          00000000004177A2  Unknown            Unknown     Unknown
libc-2.28.so       000014ABBE4764A3  __libc_start_main  Unknown     Unknown
dftb+.mpi          00000000004176AE  Unknown            Unknown     Unknown
forrtl: error (78): process killed (SIGTERM)
...
...
dftb+.mpi          00000000004177A2  Unknown            Unknown     Unknown
libc-2.28.so       000014F7D7A504A3  __libc_start_main  Unknown     Unknown
dftb+.mpi          00000000004176AE  Unknown            Unknown     Unknown
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node gadi-cpu-clx-2950
exited on signal 9 (Killed).
--------------------------------------------------------------------------