[DFTB-Plus-User] MD-DFTB jobs get stuck

Kandes, Martin mkandes at sdsc.edu
Fri May 31 19:09:43 CEST 2019


Hi Balint,


I'm actually working with Natalia on this problem as it occurred on our system and I've helped her compile the different versions of DFTB+ she's using on it.


There are no error messages from DFTB+. It simply hangs at some point in the calculation. For example, here it hung at 'Geometry step 14845' for me until the job ran out of time and the scheduler killed it [1]. There's not much to go on, other than that the problem is repeatable for the input she's provided, although each run with the same input gets stuck at a different 'Geometry step'. This is an MPI-based build; here is what the build script looks like [2]. I'll let Natalia comment on the type of calculation itself, as she's the expert here.
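
To compare where different runs stop, the last geometry step reached can be pulled out of each output file. The following is only a minimal sketch: the function name last_geometry_step is made up for illustration, and it assumes the output contains lines of the form "***  Geometry step: N" as in the excerpt [1].

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of DFTB+): print the number of the last
# "Geometry step" line found in a DFTB+ output file.
# Assumes lines that look like "***  Geometry step: 14844".
last_geometry_step() {
    local logfile="$1"
    # Keep only the geometry step headers, take the final one,
    # and print its last whitespace-separated field (the step number).
    grep 'Geometry step:' "${logfile}" | tail -n 1 | awk '{print $NF}'
}
```

Calling it on the output file of each job (the file name follows the --output pattern in [2]) gives one step number per run, which makes it easy to see whether the hang point moves between otherwise identical runs.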


Any assistance you can provide would be much appreciated.


Thanks,


Marty Kandes

SDSC User Services Group


P.S. I also looked at the system logs of the nodes her jobs ran on. The dftb+ processes are definitely running with high CPU utilization throughout the job, even after the output stops.
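
A stall of this kind (processes busy, output silent) could in principle be caught before the time limit is reached by checking how long ago the output file was last written. This is a rough sketch rather than anything we actually ran: the function name output_is_stale and the threshold are illustrative assumptions, and it relies on GNU coreutils stat being available on the node.

```shell
#!/usr/bin/env bash

# Hypothetical helper (an assumption for illustration, not an existing tool):
# return 0 (true) if FILE has not been modified for more than LIMIT seconds,
# 1 if it was modified more recently, and 2 if the file cannot be read.
# Requires GNU coreutils `stat` for the `-c %Y` (mtime in epoch seconds) option.
output_is_stale() {
    local file="$1" limit="$2"
    local now mtime
    now="$(date +%s)"
    mtime="$(stat -c %Y "${file}")" || return 2
    # Arithmetic test: succeeds only when the file is older than the limit.
    (( now - mtime > limit ))
}
```

For example, a loop in the job script could call output_is_stale on the job's output file every few minutes with a limit of, say, 600 seconds, and record the time and the state of the dftb+ processes once it returns true. Whether 600 seconds is a sensible limit depends on how long a single geometry step normally takes for the input in question.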


[1]


...

...

***  Geometry step: 14844

 iSCC Total electronic   Diff electronic      SCC error
    1   -0.17532428E+03    0.00000000E+00    0.15180536E-01
    2   -0.17532429E+03   -0.10059053E-04    0.18582984E-01
    3   -0.17532436E+03   -0.63219013E-04    0.42150625E-02
    4   -0.17532436E+03   -0.49355393E-05    0.78976793E-03
    5   -0.17532436E+03   -0.20731827E-06    0.21594471E-03
    6   -0.17532436E+03   -0.11787137E-07    0.46355288E-04

Total Energy:                     -174.7987968819 H        -4756.5173 eV
Extrapolated to 0:                -174.7987968819 H        -4756.5173 eV
Total Mermin free energy:         -174.7987968819 H        -4756.5173 eV
Force related energy:             -174.7987968819 H        -4756.5173 eV
>> Charges saved for restart in charges.bin
MD Temperature:                      0.0008694306 H          274.5441 K
MD Kinetic Energy:                   0.1825804254 H            4.9683 eV
Total MD Energy:                  -174.6162164565 H        -4751.5490 eV

--------------------------------------------------------------------------------

***  Geometry step: 14845

 iSCC Total electronic   Diff electronic      SCC error
srun: Job step aborted: Waiting up to 302 seconds for job step to finish.
slurmstepd: *** JOB 23685522 ON comet-25-48 CANCELLED AT 2019-05-30T10:43:21 DUE TO TIME LIMIT ***
slurmstepd: *** STEP 23685522.0 ON comet-25-48 CANCELLED AT 2019-05-30T10:43:21 DUE TO TIME LIMIT ***


[2]


#!/usr/bin/env bash

#SBATCH --account=use300
#SBATCH --job-name=dftb
#SBATCH --output=dftb.o%j.%N
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --export=ALL
#SBATCH -t 03:00:00

declare -xr DFTBPLUS_BUILD_DIR="/home/${USER}/Software/dftbplus/dftbplus-mpi"

module purge
module load intel/2016.3.210
module load intelmpi/2016.3.210
module load mkl/11.3.3
module load gnutools/2.69
module list
export MKLROOT="${MKL_ROOT}"
printenv

mkdir -p "${DFTBPLUS_BUILD_DIR}"
cd "${DFTBPLUS_BUILD_DIR}"

git clone https://github.com/dftbplus/dftbplus.git
cd dftbplus
git submodule update --init --recursive
cp sys/make.x86_64-linux-intel ./make.arch
sed -i 's/FXX = mpifort/FXX = mpif90/' make.arch
sed -i 's/WITH_MPI := 0/WITH_MPI := 1/' make.config
sed -i 's/WITH_DFTD3 := 0/WITH_DFTD3 := 1/' make.config
sed -i 's/answer = /answer = True #/' utils/get_opt_externals
./utils/get_opt_externals ALL
make
make install



________________________________
From: DFTB-Plus-User <dftb-plus-user-bounces at mailman.zfn.uni-bremen.de> on behalf of Bálint Aradi <aradi at uni-bremen.de>
Sent: Thursday, May 30, 2019 11:18:39 PM
To: dftb-plus-user at mailman.zfn.uni-bremen.de
Subject: Re: [DFTB-Plus-User] MD-DFTB jobs get stuck

Dear Natalia,

Is the last output of your job still normal, or does it contain any
error messages? Are you using a parallel or a serial binary? And
finally, did you try changing the eigensolver? The implementations of
the diagonaliser (especially if you run an MPI-parallelised job) may
differ, depending on your system.

  Best regards,

  Bálint

--
Dr. Bálint Aradi
Bremen Center for Computational Materials Science, University of Bremen
http://www.bccms.uni-bremen.de/cms/people/b-aradi/

