[DFTB-Plus-User] DFTB+MPI-NEGF on SLURM

Alessandro Pirrotta alessandro.pirrotta at chem.ku.dk
Thu Apr 9 11:07:14 CEST 2015


Thank you both for your emails.

I ran the job using the following commands, and in both cases it seems that
the same job is running $NCPUS times,
giving me $NCPUS copies of the output overlapping in the same file.

srun -n $NCPUS --cores-per-socket=$NCPUS dftb+mpi-negf.r4732_ifort_tested
mpirun -np $NCPUS dftb+mpi-negf.r4732_ifort_tested
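
In case it is useful, a minimal sbatch script of the kind I am using would
look like this (the job name and task count below are placeholders, not my
actual script):

#!/bin/bash
# placeholder values: adjust the job name, task count and binary path as needed
#SBATCH --job-name=dftb-negf
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1

# launch a single MPI job across the allocated tasks
# (instead of N independent serial copies of the binary)
srun ./dftb+mpi-negf.r4732_ifort_tested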

FYI, I compiled it using the extlib from the website and the
makefile make.x86_64-linux-ifort.

*Alessandro Pirrotta*
PhD student



*Faculty of Science*
Department of Chemistry &
Nano-Science Center
University of Copenhagen
Universitetsparken 5, C321
2100 Copenhagen Ø
Denmark

DIR +45 21 18 11 90
MOB +45 52 81 23 41

alessandro.pirrotta at chem.ku.dk

alessandro.pirrotta at gmail.com

www.ki.ku.dk

On 9 April 2015 at 10:56, Alessandro Pirrotta <tqn722 at alumni.ku.dk> wrote:

> On 9 April 2015 at 10:05, Gabriele Penazzi <penazzi at uni-bremen.de> wrote:
>
>>  On 04/09/2015 09:07 AM, Alessandro Pirrotta wrote:
>>
>> Dear DFTB+ users,
>>
>>  I am having a problem running DFTB+ on SLURM.
>> When I am connected to the front end of my account on my university
>> computer cluster, the executable runs correctly: I have run the test suite
>> and only 2 tests failed
>> (
>> ======= spinorbit/EuN =======
>> electronic_stress    element              0.000101791078878    Failed
>> stress               element              0.000101791078878    Failed
>> )
>>
>>  When I submit a job with SLURM and simply execute ./dftb+ I get an MPI
>> error (see below).
>> If I run "mpirun -n 1 dftb+" the job runs correctly on one node and a
>> single core.
>> *How do I run dftb+ over a single node, using n cores?*
>>
>> *[cut]*
>>
>> Hi Alessandro,
>>
>> when running NEGF, the parallelization is very different from the one used
>> for solving the eigenvalue problem. dftb+negf is parallelized on two levels:
>> MPI, by distributing the energy points, and possibly OMP, by linking against
>> threaded BLAS/LAPACK libraries. The former is on our side and requires
>> compiling with MPI support; the latter is up to the BLAS/LAPACK vendor and
>> may or may not be active depending on how you compile it. See the
>> README.NEGF and README.PARALLEL files in the src directory.
>>
>> If you link a threaded library, then you will have to specify how many
>> OMP threads you assign per process in your job script. For example
>>
>> $ export OMP_NUM_THREADS=4
>> $ mpirun -n 1 dftb+
>>
>> would use 4 cores for 1 process (the correct specification depends on your
>> architecture; you may or may not need additional flags, but your facility
>> probably has a how-to). So the answer to your question is that you may want
>> to use n threads on 1 process, n processes on n cores, or (more likely)
>> something in between, depending on your system.
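>>
>> As a sketch only (assuming a 16-core node; the exact flags needed to pin
>> processes and threads depend on your MPI and SLURM setup), a mixed run
>> could look like
>>
>> $ export OMP_NUM_THREADS=4
>> $ mpirun -np 4 dftb+
>>
>> which would use 4 MPI processes with 4 OMP threads each, i.e. 16 cores.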
>>
>> A note on efficiency. Since on "common" test systems (tens to thousands of
>> atoms) LAPACK/ScaLAPACK scales efficiently only up to 2-4 threads, it is
>> usually convenient to reserve some cores for threading. Also, the
>> parallelization over energy points implies solving N independent Green's
>> functions, so the memory has to be allocated N times, where N is the number
>> of processes. For large systems it may be necessary to run one process per
>> socket to get the maximum available memory. With the current version the
>> Poisson solver is also a bit more efficient if you have fewer processes per
>> socket. Considering these points, at the end of the day I usually run with
>> 2 or 4 OMP threads (if I don't hit memory problems).
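>>
>> For instance (just a sketch, assuming two 8-core sockets per node; check
>> which of these flags your SLURM installation supports), one process per
>> socket could be requested with something like
>>
>> $ export OMP_NUM_THREADS=8
>> $ srun --ntasks-per-socket=1 --cpus-per-task=8 dftb+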
>>
>> Hope this helps,
>> Gabriele
>>
>>
>>
>>
>> --
>> --
>> Dr. Gabriele Penazzi
>> BCCMS - University of Bremen
>> http://www.bccms.uni-bremen.de/
>> http://sites.google.com/site/gabrielepenazzi/
>> phone: +49 (0) 421 218 62337
>> mobile: +49 (0) 151 19650383