[DFTB-Plus-User] DFTB+MPI-NEGF on SLURM
Argo
argo.nurbawono at gmail.com
Thu Apr 9 09:46:36 CEST 2015
Hi. Maybe for SLURM, if you want, say, 4 cores on 1 node, put something
like this in your job script:
srun -n 4 --cores-per-socket=4
mpirun -np 4 dftb+
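For example, a complete submission script for the MPI binary could look
roughly like this (an untested sketch; the job name, output file and the
commented module line are placeholders you would adapt to your own
cluster):

#!/bin/bash
#SBATCH --job-name=dftb-mpi
#SBATCH --nodes=1              # one node
#SBATCH --ntasks=4             # four MPI ranks
# module load openmpi          # whatever MPI module your site provides
mpirun -np 4 dftb+ > dftb.out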
If you compiled the OpenMP version of dftb+ (which is the default build
of dftb+ anyway), set OMP_NUM_THREADS instead:
srun -n 4 --cores-per-socket=4
export OMP_NUM_THREADS=4
dftb+
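As a rough sketch of a full script for the threaded (OpenMP) binary,
assuming your SLURM version supports --cpus-per-task, one process with
four threads could be requested like this (again untested, job name and
output file are placeholders):

#!/bin/bash
#SBATCH --job-name=dftb-omp
#SBATCH --nodes=1              # one node
#SBATCH --ntasks=1             # a single process
#SBATCH --cpus-per-task=4      # four cores for the OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
dftb+ > dftb.out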
For those who use LSF instead of SLURM, this line in the LSF job script
#BSUB -R "span[ptile=4]"
would do the same thing, as far as I know.
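Spelled out as a complete LSF script, that would be roughly the following
(untested sketch; job name and output file are placeholders):

#!/bin/bash
#BSUB -J dftb-mpi
#BSUB -n 4                     # four MPI ranks
#BSUB -R "span[ptile=4]"       # keep all four ranks on one host
mpirun -np 4 dftb+ > dftb.out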
Argo.
On Thu, 2015-04-09 at 09:07 +0200, Alessandro Pirrotta wrote:
> Dear DFTB+ users,
>
>
>
> I am having a problem running DFTB+ on SLURM.
> When I am connected to the front end of my university's computer
> cluster, the executable runs correctly: I have run the test suite and
> only 2 tests failed
> (
> ======= spinorbit/EuN =======
> electronic_stress element 0.000101791078878
> Failed
> stress element 0.000101791078878
> Failed
> )
>
>
> When I submit a job with SLURM and execute ./dftb+ directly, I get an
> MPI error (see below).
> If I run "mpirun -n 1 dftb+" the job runs correctly on one node and a
> single core.
> How do I run dftb+ on a single node using n cores?
>
>
> [bhc0141:20956] [[64086,1],0][grpcomm_pmi_module.c:398:modex]
> PMI_KVS_Commit failed: Operation failed
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process
> is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>
> orte_grpcomm_modex failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> [bhc0141:20956] *** An error occurred in MPI_Init
> [bhc0141:20956] *** on a NULL communicator
> [bhc0141:20956] *** Unknown error
> [bhc0141:20956] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly. You should
> double check that everything has shut down cleanly.
>
>
> Reason: Before MPI_INIT completed
> Local host: bhc0141
> PID: 20956
> --------------------------------------------------------------------------
>
>
> Kind regards,
> Alessandro
>
>
> Alessandro Pirrotta
> PhD student
>
>
>
> Faculty of Science
> Department of Chemistry &
> Nano-Science Center
> University of Copenhagen
> Universitetsparken 5, C321
> 2100 Copenhagen Ø
> Denmark
>
> DIR +45 21 18 11 90
> MOB +45 52 81 23 41
>
>
>
> alessandro.pirrotta at chem.ku.dk
>
> alessandro.pirrotta at gmail.com
>
>
> www.ki.ku.dk
>
>
>