[DFTB-Plus-User] On parallel version of DFTB+
Gabriele Penazzi
penazzi at uni-bremen.de
Fri Sep 23 09:01:47 CEST 2016
Hi,
the NEGF part is parallelized differently from the rest: the Hamiltonian
is not distributed across nodes; rather, the parallelization occurs via
distribution of an energy integral. It follows that each process has to
carry the whole Hamiltonian. Luckily, the calculation of the Green's
function can be carried out while maintaining some degree of sparsity,
especially if the system is quite elongated (nanowires, nanotubes, or
periodic systems with a small supercell in the direction orthogonal to
transport). In this case you can use a smart partitioning which reduces
the memory and computational footprint. This is achieved using the
FirstLayerAtoms flag in the input (see the documentation online).
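As a purely illustrative sketch, assuming the Transport/Device block
layout described in the online transport documentation (the atom ranges
and layer start indices below are hypothetical and have to match your
own geometry), the relevant input fragment could look something like:

  Transport {
    Device {
      AtomRange = 1 120           # atoms of the scattering region (hypothetical range)
      FirstLayerAtoms = 1 41 81   # first atom of each principal layer (hypothetical indices)
    }
    # ... Contact blocks etc. as described in the transport documentation ...
  }

The idea is that each principal layer couples only to its neighbouring
layers, so the Green's function can be built from block-tridiagonal
pieces instead of the full dense matrix.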
I managed to calculate systems with more than 150 000 orbitals in NEGF,
therefore your system should be within reach. I can give you some
suggestions in advance:
1) Use the FirstLayerAtoms flag if you can (i.e. if your system is
elongated). It can give a massive improvement.
2) Make sure that every process has enough memory. You can get better
results by combining MPI and OpenMP (see the launch sketch below). If
you run into trouble with memory consumption, ask again on the list and
I'll explain this point in more detail.
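As an illustration of point 2, a hybrid job layout keeps the number of
replicated Hamiltonians per node small while still using all cores. The
process and thread counts below are hypothetical; adapt them, and the
launcher itself, to your own cluster and queueing system:

  # hypothetical hybrid launch: 4 MPI processes with 8 OpenMP threads each
  export OMP_NUM_THREADS=8
  mpirun -np 4 dftb+ > output.log

Fewer MPI processes per node means fewer copies of the Hamiltonian in
memory, while the OpenMP threads keep the remaining cores busy.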
Gabriele
On 09/22/2016 09:36 PM, ZHAOHUI HUANG wrote:
> Hello,
>
> Sorry to bother you again.
>
> I have a question about the size limit of the additional package
> implemented in DFTB+ for non-equilibrium Green's function based
> transport. I have not used it before. If I can try a structure of up to
> 5000 atoms, I will use it. My current results show that those mesoscale
> structures could be hopping conductors, so I want to know how well they
> conduct electrons. Thanks; comments are also welcome.
>
> ZhaoHui Huang,
>
> ------------------------------------------------------------------------
> *From: *"Jacek Jakowski" <jjakowski at gmail.com>
> *To: *"User list for DFTB+ related questions"
> <dftb-plus-user at mailman.zfn.uni-bremen.de>
> *Sent: *Monday, September 19, 2016 8:50:22 PM
> *Subject: *Re: [DFTB-Plus-User] On parallel version of DFTB+
>
> If that is all you want, why don't you use LAMMPS or some other
> force-field code?
> ------------------------------------------------------------------------
> From: ZHAOHUI HUANG <zuh101 at psu.edu>
> Sent: 9/19/2016 6:00 PM
> To: User list for DFTB+ related questions
> <dftb-plus-user at mailman.zfn.uni-bremen.de>
> Subject: Re: [DFTB-Plus-User] On parallel version of DFTB+
>
> OK, thanks for the reply. Actually, I just use DFTB+ to relax large
> structures with the smallest basis. For a better description of the
> conduction band, I wrote a TB code for static structures using an
> sp3d5s* basis. I can calculate band structures for up to 12,000 atoms
> with CHEEV. Thanks.
>
> ZhaoHui Huang,
>
>
> ----- Original Message -----
> From: "Jacek Jakowski" <jjakowski at gmail.com>
> To: "User list for DFTB+ related questions"
> <dftb-plus-user at mailman.zfn.uni-bremen.de>
> Sent: Monday, September 19, 2016 4:13:18 PM
> Subject: Re: [DFTB-Plus-User] On parallel version of DFTB+
>
> Yes, ScaLAPACK is a direct attempt to get eigenvalues of dense
> matrices. CHEEV and CHEEVD are complex diagonalizers for dense
> matrices; as such they scale cubically. Different algorithms for
> diagonalization are implemented, and one of them is called
> divide-and-conquer. DC-DFTB-K treats molecular systems as a
> collection of fragments which are solved independently (hence
> divide-and-conquer) and then assembles an approximate solution.
> This can be confusing, since CHEEV/CHEEVD are also associated with
> "divide-and-conquer" approaches, but apart from the names they have
> nothing in common.
>
> I don't think that DFTB+ could handle a million atoms without
> switching from dense to sparse matrices/solvers. To estimate how many
> resources you need, take the largest calculation you were able to do
> so far, see how much larger the system you have in mind is, and take
> the cube of that ratio. For example, if you increase your size 10x,
> your cost grows 1000 times.
>
> And yes, there are iterative solvers such as ARPACK/Lanczos; they are
> good if you only need a small fraction of the eigenvectors (like in
> DFT with large basis sets). You cannot really use them with DFTB,
> because you need all the eigenvectors. Your basis set is already
> reduced to the minimum and you cannot reduce it any further.
>
> On Mon, Sep 19, 2016 at 3:45 PM, ZHAOHUI HUANG <zuh101 at psu.edu> wrote:
>> Hi,
>>
>> Thanks for the quick reply. Yes, I know you mean the first step of
>> BerkeleyGW. When you solve for the exciton eigenvalues in the space
>> of quasiparticle wave functions, don't you diagonalize an exciton
>> Hamiltonian containing the two-body effects? Last year I had a
>> 114-atom structure whose exciton Hamiltonian had a dimension of
>> around 170,000, and they used an iterative method to diagonalize H.
>> Can I say that the ScaLAPACK solution is a direct attempt to get the
>> eigenvalues, for example with CHEEV or CHEEVD (divide-and-conquer
>> algorithm)? So is it possible to use an iterative method to handle a
>> large TB Hamiltonian? I expect DFTB+ could be extended to handle
>> structures with one million atoms or so. Comments?
>>
>> ZhaoHui Huang,
>>
>>
>> ----- Original Message -----
>> From: "Jacek Jakowski" <jjakowski at gmail.com>
>> To: "User list for DFTB+ related questions"
> <dftb-plus-user at mailman.zfn.uni-bremen.de>
>> Sent: Monday, September 19, 2016 3:28:14 PM
>> Subject: Re: [DFTB-Plus-User] On parallel version of DFTB+
>>
>> My estimates are based on diagonalization of dense matrices, which
>> is what ScaLAPACK does. The specific numbers are based on
>> diagonalization on a Cray XC30 (Intel Xeons, 16 cores per node).
>> Diagonalization, like other matrix-matrix operations, scales
>> cubically with the system size, which means that if you decrease
>> your system by a factor of 2, the computational cost goes down by a
>> factor of 8 (= 2^3). Actually, I need to correct my previous message:
>> 10 hours on 4000 cores for a single diagonalization is not for 100k
>> but for 400k basis functions. Yes, tight-binding is much faster than
>> conventional DFT, but dense linear algebra still scales cubically
>> and dominates the computation. The speedup with respect to DFT comes
>> from two factors: (1) for a given number of atoms the matrices are
>> about 5-10 times smaller than in conventional DFT with a localized
>> basis set, and (2) the cost of forming the DFTB matrices is very
>> small compared to DFT matrices of the same size.
>>
>> According to the official information, BerkeleyGW does not do the
>> diagonalization itself but takes the results of diagonalization from
>> other codes as input (and computes higher-order corrections). Also,
>> it is intended for up to a few hundred atoms.
>>
>> Besides DFTB+, you can try the divide-and-conquer implementation
>> called DC-DFTB-K (Japan) or the CP2K implementation (they use ELPA,
>> if I remember correctly).
>>
>>
>> Jacek
>>
>>
>> On Mon, Sep 19, 2016 at 2:01 PM, ZHAOHUI HUANG <zuh101 at psu.edu> wrote:
>>> Can you describe some of the algorithmic details used in DFTB+,
>>> especially the Hamiltonian diagonalization? Tight-binding
>>> calculations are supposed to run very fast, but your reply gives a
>>> totally different picture; it will take me some time to think over
>>> your words. I had not realized that DFTB+ might require a few
>>> thousand processors. To ask simply: could you tell me where most of
>>> the CPU time is spent in a DFTB+ calculation? Diagonalization?
>>> Thanks a lot.
>>>
>>> If you use an iterative method to solve for the Hamiltonian
>>> eigenvalues, as implemented in BerkeleyGW, what do you think the
>>> calculation speed would be?
>>>
>>> ZhaoHui Huang,
>>>
>>>
>>> ----- Original Message -----
>>> From: "Jacek Jakowski" <jjakowski at gmail.com>
>>> To: "User list for DFTB+ related questions"
> <dftb-plus-user at mailman.zfn.uni-bremen.de>
>>> Sent: Saturday, September 17, 2016 8:26:15 PM
>>> Subject: Re: [DFTB-Plus-User] On parallel version of DFTB+
>>>
>>> Most likely you don't have enough memory to fit the 26,000 atoms on
>>> your computer, even if DFTB+ can handle it. Assuming that your
>>> 26k atoms are carbons (or similar), you have roughly 100k basis
>>> functions, so you need about 80 GB to fit a single dense matrix in
>>> memory (100,000 x 100,000 double-precision elements at 8 bytes each)
>>> and much more (like 10 times that) for a real calculation.
>>> But even if this fits into your memory, 100k matrices on 4000 cores
>>> take about 10 hours for a single diagonalization (a real case). It
>>> would probably take something like a month to do the SCF, and about
>>> half a year for a few MD steps.
>>>
>>> I suggest that you decrease the size of the cell so that your
>>> matrices are below 32,000.
>>>
>>> Jacek
>>>
>>> On Fri, Sep 9, 2016 at 1:36 PM, ZHAOHUI HUANG <zuh101 at psu.edu> wrote:
>>>> Hello,
>>>>
>>>> Sorry to bother you if this is not of interest.
>>>>
>>>> I have an issue when running the parallel DFTB+. My unit cell
>>>> contains 26,000 atoms and I just want to relax the structure for a
>>>> few steps. When running the code, I first got an output overflow
>>>> error message, so I increased the MAXRECL parameter defined in the
>>>> HSDParser package. It does run then, but it fails with a SCALAPACK
>>>> error:
>>>>
>>>> MAXNEIGHBORS: 8847
>>>> iSCC Total electronic Diff electronic SCC error
>>>> Operation failed!
>>>> ppotrf in scalafx_ppotrf_dreal
>>>> Info: 23233
>>>>
>>>>
>>>> Is there any code developer who is familiar with this part of the
>>>> code? Thanks.
>>>>
>>>>
>>>> ZhaoHui Huang,
>
> _______________________________________________
> DFTB-Plus-User mailing list
> DFTB-Plus-User at mailman.zfn.uni-bremen.de
> https://mailman.zfn.uni-bremen.de/cgi-bin/mailman/listinfo/dftb-plus-user
>
--
Dr. Gabriele Penazzi
BCCMS - University of Bremen
http://www.bccms.uni-bremen.de/
http://sites.google.com/site/gabrielepenazzi/
phone: +49 (0) 421 218 62337
mobile: +49 (0) 151 19650383
More information about the DFTB-Plus-User mailing list