[DFTB-Plus-User] OpenMP scaling questions

Ben Hourahine benjamin.hourahine at strath.ac.uk
Thu Nov 1 17:29:54 CET 2018


Hello Chi-Ta,


Your reply reached the list correctly, so no problems there.

The percentages are fractions of the total execution time, not the
parallel fraction. The parallel fraction can be measured either by
comparing the serial time against a run on a large number of processes
(where the remaining time is dominated by the serial part), or by
fitting the speed-up over a few different numbers of processors.

The parallel fraction is problem dependent, and generally improves for
larger systems.
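The fitting approach above can be sketched in a few lines of Python. This is
only an illustration with hypothetical timings (not numbers from this thread):
it grid-searches the parallel fraction P that makes Amdahl's law,
S(n) = 1 / ((1 - P) + P/n), best match the measured speed-ups.

```python
# Estimate the parallel fraction P by fitting Amdahl's law to measured
# speed-ups. Timings below are hypothetical examples for illustration.

def amdahl_speedup(p, n):
    """Predicted speed-up on n cores for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)

def fit_parallel_fraction(timings):
    """Grid-search the P in [0, 1] that best reproduces the measured
    wall-clock times, given as a dict {cores: seconds}."""
    t1 = timings[1]  # serial (1-core) reference time
    best_p, best_err = 0.0, float("inf")
    for i in range(1001):
        p = i / 1000.0
        # squared error between measured and predicted speed-ups
        err = sum((t1 / t - amdahl_speedup(p, n)) ** 2
                  for n, t in timings.items() if n > 1)
        if err < best_err:
            best_p, best_err = p, err
    return best_p

# Hypothetical wall-clock times (seconds) for 1, 4, 8 and 20 cores
times = {1: 1000.0, 4: 325.0, 8: 212.5, 20: 145.0}
print(f"Estimated parallel fraction: {fit_parallel_fraction(times):.2f}")
```

With these example numbers the fit returns P = 0.90; with P that high,
Amdahl's law caps the achievable speed-up at 10x no matter how many cores
are used, which is why adding cores eventually stops helping.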

Regards

Ben

On 01/11/2018 16:22, Yang, Chi-Ta wrote:
> Hi Dr. B. Hourahine,
> Thanks a lot for the reply. I am trying what you have suggested. Can I
> approximate the parallel fraction (P) as 90%, or 85% for the SCC part,
> according to the table below?
> Is the parallel fraction a fixed portion in the DFTB+ code, or is it
> related to the size of the model?
> BTW, I am not sure if I am replying to the thread correctly. I couldn't
> find instructions on how to reply.
> --------------------------------------------------------------------------------
> DFTB+ running times                          cpu [s]             wall clock [s]
> --------------------------------------------------------------------------------
> Global initialisation                  +     0.01  (  0.0%)      0.31  (  0.0%)
> Pre-SCC initialisation                 +  5622.79  (  2.9%)   2803.80  (  4.3%)
>       Sparse H0 and S build               1288.53  (  0.7%)    326.20  (  0.5%)
> SCC                                    + ********  ( 90.0%)  55631.11  ( 85.9%)
>   Diagonalisation                         ********  ( 72.4%)  47192.66  ( 72.9%)
>       Sparse to dense                     2520.52  (  1.3%)   1062.28  (  1.6%)
>       Dense to sparse                     2065.27  (  1.1%)    517.98  (  0.8%)
>   Density matrix creation                 6352.21  (  3.3%)   1594.20  (  2.5%)
> Post-SCC processing                    + 13619.18  (  7.1%)   6343.42  (  9.8%)
>   Energy-density matrix creation           464.84  (  0.2%)    116.65  (  0.2%)
>   Force calculation                       9081.63  (  4.7%)   3250.13  (  5.0%)
>   Stress calculation                      4516.20  (  2.4%)   3086.64  (  4.8%)
> --------------------------------------------------------------------------------
> Missing                                +     0.03  (  0.0%)      0.01  (  0.0%)
> Total                                  = ********  (100.0%)  64778.64  (100.0%)
> Thanks a lot,
> Chi-Ta Yang
> On Tue Oct 30 09:31:28 CET 2018, Ben Hourahine wrote:
> Hello Chi-Ta,
>
> For a system this small, 20 cores is probably above the point where you
> gain from parallelism in the eigensolver. Have you tested lower
> numbers (4, 8)?
>
>
> The ideal parallel scaling should look like
>
>
> https://dftbplus-recipes.readthedocs.io/en/master/parallel/amdahl.html#amdahl-s-law
>
>
> but this ignores various effects, such as the efficiency of sub-problems
> fitting into different levels of the memory hierarchy. This may be part
> of why you are seeing the 30-core anomaly.
>
>
> The asterisks in the output are a known problem with the output format
> breaking:
>
> https://github.com/dftbplus/dftbplus/issues/182
>
>
> Regards
>
>
> Ben
>
>
>
>
>
> On 30/10/18 03:55, Yang, Chi-Ta wrote:
> > Greetings,
> >
> > I am using dftbplus-18.2, and testing the number of cores against the
> > calculation time for the system below.
> >
> > Test system details:
> > - 809 atoms, periodic
> > - Gamma-point k-point sampling
> >
> > I ran jobs with 20, 30, and 40 cores, but the elapsed times are
> > comparable:
> > 20 cores:  13:36:16 hours
> > 30 cores:  15:14:54 hours
> > 40 cores:  12:12:32 hours
> >
> > The outputs show the OpenMP threads were set as expected, but the
> > 40-core job gained no benefit compared to the 20-core job.
> >
> > Could you please help explain why there is no scaling?
> >
> > BTW, the running time shows ***** as below. Is there a way to show
> > the full digits?
> > --------------------------------------------------------------------------------
> > DFTB+ running times                          cpu [s]             wall clock [s]
> > --------------------------------------------------------------------------------
> > Sparse H0 and S build                   16557.11  (  1.6%)    423.90  ( -0.2%)
> > SCC                                    + ********  ( 87.8%)  ********  (104.3%)
> > Diagonalisation                          ********  ( 63.4%)  ********  (108.9%)
> >
> > Thanks a lot,
> > Chi-Ta Yang
>
> _______________________________________________
> DFTB-Plus-User mailing list
> DFTB-Plus-User at mailman.zfn.uni-bremen.de
> https://mailman.zfn.uni-bremen.de/cgi-bin/mailman/listinfo/dftb-plus-user

-- 
      Dr. B. Hourahine, SUPA, Department of Physics,
    University of Strathclyde, John Anderson Building,
            107 Rottenrow, Glasgow G4 0NG, UK.
    +44 141 548 2325, benjamin.hourahine at strath.ac.uk

2013/4 THE Awards Entrepreneurial University of the Year
      2012/13 THE Awards UK University of the Year

   The University of Strathclyde is a charitable body,
        registered in Scotland, number SC015263

