[DFTB-Plus-User] OpenMP scaling questions

Yang, Chi-Ta yangchit at msu.edu
Thu Nov 1 17:22:45 CET 2018


Hi Dr. B. Hourahine,

Thanks a lot for the reply. I am trying what you suggested. Can I approximate the parallel fraction (P) as 90% (cpu time) or 85% (wall clock time), i.e. the SCC part, according to the table below?

Is the parallel fraction a fixed property of the DFTB+ code, or does it depend on the size of the model?
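
An alternative to reading P off the timing table is to estimate it from two measured wall-clock times. The following is a minimal sketch in Python, assuming Amdahl's law, T(n) = T1 * ((1 - P) + P / n), holds exactly for both runs, which real runs only approximate. The function names are hypothetical, and the two elapsed times are the 20-core and 40-core values quoted later in this thread.

```python
def hms_to_seconds(hms):
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def estimate_parallel_fraction(t_a, n_a, t_b, n_b):
    """Solve Amdahl's law for P from two (time, cores) measurements.

    With r = t_a / t_b and T(n) = T1 * ((1 - P) + P / n):
        P = (r - 1) / (r - 1 + 1/n_a - r/n_b)
    Assumption: both runs follow Amdahl's law exactly.
    """
    r = t_a / t_b
    return (r - 1.0) / (r - 1.0 + 1.0 / n_a - r / n_b)


# Elapsed times reported in this thread for the 20-core and 40-core jobs
t20 = hms_to_seconds("13:36:16")
t40 = hms_to_seconds("12:12:32")

p = estimate_parallel_fraction(t20, 20, t40, 40)
print(f"estimated parallel fraction P = {p:.3f}")
```

Such a two-point estimate is only as good as the assumption behind it. The 30-core run in this thread is slower than the 20-core run, which the model cannot reproduce, so the result is at best a rough guide.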

BTW, I am not sure if I am replying to the thread correctly. I couldn't find instructions on how to reply.

--------------------------------------------------------------------------------
DFTB+ running times                          cpu [s]             wall clock [s]
--------------------------------------------------------------------------------
Global initialisation                  +     0.01  (  0.0%)      0.31  (  0.0%)
Pre-SCC initialisation                 +  5622.79  (  2.9%)   2803.80  (  4.3%)
  Sparse H0 and S build                   1288.53  (  0.7%)    326.20  (  0.5%)
SCC                                    + ********  ( 90.0%)  55631.11  ( 85.9%)
  Diagonalisation                        ********  ( 72.4%)  47192.66  ( 72.9%)
    Sparse to dense                       2520.52  (  1.3%)   1062.28  (  1.6%)
    Dense to sparse                       2065.27  (  1.1%)    517.98  (  0.8%)
  Density matrix creation                 6352.21  (  3.3%)   1594.20  (  2.5%)
Post-SCC processing                    + 13619.18  (  7.1%)   6343.42  (  9.8%)
  Energy-density matrix creation           464.84  (  0.2%)    116.65  (  0.2%)
  Force calculation                       9081.63  (  4.7%)   3250.13  (  5.0%)
  Stress calculation                      4516.20  (  2.4%)   3086.64  (  4.8%)
--------------------------------------------------------------------------------
Missing                                +     0.03  (  0.0%)      0.01  (  0.0%)
Total                                  = ********  (100.0%)  64778.64  (100.0%)



Thanks a lot,

Chi-Ta Yang




Tue Oct 30 09:31:28 CET 2018

Hello Chi-Ta,

For a system this small, 20 cores is probably above the point where you
gain from parallelism in the eigensolver. Have you tested with lower
core counts (4, 8)?


The ideal parallel scaling should look like


https://dftbplus-recipes.readthedocs.io/en/master/parallel/amdahl.html#amdahl-s-law


but this ignores various effects, such as the efficiency gained when
sub-problems fit into particular levels of the memory hierarchy. This may
be part of the reason you are seeing the 30-core anomaly.
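
To make the ideal curve concrete, Amdahl's law can be evaluated directly. The following is a minimal sketch in Python. The parallel fraction of 0.9 is an assumption chosen purely for illustration, not a measured value for this system, and the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Ideal speedup on n_cores when a fraction P of the work is parallel.

    Amdahl's law: S(n) = 1 / ((1 - P) + P / n)
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


P = 0.9  # assumed parallel fraction, for illustration only

for n in (1, 4, 8, 20, 30, 40):
    print(f"{n:3d} cores: ideal speedup {amdahl_speedup(P, n):5.2f}")

# The speedup can never exceed 1 / (1 - P), however many cores are used.
print(f"upper bound: {1.0 / (1.0 - P):.2f}")
```

Under this model, going from 20 to 40 cores helps much less than going from 4 to 8, because the serial part of the run increasingly dominates the total time.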


The asterisks in the output are a known problem with the output format overflowing:

https://github.com/dftbplus/dftbplus/issues/182
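
Fortran fills a fixed-width numeric field with asterisks when the value does not fit. The following Python sketch emulates that behaviour. The field width of 8 with 2 decimals is an assumption inferred from the appearance of the timing table, not taken from the DFTB+ source, and the function name is hypothetical.

```python
def fixed_width_field(value, width=8, decimals=2):
    """Emulate a Fortran-style fixed-width field such as F8.2.

    If the formatted number is wider than the field, the whole field
    is filled with asterisks instead (assumed width and decimals).
    """
    text = f"{value:{width}.{decimals}f}"
    if len(text) > width:
        return "*" * width
    return text


for seconds in (47192.66, 99999.99, 100000.00):
    print(f"[{fixed_width_field(seconds)}]")
```

With these assumed settings, any timing of 100000 seconds or more overflows the field, while anything shorter prints normally.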


Regards


Ben





On 30/10/18 03:55, Yang, Chi-Ta wrote:
>
> Greetings,
>
>
> I am using dftbplus-18.2 and testing how the calculation time depends
> on the number of cores for the system below.
>
>
> Test system details:
>
> - 809 atoms and periodic
> - Gamma point k-point sampling
>
> I ran jobs with 20, 30, and 40 cores, but the elapsed times
> (hh:mm:ss) are comparable.
> 20 cores job:  13:36:16
> 30 cores job:  15:14:54
> 40 cores job:  12:12:32
>
> The outputs show the OpenMP thread counts were as expected, but the
> 40-core job gained little compared to the 20-core job.
>
> Could you please help me understand why there is no scaling?
>
>
> BTW, the running times are shown as ***** below. Is there a way to show
> the full digits?
> --------------------------------------------------------------------------------
> DFTB+ running times                          cpu [s]             wall clock [s]
> --------------------------------------------------------------------------------
> Sparse H0 and S build                    16557.11  (  1.6%)    423.90  ( -0.2%)
> SCC                                    + ********  ( 87.8%)  ********  (104.3%)
> Diagonalisation                          ********  ( 63.4%)  ********  (108.9%)
>
>
> Thanks a lot,
> Chi-Ta Yang

Printing timing information fails for long runs · Issue #182 · dftbplus/dftbplus <https://github.com/dftbplus/dftbplus/issues/182>
If a run takes longer than 28 hours (100000 seconds), stars appear in the output instead of timing values.


