[DFTB-Plus-User] OpenMP scaling questions
Yang, Chi-Ta
yangchit at msu.edu
Thu Nov 1 17:22:45 CET 2018
Hi Dr. B. Hourahine,
Thanks a lot for the reply; I am trying what you have suggested. Can I approximate the parallel fraction (P) as 90%, or as 85% (the SCC part), from the table below? (A rough estimate is sketched after the table.)
Is the parallel fraction a fixed portion of the DFTB+ code, or does it depend on the size of the model?
BTW, I am not sure whether I am replying to the thread correctly; I could not find instructions for replying.
--------------------------------------------------------------------------------
DFTB+ running times                          cpu [s]             wall clock [s]
--------------------------------------------------------------------------------
Global initialisation          +        0.01 (  0.0%)          0.31 (  0.0%)
Pre-SCC initialisation         +     5622.79 (  2.9%)       2803.80 (  4.3%)
  Sparse H0 and S build              1288.53 (  0.7%)        326.20 (  0.5%)
SCC                            +    ******** ( 90.0%)      55631.11 ( 85.9%)
  Diagonalisation                   ******** ( 72.4%)      47192.66 ( 72.9%)
  Sparse to dense                    2520.52 (  1.3%)       1062.28 (  1.6%)
  Dense to sparse                    2065.27 (  1.1%)        517.98 (  0.8%)
  Density matrix creation            6352.21 (  3.3%)       1594.20 (  2.5%)
Post-SCC processing            +    13619.18 (  7.1%)       6343.42 (  9.8%)
  Energy-density matrix creation      464.84 (  0.2%)        116.65 (  0.2%)
  Force calculation                  9081.63 (  4.7%)       3250.13 (  5.0%)
  Stress calculation                 4516.20 (  2.4%)       3086.64 (  4.8%)
--------------------------------------------------------------------------------
Missing                        +        0.03 (  0.0%)          0.01 (  0.0%)
Total                          =    ******** (100.0%)      64778.64 (100.0%)
--------------------------------------------------------------------------------
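A minimal sketch of the kind of estimate I have in mind, assuming Amdahl's
model t(N) = t_s + t_p/N and using the 20- and 40-core wall times quoted
further down in this thread (the 30-core run does not follow the trend, so
this is rough at best):

    # Two-point Amdahl fit: t(N) = ts + tp/N, with ts the serial part and
    # tp the perfectly parallel part of the one-core run time.
    def fit_amdahl(n1, t1, n2, t2):
        """Solve t1 = ts + tp/n1 and t2 = ts + tp/n2 for (ts, tp)."""
        tp = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
        ts = t1 - tp / n1
        return ts, tp

    t20 = 13 * 3600 + 36 * 60 + 16   # 20-core wall time [s]
    t40 = 12 * 3600 + 12 * 60 + 32   # 40-core wall time [s]
    ts, tp = fit_amdahl(20, t20, 40, t40)
    p = tp / (ts + tp)               # parallel fraction of one-core time
    print(f"P ~ {p:.2f}, max speedup 1/(1-P) ~ {1.0 / (1.0 - p):.1f}")
    # prints: P ~ 0.84, max speedup 1/(1-P) ~ 6.2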
Thanks a lot,
Chi-Ta Yang
B. Hourahine
Tue Oct 30 09:31:28 CET 2018
Hello Chi-Ta,
For a system this small, 20 cores is probably beyond the point where you
gain from parallelism in the eigensolver. Have you tested with lower
core counts (4, 8)?
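Something like the following would let you scan thread counts (a minimal
sketch, assuming the dftb+ binary is on your PATH and a dftb_in.hsd input
sits in the working directory):

    import os
    import subprocess
    import time

    # Run the same dftb+ input at several OpenMP thread counts and time it.
    for n in (1, 2, 4, 8, 16, 20):
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        t0 = time.time()
        subprocess.run(["dftb+"], env=env, check=True,
                       stdout=subprocess.DEVNULL)
        print(f"{n:2d} threads: {time.time() - t0:8.1f} s")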
The ideal parallel scaling should look like
https://dftbplus-recipes.readthedocs.io/en/master/parallel/amdahl.html#amdahl-s-law
but this ignores effects such as how efficiently sub-problems fit into
the various levels of the memory hierarchy. This may be part of why
you are seeing the 30-core anomaly.
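As a rough worked example (my numbers, taking P ~ 0.86 from the SCC
wall-clock fraction in your table), Amdahl's law

    S(N) = 1 / ((1 - P) + P/N)

gives S(20) ~ 5.5, S(40) ~ 6.2, and a limit of 1/(1 - P) ~ 7.1 as N grows,
so doubling 20 cores to 40 gains at most about 13% even in the ideal case.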
The asterisks in the output are a known problem with the format breaking:
https://github.com/dftbplus/dftbplus/issues/182
Regards
Ben
On 30/10/18 03:55, Yang, Chi-Ta wrote:
>
> Greetings,
>
> I am using dftbplus-18.2 and testing how the calculation time scales
> with the number of cores for the system below.
>
> Test system details:
>
> - 809 atoms and periodic
> - Gamma point k-point sampling
>
> I ran jobs with 20, 30, and 40 cores, but the elapsed times
> are comparable:
> 20 cores: 13:36:16 hours
> 30 cores: 15:14:54 hours
> 40 cores: 12:12:32 hours
>
> The outputs show the expected number of OpenMP threads, but the
> 40-core job gained nothing over the 20-core job.
>
> Could you please help me understand why there is no scaling?
>
> BTW, the running times show ******** as below. Is there a way to show
> the full digits?
> --------------------------------------------------------------------------------
> DFTB+ running times                          cpu [s]             wall clock [s]
> --------------------------------------------------------------------------------
>   Sparse H0 and S build             16557.11 (  1.6%)        423.90 ( -0.2%)
> SCC                            +    ******** ( 87.8%)      ******** (104.3%)
>   Diagonalisation                   ******** ( 63.4%)      ******** (108.9%)
>
> Thanks a lot,
> Chi-Ta Yang
Issue #182 (dftbplus/dftbplus), "Printing timing information fails for long runs": if a run takes longer than 100000 seconds (about 28 hours), stars appear in the output instead of the timing values.