[DFTB-Plus-User] OpenMP scaling questions

Yang, Chi-Ta yangchit at msu.edu
Thu Nov 1 19:35:26 CET 2018


Hi Dr. B. Hourahine,

Thanks a lot for your replies/suggestions.

Sincerely,
Chi-Ta Yang
________________________________
From: DFTB-Plus-User <dftb-plus-user-bounces at mailman.zfn.uni-bremen.de> on behalf of dftb-plus-user-request at mailman.zfn.uni-bremen.de <dftb-plus-user-request at mailman.zfn.uni-bremen.de>
Sent: Thursday, November 1, 2018 12:30 PM
To: dftb-plus-user at mailman.zfn.uni-bremen.de
Subject: DFTB-Plus-User Digest, Vol 51, Issue 2

Send DFTB-Plus-User mailing list submissions to
        dftb-plus-user at mailman.zfn.uni-bremen.de

To subscribe or unsubscribe via the World Wide Web, visit
        https://mailman.zfn.uni-bremen.de/cgi-bin/mailman/listinfo/dftb-plus-user

or, via email, send a message with subject or body 'help' to
        dftb-plus-user-request at mailman.zfn.uni-bremen.de

You can reach the person managing the list at
        dftb-plus-user-owner at mailman.zfn.uni-bremen.de

When replying, please edit your Subject line so it is more specific
than "Re: Contents of DFTB-Plus-User digest..."


Today's Topics:

   1. Re: OpenMP scaling questions (Yang, Chi-Ta)
   2. Re: OpenMP scaling questions (Ben Hourahine)


----------------------------------------------------------------------

Message: 1
Date: Thu, 1 Nov 2018 16:22:45 +0000
From: "Yang, Chi-Ta" <yangchit at msu.edu>
To: "dftb-plus-user at mailman.zfn.uni-bremen.de"
        <dftb-plus-user at mailman.zfn.uni-bremen.de>
Subject: Re: [DFTB-Plus-User] OpenMP scaling questions
Message-ID:
        <BN6PR12MB150515FDA6BD35F1B6B151FACCCE0 at BN6PR12MB1505.namprd12.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"

Hi Dr. B. Hourahine,

Thanks a lot for the reply. I am trying what you suggested. Can I approximate the parallel fraction (P) as 90% or 85% (the SCC part) according to the table below?

Is the parallel fraction a fixed property of the DFTB+ code, or does it depend on the size of the modelled system?

BTW, I am not sure if I am replying to the thread correctly. I couldn't find instructions on how to reply.

--------------------------------------------------------------------------------
DFTB+ running times                          cpu [s]             wall clock [s]
--------------------------------------------------------------------------------
Global initialisation                  +     0.01  (  0.0%)      0.31  (  0.0%)
Pre-SCC initialisation                 +  5622.79  (  2.9%)   2803.80  (  4.3%)
      Sparse H0 and S build               1288.53  (  0.7%)    326.20  (  0.5%)
SCC                                    + ********  ( 90.0%)  55631.11  ( 85.9%)
  Diagonalisation                        ********  ( 72.4%)  47192.66  ( 72.9%)
      Sparse to dense                     2520.52  (  1.3%)   1062.28  (  1.6%)
      Dense to sparse                     2065.27  (  1.1%)    517.98  (  0.8%)
  Density matrix creation                 6352.21  (  3.3%)   1594.20  (  2.5%)
Post-SCC processing                    + 13619.18  (  7.1%)   6343.42  (  9.8%)
  Energy-density matrix creation           464.84  (  0.2%)    116.65  (  0.2%)
  Force calculation                       9081.63  (  4.7%)   3250.13  (  5.0%)
  Stress calculation                      4516.20  (  2.4%)   3086.64  (  4.8%)
--------------------------------------------------------------------------------
Missing                                +     0.03  (  0.0%)      0.01  (  0.0%)
Total                                  = ********  (100.0%)  64778.64  (100.0%)



Thanks a lot,

Chi-Ta Yang




Tue Oct 30 09:31:28 CET 2018

Hello Chi-Ta,

For a system this small, 20 cores is probably above the point where you
gain from parallelism in the eigensolver. Have you tested for lower
numbers (4, 8)?


The ideal parallel scaling should look like


https://dftbplus-recipes.readthedocs.io/en/master/parallel/amdahl.html#amdahl-s-law


but this ignores various effects like efficiency for sub-problems
fitting into various levels of the memory hierarchy. This may be in part why
you are seeing the 30 core anomaly.
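
As a rough illustration of the ideal curve on that page, Amdahl's law gives the speed-up on n cores as S(n) = 1 / ((1 - P) + P / n), where P is the parallel fraction. The sketch below only evaluates that formula; the function name and the P values are made up for illustration and are not measurements of DFTB+.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Ideal speed-up from Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Illustrative parallel fractions only, not DFTB+ measurements.
    for p in (0.80, 0.90, 0.95):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (4, 8, 20, 40)
        )
        print(f"P = {p:.2f} -> {row}")
```

In this model the speed-up can never exceed 1 / (1 - P), however many cores are used, and real runs fall below the curve for the reasons given next.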


The asterisks in the output are a known problem with the format breaking

https://github.com/dftbplus/dftbplus/issues/182


Regards


Ben





On 30/10/18 03:55, Yang, Chi-Ta wrote:
>
> Greetings,
>
>
> I am using dftbplus-18.2 and testing how the calculation time depends
> on the number of cores for the system below.
>
>
> Test system details:
>
> - 809 atoms, periodic
> - Gamma-point k-point sampling
>
> I ran jobs on 20, 30, and 40 cores, but the elapsed times are
> comparable:
> 20-core job:  13:36:16 hours
> 30-core job:  15:14:54 hours
> 40-core job:  12:12:32 hours
>
> The outputs show that the number of OpenMP threads was as expected, but
> the 40-core job gained little compared to the 20-core job.
>
> Could you please help me understand why there is no scaling?
>
>
> BTW, the running times show ***** as below. Is there a way to show
> the full digits?
> --------------------------------------------------------------------------------
> DFTB+ running times                          cpu [s]             wall clock [s]
> --------------------------------------------------------------------------------
> Sparse H0 and S build                    16557.11  (  1.6%)    423.90  ( -0.2%)
> SCC                                    + ********  ( 87.8%)  ********  (104.3%)
> Diagonalisation                          ********  ( 63.4%)  ********  (108.9%)
>
>
> Thanks a lot,
> Chi-Ta Yang

Printing timing information fails for long runs · Issue #182 · dftbplus/dftbplus (github.com)
If a run takes longer than about 28 hours (100000 seconds), stars appear in the output instead of timing values.




------------------------------

Message: 2
Date: Thu, 1 Nov 2018 16:29:54 +0000
From: Ben Hourahine <benjamin.hourahine at strath.ac.uk>
To: dftb-plus-user at mailman.zfn.uni-bremen.de
Subject: Re: [DFTB-Plus-User] OpenMP scaling questions
Message-ID: <08e3c1eb-a66e-126e-45fb-3f4675c936a1 at strath.ac.uk>
Content-Type: text/plain; charset="windows-1252"

Hello Chi-Ta,


You successfully replied to the thread.

The percentages are each section's share of the total execution time, not
the parallel fraction. The parallel fraction can be measured either by
comparing the serial run time against the run time on a large number of
processes (where the time is dominated by the serial part), or by fitting
the speed-up measured for a few different numbers of processors.

The parallel fraction is problem dependent, generally improving for
larger systems.
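
Both ways of measuring the parallel fraction can be sketched in a few lines of Python, assuming the run time follows Amdahl's law, T(n) = T(1) * ((1 - P) + P / n). The function names and the timings in the example are hypothetical, chosen only to show the arithmetic; real timings will deviate from this model, for example through the memory-hierarchy effects mentioned earlier in the thread.

```python
from __future__ import annotations


def parallel_fraction_from_limit(t_serial: float, t_many: float) -> float:
    """Estimate P from the serial time and the time on many cores.

    For large n, T(n) approaches (1 - P) * T(1), so P is roughly
    1 - T(n) / T(1). Under the model this slightly underestimates P,
    because the P / n term is neglected.
    """
    return 1.0 - t_many / t_serial


def parallel_fraction_from_fit(t_serial: float, timings: dict[int, float]) -> float:
    """Least-squares fit of P to timings measured at several core counts.

    Amdahl's law gives 1 - T(n) / T(1) = P * (1 - 1 / n), a line through
    the origin with slope P, so P = sum(x * y) / sum(x * x).
    """
    xs = [1.0 - 1.0 / n for n in timings]
    ys = [1.0 - t / t_serial for t in timings.values()]
    denominator = sum(x * x for x in xs)
    if denominator == 0.0:
        raise ValueError("need at least one timing with more than one core")
    return sum(x * y for x, y in zip(xs, ys)) / denominator


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by core count.
    t1 = 1000.0
    measured = {2: 560.0, 4: 330.0, 8: 220.0}
    print(f"fit:   P = {parallel_fraction_from_fit(t1, measured):.3f}")
    print(f"limit: P = {parallel_fraction_from_limit(t1, measured[8]):.3f}")
```

Both estimates need a single-core reference run of the same input, which makes them easiest to obtain on a shortened version of the calculation.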

Regards

Ben


--
      Dr. B. Hourahine, SUPA, Department of Physics,
    University of Strathclyde, John Anderson Building,
            107 Rottenrow, Glasgow G4 0NG, UK.
    +44 141 548 2325, benjamin.hourahine at strath.ac.uk

2013/4 THE Awards Entrepreneurial University of the Year
      2012/13 THE Awards UK University of the Year

   The University of Strathclyde is a charitable body,
        registered in Scotland, number SC015263


------------------------------

End of DFTB-Plus-User Digest, Vol 51, Issue 2
*********************************************