[DFTB-Plus-User] OpenMP scaling questions
Yang, Chi-Ta
yangchit at msu.edu
Thu Nov 1 19:35:26 CET 2018
Hi Dr. B. Hourahine,
Thanks a lot for your replies/suggestions.
Sincerely,
Chi-Ta Yang
________________________________
From: DFTB-Plus-User <dftb-plus-user-bounces at mailman.zfn.uni-bremen.de> on behalf of dftb-plus-user-request at mailman.zfn.uni-bremen.de <dftb-plus-user-request at mailman.zfn.uni-bremen.de>
Sent: Thursday, November 1, 2018 12:30 PM
To: dftb-plus-user at mailman.zfn.uni-bremen.de
Subject: DFTB-Plus-User Digest, Vol 51, Issue 2
Send DFTB-Plus-User mailing list submissions to
dftb-plus-user at mailman.zfn.uni-bremen.de
To subscribe or unsubscribe via the World Wide Web, visit
https://urldefense.proofpoint.com/v2/url?u=https-3A__mailman.zfn.uni-2Dbremen.de_cgi-2Dbin_mailman_listinfo_dftb-2Dplus-2Duser&d=DwIGaQ&c=nE__W8dFE-shTxStwXtp0A&r=k9TfHS8rPor0FvO5_DcdbA&m=bY8qd363xlJr34vtHIZ1bOy1kxKRsFfcfrcT7YENFBM&s=pwvuTiJXzQ9iIfaN24WUv7OdvcUhUhEqLGOAQIFaMDo&e=
or, via email, send a message with subject or body 'help' to
dftb-plus-user-request at mailman.zfn.uni-bremen.de
You can reach the person managing the list at
dftb-plus-user-owner at mailman.zfn.uni-bremen.de
When replying, please edit your Subject line so it is more specific
than "Re: Contents of DFTB-Plus-User digest..."
Today's Topics:
1. Re: OpenMP scaling questions (Yang, Chi-Ta)
2. Re: OpenMP scaling questions (Ben Hourahine)
----------------------------------------------------------------------
Message: 1
Date: Thu, 1 Nov 2018 16:22:45 +0000
From: "Yang, Chi-Ta" <yangchit at msu.edu>
To: "dftb-plus-user at mailman.zfn.uni-bremen.de"
<dftb-plus-user at mailman.zfn.uni-bremen.de>
Subject: Re: [DFTB-Plus-User] OpenMP scaling questions
Message-ID:
<BN6PR12MB150515FDA6BD35F1B6B151FACCCE0 at BN6PR12MB1505.namprd12.prod.outlook.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi Dr. B. Hourahine,
Thanks a lot for the reply. I am trying what you have suggested. Can I approximate the parallel fraction (P) as 90% or 85% (the SCC part) according to the table below?
Is the parallel fraction a fixed portion of the DFTB+ code, or is it related to the model size?
BTW, I am not sure if I am replying to the thread correctly. I couldn't find instructions on how to reply.
--------------------------------------------------------------------------------
DFTB+ running times cpu [s] wall clock [s]
--------------------------------------------------------------------------------
Global initialisation + 0.01 ( 0.0%) 0.31 ( 0.0%)
Pre-SCC initialisation + 5622.79 ( 2.9%) 2803.80 ( 4.3%)
Sparse H0 and S build 1288.53 ( 0.7%) 326.20 ( 0.5%)
SCC + ******** ( 90.0%) 55631.11 ( 85.9%)
Diagonalisation ******** ( 72.4%) 47192.66 ( 72.9%)
Sparse to dense 2520.52 ( 1.3%) 1062.28 ( 1.6%)
Dense to sparse 2065.27 ( 1.1%) 517.98 ( 0.8%)
Density matrix creation 6352.21 ( 3.3%) 1594.20 ( 2.5%)
Post-SCC processing + 13619.18 ( 7.1%) 6343.42 ( 9.8%)
Energy-density matrix creation 464.84 ( 0.2%) 116.65 ( 0.2%)
Force calculation 9081.63 ( 4.7%) 3250.13 ( 5.0%)
Stress calculation 4516.20 ( 2.4%) 3086.64 ( 4.8%)
--------------------------------------------------------------------------------
Missing + 0.03 ( 0.0%) 0.01 ( 0.0%)
Total = ******** (100.0%) 64778.64 (100.0%)
Thanks a lot,
Chi-Ta Yang
Tue Oct 30 09:31:28 CET 2018
Hello Chi-Ta,
For a system this small, 20 cores is probably above the point where you
gain from parallelism in the eigensolver. Have you tested lower
numbers of cores (4, 8)?
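A minimal sketch of such a scan, assuming the DFTB+ executable is on the PATH as `dftb+`, reads its input (`dftb_in.hsd`) from the working directory, and takes its thread count from the standard OpenMP variable `OMP_NUM_THREADS`. The function name `time_thread_counts` is hypothetical, and the wall-clock time is measured from outside the program rather than read from the DFTB+ timing table.

```python
import os
import subprocess
import time


def time_thread_counts(thread_counts, command=("dftb+",), workdir="."):
    """Run `command` once per OpenMP thread count; return {threads: wall seconds}.

    Assumes the program reads its input from `workdir` and honours the
    standard OMP_NUM_THREADS variable. Each run overwrites the previous
    run's output files in `workdir`.
    """
    timings = {}
    for n in thread_counts:
        # Copy the current environment and override only the thread count.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        start = time.perf_counter()
        # check=True raises CalledProcessError if the run fails.
        subprocess.run(list(command), cwd=workdir, env=env, check=True)
        timings[n] = time.perf_counter() - start
    return timings
```

For example, `time_thread_counts([1, 4, 8, 20], workdir="scaling_test")` (the directory name is hypothetical) would run the same input four times; the 1-thread run gives a serial reference for computing speed-ups.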
The ideal parallel scaling should look like
https://urldefense.proofpoint.com/v2/url?u=https-3A__dftbplus-2Drecipes.readthedocs.io_en_master_parallel_amdahl.html-23amdahl-2Ds-2Dlaw&d=DwIGaQ&c=nE__W8dFE-shTxStwXtp0A&r=k9TfHS8rPor0FvO5_DcdbA&m=bY8qd363xlJr34vtHIZ1bOy1kxKRsFfcfrcT7YENFBM&s=xIEcw7OtjUFzLVSnQ8Ua4Rmu-iftoekp1vDpktK_uE4&e=
but this ignores various effects, such as how efficiently sub-problems
fit into the different levels of the memory hierarchy. This may be part of why
you are seeing the 30-core anomaly.
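Amdahl's law, as in the linked recipe, gives the ideal speed-up on N cores when a fraction P of the serial run time can be parallelised: S(N) = 1 / ((1 - P) + P/N). A short Python sketch; the P values below are illustrative, not measured for this system.

```python
def amdahl_speedup(p, n):
    """Ideal speed-up on n cores when a fraction p of the serial time parallelises."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction p must lie in [0, 1]")
    if n < 1:
        raise ValueError("need at least one core")
    # Serial part (1 - p) is unchanged; parallel part p is divided over n cores.
    return 1.0 / ((1.0 - p) + p / n)


for p in (0.90, 0.95, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):5.2f}x" for n in (4, 8, 20, 40))
    print(f"P = {p:.2f} -> {row}")
```

With P = 0.9, for instance, going from 20 to 40 cores raises the ideal speed-up only from about 6.9 to about 8.2, and no number of cores can exceed 1/(1 - P) = 10.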
The asterisks in the output are a known problem with the output format breaking:
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dftbplus_dftbplus_issues_182&d=DwIGaQ&c=nE__W8dFE-shTxStwXtp0A&r=k9TfHS8rPor0FvO5_DcdbA&m=bY8qd363xlJr34vtHIZ1bOy1kxKRsFfcfrcT7YENFBM&s=Kd8qy4cUD1gxDek5eP7iXcg0ZTTKz7jFrZJqrbYqAGs&e=
Regards
Ben
On 30/10/18 03:55, Yang, Chi-Ta wrote:
>
> Greetings,
>
>
> I am using dftbplus-18.2 and testing how the calculation time
> scales with the number of cores for the system below.
>
>
> Test system details:
>
> - 809 atoms and periodic
> - Gamma point k-point sampling
>
> I ran jobs with 20, 30, and 40 cores, but the elapsed times
> are comparable:
> 20-core job: 13:36:16 hours
> 30-core job: 15:14:54 hours
> 40-core job: 12:12:32 hours
>
> The outputs show the expected number of OpenMP threads, but the 40-core
> job didn't gain much compared with the 20-core job.
>
> Could you please help me understand why there is no scaling?
>
>
> BTW, the running times show ***** as below. Is there a way to show
> the full digits?
> --------------------------------------------------------------------------------
> DFTB+ running times cpu [s] wall clock [s]
> --------------------------------------------------------------------------------
> Sparse H0 and S build 16557.11 ( 1.6%) 423.90 ( -0.2%)
> SCC + ******** ( 87.8%) ******** (104.3%)
> Diagonalisation ******** ( 63.4%) ******** (108.9%)
>
>
> Thanks a lot,
> Chi-Ta Yang
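The elapsed times quoted above can be turned into speed-ups relative to the 20-core run. A small sketch, assuming the times are in the HH:MM:SS format as printed:

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' elapsed-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


elapsed = {20: "13:36:16", 30: "15:14:54", 40: "12:12:32"}  # from the message above
reference = 20
t_ref = to_seconds(elapsed[reference])
for cores, hms in elapsed.items():
    t = to_seconds(hms)
    speedup = t_ref / t
    # Efficiency relative to the 20-core run: 1.0 would mean perfect scaling.
    efficiency = speedup * reference / cores
    print(f"{cores} cores: {t:6d} s, speed-up vs {reference} cores = {speedup:.2f}, "
          f"relative efficiency = {efficiency:.2f}")
```

This gives a speed-up of about 1.11 for 40 cores and 0.89 for 30 cores, i.e. a relative efficiency of roughly 0.56 when doubling from 20 to 40 cores.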
Printing timing information fails for long runs · Issue #182 · dftbplus/dftbplus
If a run takes longer than 28 hours (100000 seconds), stars appear in the output instead of timing values.
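The asterisks are what Fortran prints when a number does not fit its fixed-width edit descriptor. Judging from the eight asterisks and values such as 55631.11 in the tables above, the timing columns appear to use an 8-character field with two decimals (an F8.2-like format); that width is an assumption, and the Python function below only mimics the behaviour, it is not DFTB+ code.

```python
def fortran_fixed(value, width=8, decimals=2):
    """Mimic a Fortran Fw.d edit descriptor: if the formatted number does not
    fit in `width` characters, the whole field is filled with asterisks."""
    text = f"{value:{width}.{decimals}f}"
    return "*" * width if len(text) > width else text


# Sample values in seconds; anything >= 100000 no longer fits in 8 characters.
for seconds in (423.90, 55631.11, 99999.99, 100000.00, 187654.32):
    print(f"{seconds:>12} -> [{fortran_fixed(seconds)}]")
```

The cpu column overflows first because, as the tables show, it exceeds the wall-clock time in multithreaded runs.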
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://urldefense.proofpoint.com/v2/url?u=http-3A__mailman.zfn.uni-2Dbremen.de_pipermail_dftb-2Dplus-2Duser_attachments_20181101_2a1924a5_attachment-2D0001.html&d=DwIGaQ&c=nE__W8dFE-shTxStwXtp0A&r=k9TfHS8rPor0FvO5_DcdbA&m=bY8qd363xlJr34vtHIZ1bOy1kxKRsFfcfrcT7YENFBM&s=gOvXzla6cqnoF5E2SxPmzGUti8o8FiSMbwL93oUtiXc&e=>
------------------------------
Message: 2
Date: Thu, 1 Nov 2018 16:29:54 +0000
From: Ben Hourahine <benjamin.hourahine at strath.ac.uk>
To: dftb-plus-user at mailman.zfn.uni-bremen.de
Subject: Re: [DFTB-Plus-User] OpenMP scaling questions
Message-ID: <08e3c1eb-a66e-126e-45fb-3f4675c936a1 at strath.ac.uk>
Content-Type: text/plain; charset="windows-1252"
Hello Chi-Ta,
You successfully replied to the thread.
The percentages are fractions of the total execution time, not the
parallel fraction. The parallel fraction can be measured either by
comparing the serial time against the time on a large number of processes
(as the time is then dominated by the serial part), or by fitting the
speed-up for a few different numbers of processors.
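Both approaches can be sketched in Python. Writing Amdahl's law as T(N) = T(1)(1 - P) + T(1)P/N shows that the wall time is linear in 1/N, so an ordinary least-squares fit of T against 1/N gives an intercept a = (1 - P)T(1) and a slope b = P T(1), hence P = b/(a + b). The function names are hypothetical and the timings in the usage note are synthetic.

```python
def fit_parallel_fraction(timings):
    """Least-squares fit of Amdahl's law T(N) = a + b/N to measured wall times.

    timings: mapping {number of cores: wall-clock seconds}, at least 2 entries.
    Returns (P, T1) with P = b / (a + b) and T1 = a + b the implied serial time.
    """
    if len(timings) < 2:
        raise ValueError("need timings for at least two core counts")
    xs = [1.0 / n for n in timings]   # regress against 1/N
    ys = list(timings.values())       # same order as the keys above
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx            # time that shrinks as 1/N (parallel part)
    a = y_mean - b * x_mean  # time that does not shrink (serial part)
    return b / (a + b), a + b


def parallel_fraction_from_limit(t_serial, t_many):
    """Crude estimate from the first method: for many cores T(N) ~ (1 - P) T(1)."""
    return 1.0 - t_many / t_serial
```

For synthetic timings generated with T(1) = 1000 s and P = 0.9, `fit_parallel_fraction({1: 1000.0, 2: 550.0, 4: 325.0, 8: 212.5})` recovers P = 0.9 and T(1) = 1000 s up to rounding. With real measurements, effects such as the memory-hierarchy behaviour mentioned earlier can make the fit poor, so it is worth checking how well a + b/N reproduces each measured time.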
The parallel fraction is problem dependent, generally improving for
larger systems.
Regards
Ben
--
Dr. B. Hourahine, SUPA, Department of Physics,
University of Strathclyde, John Anderson Building,
107 Rottenrow, Glasgow G4 0NG, UK.
+44 141 548 2325, benjamin.hourahine at strath.ac.uk
------------------------------
End of DFTB-Plus-User Digest, Vol 51, Issue 2
*********************************************