LNCS 7851 - High Performance CPU Kernels for Multiphase Compressibl...
... , 2013. © Springer-Verlag Berlin Heidelberg 2013 High Performance CPU Kernels for Multiphase ... the conditional branches with conditional moves in the HLLE fluxes. Fig. 1 (panels: AoS, AoSSoA, SoA; axes x, y, z) ...
Slide 1
... @spcl_eth 26 C to DaCe: SpMV 1. AST Transformations 2. Translation from C to SDFG 3. Dataflow coarsening j = row_ptr[i] i ...
Slide 1
... Production Reality 1. F1 2. F3 3. F2 spcl.inf.ethz.ch @spcl_eth 11 How to mechanize the expert? → Survey! C o ... @spcl_eth 17 I = {0/2, 1/2, 2/2, 3/2, 4/2, 5/2, 6/2} J = {0, 1, 2} n = 5 ✔✔ Sweep3D ✖ MILC ✔ HOMME ...
Slide 1
... Space (plot: axes i and j, N = 4; constraints 0 ≤ i, i ≤ N, 0 ≤ j, j ≤ i) D = { (i, j) | 0 ≤ i ≤ N ∧ 0 ≤ j ≤ i ... & Threads Iteration Space BID = { (i, j) → (i/4 % 2, j/3 % 2) } (plot: axes i and j) TID = { (i, j) → (i % 4, j % 3) ...
Slide 1
... ] += A[i+k*N] * B[k+j*N]; ... 7/21 ... execution time? very hard! for(int i=0; i<N; ++i) for(int j=0; j<N; ++j) for(int k=0; k<N; ++k) C[i+j*N ...
Slide 1
... ; j<N; ++j) for(int k=0; k<N; ++k) C[i+j*N] += A[i+k*N] * B[k+j*N]; ... six steps: 1) Identify input parameters that influence runtime 2) Identify application kernels 3 ...
Slide 1
... spcl.inf.ethz.ch @spcl_eth 4 Scientific Performance Engineering 1) Observe 2) Model 3) Understand 4) Build ... n ∈ ℕ, i_k ∈ I, j_k ∈ J, I, J ⊂ ℚ; n = 1, I = {0, 1, 2}, J = {0, 1}; c1, c1 × p, c1 × p^2, c1 × log(p), c1 × p ...
Slide 1
... we get? spcl.inf.ethz.ch @spcl_eth Tool: Polyhedral Modeling Iteration Space (plot: axes i and j) ... BID = { (i, j) → (i/4 % 2, j/3 % 2) } (plot: axes i and j) TID = { (i, j) → (i % 4, j % 3) } spcl.inf.ethz.ch ...
Slide 1
... Production Reality 1. F1 2. F3 3. F2 spcl.inf.ethz.ch @spcl_eth 11 How to mechanize the expert? → Survey! C o ... 1/2, 2/2, 3/2, 4/2, 5/2, 6/2} J = {0, 1, 2} n = 5 ✔✔ Sweep3D ✖ MILC ✔ HOMME ✔ XNS ...
Slide 1
... = {0/2, 1/2, 2/2, 3/2, 4/2, 5/2, 6/2} J = {0, 1, 2} n = 5 ✔✔ Sweep3D ✖ MILC ✔ HOMME ✔ XNS ... () } Instrumentation Performance measurements (profiles) Input Output 1. foo 2. compute 3. main 4. bar […] Ranking: 1 ...