Slower rendering with OpenMP and Python?

Slower rendering with OpenMP and Python?

efahl

Seeing Felix’s thread on attempts to speed up his contour filtering, I wanted to try this myself.  My setup: Windows 10, VTK 8.1.2, MSVC 2017, and Python 3.7, with the MS MPI tools and SDK v 10.0 installed.

I added these to my configuration script:

        -D VTK_SMP_IMPLEMENTATION_TYPE:STRING=OpenMP                 \
        -D Module_vtkFiltersParallelFlowPaths:BOOL=ON                \
        -D Module_vtkFiltersParallelGeometry:BOOL=ON                 \
        -D Module_vtkFiltersParallelStatistics:BOOL=ON               \
        -D Module_vtkFiltersParallelVerdict:BOOL=ON                  \
        -D Module_vtkParallelMPI4Py:BOOL=ON                          \
        -D Module_vtkRenderingParallel:BOOL=ON                       \
        -D Module_vtkRenderingParallelLIC:BOOL=ON                    \

 

The test is an animation sequence of the lower spine in a flexion-extension event.  The scene has a dozen bones, most set to 50% transparency and texture mapped, plus various synthetic geometry such as cylinders and toroids.  It is pretty simple stuff (no images of any sort, so no pixels or voxels).  The animation does reconfigure some rubber-banding objects (spinal discs and some cables from the test rig), but mostly the geometry remains as-is, and rendering at the new transforms is the overwhelming time sink.

The SMP=Sequential baseline runs at 35 FPS on an Intel i7-7700K with HD 630 integrated graphics.  With all the above parallelism enabled, it runs at about half that, best case maybe 18 FPS.  Notably, CPU usage is 12% with Sequential (typical for one busy thread on an 8-thread CPU), but pegs at 80-90% with OpenMP enabled.

When we test on a machine with a discrete GPU, a GTX 1050 Ti, the SMP=Sequential baseline is about 41 FPS, but with OpenMP it’s 18 FPS here, too.  CPU usage is the same as above: one thread maxed out with Sequential and “use it all” with OpenMP.
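For anyone wanting to reproduce the comparison, frame rates like the ones above can be measured with a small timing loop. This is a minimal sketch, not the script used for the numbers in this post: `measure_fps` and its parameters are made-up names, and the stand-in workload would be replaced by whatever call renders one animation frame (for example, a render window's `Render()` after the transforms are updated).

```python
import time


def measure_fps(render_frame, n_frames=200, warmup=20):
    """Return the average frames per second of a zero-argument render callable.

    Hypothetical helper for illustration; not part of VTK.
    """
    # Warm-up frames are excluded so one-time setup cost does not skew the average.
    for _ in range(warmup):
        render_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    # Guard against a zero interval when the workload is trivially fast.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_frames / elapsed


if __name__ == "__main__":
    # Stand-in workload: sleeps about 1 ms per "frame" instead of rendering.
    fps = measure_fps(lambda: time.sleep(0.001), n_frames=50, warmup=5)
    print(f"{fps:.1f} FPS")
```

Running the same loop against the Sequential and OpenMP builds keeps the comparison limited to the SMP backend, provided the scene and window size are identical.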

 

Is this expected for this mix of mostly static geometry?  Or am I setting the wrong configuration variables?  Or am I just doing something dumb?

 

Thanks,

Eric


_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the VTK FAQ at: http://www.vtk.org/Wiki/VTK_FAQ

Search the list archives at: http://markmail.org/search/?q=vtkusers

Follow this link to subscribe/unsubscribe:
https://vtk.org/mailman/listinfo/vtkusers
Re: Slower rendering with OpenMP and Python?

Andras Lasso

We experienced terrible performance on desktop systems with strong Nvidia GPUs when we switched to VTK’s OpenGL2 rendering backend. Apparently, Nvidia’s threaded optimization offloaded some work to the CPU, and that interfered with VTK’s multithreading. Maybe the OpenMP backend has the same issue. Switching the SMP backend to TBB solved the issue for us. See more details in this pull request: https://github.com/Slicer/Slicer/pull/930

Andras
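One way to test whether the slowdown follows the OpenMP thread count is to rerun the same benchmark with the thread count capped. The sketch below assumes the OpenMP runtime in this build honors the standard `OMP_NUM_THREADS` environment variable. The variable has to be set before the runtime starts, which is why the benchmark runs in a child process. `run_with_omp_threads` is a hypothetical helper, and the child command is a stand-in for the real animation script.

```python
import os
import subprocess
import sys


def run_with_omp_threads(n_threads, argv):
    """Run a command with OMP_NUM_THREADS set and return its stdout as text.

    Hypothetical helper for illustration; not part of VTK.
    """
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(n_threads)
    result = subprocess.run(
        argv,
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in child that only echoes the variable; replace it with the
    # command that runs the animation benchmark and prints its frame rate.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    for n in (1, 2, 4, 8):
        print(n, run_with_omp_threads(n, child).strip())
```

If the frame rate recovers with one thread, contention between threads is a plausible culprit. That is an inference to check, not something confirmed in this thread.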

 

From: vtkusers <[hidden email]> On Behalf Of Fahlgren, Eric
Sent: Tuesday, January 15, 2019 6:53 PM
To: [hidden email]
Subject: [vtkusers] Slower rendering with OpenMP and Python?

 

Re: Slower rendering with OpenMP and Python?

efahl

Thanks, Andras, I’ll try out TBB again.  I had tried to use it in preference to OpenMP initially, but had trouble building it and bailed out.

 

From: Andras Lasso <[hidden email]>
Sent: Tuesday, January 15, 2019 18:25
To: Fahlgren, Eric <[hidden email]>; [hidden email]
Subject: RE: Slower rendering with OpenMP and Python?

 
