• Bumblebee Nvidia GPU is slower than Intel IGPU?


    Is that even possible, given that the Nvidia card is supposed to be the better one? I’m comparing Mesa on the Intel GPU against the Nvidia binary driver on the Nvidia card.

    [[email protected] ~]$ glmark2
    =======================================================
        glmark2 2017.07
    =======================================================
        OpenGL Information
        GL_VENDOR:     Intel Open Source Technology Center
        GL_RENDERER:   Mesa DRI Intel(R) Sandybridge Mobile 
        GL_VERSION:    3.0 Mesa 17.1.6
    =======================================================
    [build] use-vbo=false: FPS: 1859 FrameTime: 0.538 ms
    [build] use-vbo=true: FPS: 1939 FrameTime: 0.516 ms
    [texture] texture-filter=nearest: FPS: 1757 FrameTime: 0.569 ms
    [texture] texture-filter=linear: FPS: 1755 FrameTime: 0.570 ms
    [texture] texture-filter=mipmap: FPS: 1806 FrameTime: 0.554 ms
    [shading] shading=gouraud: FPS: 1669 FrameTime: 0.599 ms
    [shading] shading=blinn-phong-inf: FPS: 1663 FrameTime: 0.601 ms
    [shading] shading=phong: FPS: 1518 FrameTime: 0.659 ms
    [shading] shading=cel: FPS: 1450 FrameTime: 0.690 ms
    [bump] bump-render=high-poly: FPS: 973 FrameTime: 1.028 ms
    [bump] bump-render=normals: FPS: 1897 FrameTime: 0.527 ms
    [bump] bump-render=height: FPS: 1772 FrameTime: 0.564 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 939 FrameTime: 1.065 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 422 FrameTime: 2.370 ms
    [pulsar] light=false:quads=5:texture=false: FPS: 1639 FrameTime: 0.610 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 448 FrameTime: 2.232 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [desktop] effect=shadow:windows=4: FPS: 819 FrameTime: 1.221 ms
    [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 774 FrameTime: 1.292 ms
    [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 858 FrameTime: 1.166 ms
    [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 809 FrameTime: 1.236 ms
    [ideas] speed=duration: FPS: 1080 FrameTime: 0.926 ms
    [jellyfish] <default>: FPS: 976 FrameTime: 1.025 ms
    [terrain] <default>: FPS: 113 FrameTime: 8.850 ms
    [shadow] <default>: FPS: 722 FrameTime: 1.385 ms
    [refract] <default>: FPS: 258 FrameTime: 3.876 ms
    [conditionals] fragment-steps=0:vertex-steps=0: FPS: 1650 FrameTime: 0.606 ms
    [conditionals] fragment-steps=5:vertex-steps=0: FPS: 1613 FrameTime: 0.620 ms
    [conditionals] fragment-steps=0:vertex-steps=5: FPS: 1648 FrameTime: 0.607 ms
    [function] fragment-complexity=low:fragment-steps=5: FPS: 1644 FrameTime: 0.608 ms
    [function] fragment-complexity=medium:fragment-steps=5: FPS: 1630 FrameTime: 0.613 ms
    [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 1650 FrameTime: 0.606 ms
    [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 1644 FrameTime: 0.608 ms
    [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 1611 FrameTime: 0.621 ms
    =======================================================
                                      glmark2 Score: 1303 
    =======================================================
    [[email protected] ~]$ optirun glmark2
    =======================================================
        glmark2 2017.07
    =======================================================
        OpenGL Information
        GL_VENDOR:     NVIDIA Corporation
        GL_RENDERER:   NVS 4200M/PCIe/SSE2
        GL_VERSION:    4.5.0 NVIDIA 384.59
    =======================================================
    [build] use-vbo=false: FPS: 413 FrameTime: 2.421 ms
    [build] use-vbo=true: FPS: 465 FrameTime: 2.151 ms
    [texture] texture-filter=nearest: FPS: 456 FrameTime: 2.193 ms
    [texture] texture-filter=linear: FPS: 454 FrameTime: 2.203 ms
    [texture] texture-filter=mipmap: FPS: 459 FrameTime: 2.179 ms
    [shading] shading=gouraud: FPS: 435 FrameTime: 2.299 ms
    [shading] shading=blinn-phong-inf: FPS: 437 FrameTime: 2.288 ms
    [shading] shading=phong: FPS: 422 FrameTime: 2.370 ms
    [shading] shading=cel: FPS: 422 FrameTime: 2.370 ms
    [bump] bump-render=high-poly: FPS: 353 FrameTime: 2.833 ms
    [bump] bump-render=normals: FPS: 467 FrameTime: 2.141 ms
    [bump] bump-render=height: FPS: 463 FrameTime: 2.160 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 412 FrameTime: 2.427 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 299 FrameTime: 3.344 ms
    [pulsar] light=false:quads=5:texture=false: FPS: 451 FrameTime: 2.217 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 230 FrameTime: 4.348 ms
    libpng warning: iCCP: known incorrect sRGB profile
    [desktop] effect=shadow:windows=4: FPS: 265 FrameTime: 3.774 ms
    [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 298 FrameTime: 3.356 ms
    [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 296 FrameTime: 3.378 ms
    [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 304 FrameTime: 3.289 ms
    [ideas] speed=duration: FPS: 327 FrameTime: 3.058 ms
    [jellyfish] <default>: FPS: 341 FrameTime: 2.933 ms
    [terrain] <default>: FPS: 75 FrameTime: 13.333 ms
    [shadow] <default>: FPS: 318 FrameTime: 3.145 ms
    [refract] <default>: FPS: 140 FrameTime: 7.143 ms
    [conditionals] fragment-steps=0:vertex-steps=0: FPS: 450 FrameTime: 2.222 ms
    [conditionals] fragment-steps=5:vertex-steps=0: FPS: 407 FrameTime: 2.457 ms
    [conditionals] fragment-steps=0:vertex-steps=5: FPS: 453 FrameTime: 2.208 ms
    [function] fragment-complexity=low:fragment-steps=5: FPS: 449 FrameTime: 2.227 ms
    [function] fragment-complexity=medium:fragment-steps=5: FPS: 425 FrameTime: 2.353 ms
    [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 451 FrameTime: 2.217 ms
    [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 450 FrameTime: 2.222 ms
    [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 423 FrameTime: 2.364 ms
    =======================================================
                                      glmark2 Score: 379 
    =======================================================
    

    Am I doing something wrong perhaps?

  • @EarthMind
    An Nvidia NVS 4200M is a bottom-of-the-range, entry-level card. Interesting that it actually seems to be even slower/less powerful than your Intel chip.
    J.

  • This makes no sense to me, but I’m no expert at all.

    Maybe @just can help us here (I hope he doesn’t mind being summoned ;)).

    Cheers!

    This phenomenon has been known since 2009, when Bumblebee was born. I’m just wary of giving the full, correct explanation for it, because it could provoke new, endless questions about how Bumblebee and Nvidia work. I’m trying to condense a complete explanation into a few brief examples.

    It’s not the answer yet - that will follow later - but let’s start. This is only a kind of introduction.

    1. Measuring Bmb performance with glmark2 is like benchmarking it with glxgears, Lotus 1-2-3 or Pac-Man. glxspheres64 is the only benchmarking tool approved by the Bumblebee Project.

    2. When I repeat the OP’s tests exactly as they are shown above, I get similar scores on an Nvidia GT 555M card - 974 points on Intel, 323 points on Nvidia. It looks as if Nvidia performs 3 times worse than Intel.

    3. But when I run the glmark2 test on Nvidia the right way, Nvidia’s score is quite different: 3132 points. Which is correct - a mid-range Nvidia card performs about 3 times better than Intel.

    4. On the same computer, the glxspheres64 test returns 60 frames/sec running on Intel and 202 frames/sec running on Nvidia. Again, Nvidia is roughly 3 times faster than Intel. Both tests are run with the scene’s default window size; resizing the window strongly influences the final framerates (a quick way to see this is sketched right after this list).
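
    If your glmark2 build accepts the --size option (most do; check glmark2 --help), the window-size effect is easy to reproduce: a larger window means more pixels both to render and to copy back, so the copy cost under optirun grows with the window area.

    optirun glmark2 --size 1920x1080    # bigger window -> more data to copy back per frame -> lower score under Bmb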

    I’ll try to put together a brief explanation during the lunch break, which is coming up.

  • Bumblebee benchmarks: Nvidia is faster than Intel

    A simplistic explanation of a complex thing.

    Warning: long text ahead

    Part 1. The theory.

    How does Optimus work? It works in exactly the same way on both Linux and Windows. When a program is launched with the optirun prefix, the following happens (a quick way to check which GPU is actually rendering is sketched right after the list):

    • the scene (the images) the program generates is directed to the Nvidia GPU
    • this doesn’t depend on the complexity of the images - everything is sent to Nvidia, whether we run nano or Serious Sam 3
    • Nvidia renders the scene
    • the rendered scene is copied back from the Nvidia GPU to the Intel GPU
    • Intel sends the scene to the display

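    A quick way to confirm which GPU actually handles the rendering (this assumes glxinfo is installed - the mesa-utils or mesa-demos package, depending on the distro):

    glxinfo | grep "OpenGL renderer"            # plain run: the Mesa/Intel renderer string
    optirun glxinfo | grep "OpenGL renderer"    # through Bumblebee: the Nvidia renderer string
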
    Like everything in the Universe, Bumblebee is not perfect. It shines at saving power and keeping a laptop cool. Its bottleneck is the speed at which the rendered scene is copied back from Nvidia to Intel; the copying is not implemented in the most efficient way.

    Nvidia shines like a crazy diamond regardless of the scene’s complexity, and is a few times faster than Intel. It is not its fault that Bumblebee cannot transfer the images it generates back to Intel at an adequate rate.

    The glmark2 scenes are rather simple. With glmark2 we face a case where it is faster overall to render the scene on the weaker Intel GPU than to render it on the faster Nvidia GPU and then copy it back to Intel - remember Bmb’s bottleneck: Bmb can’t do the copying efficiently.

    The glxspheres64 scene is many times more complex than glmark2’s. The deciding factor there is the GPU’s actual ability to render the scene, not the speed at which it is copied back.

    The rule of thumb: Intel is faster than Bumblebee for simple graphics; Bumblebee is faster than Intel for heavy and extra-heavy graphics. The heavier the graphics, the more efficient Bmb becomes. Do not confuse Bumblebee with Nvidia here - Nvidia itself is always faster than Intel, for any kind of graphics.
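
    A back-of-envelope estimate of that copy cost (a rough sketch only: it assumes glmark2’s default 800x600 window, uncompressed 32-bit colour, and ignores any compression the bridge may apply):

    echo $(( 800 * 600 * 4 ))          # ~1.9 MB that must travel back to Intel for every frame
    echo $(( 800 * 600 * 4 * 465 ))    # ~0.9 GB/s of readback at the ~465 FPS the optirun build scene reaches above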

    From theory to practice now.

  • Bumblebee benchmarks: Nvidia is faster than Intel

    A simplistic explanation of a complex thing.

    Warning: long text ahead

    Part 2. The practice. Glmark2.

    All tests are done on an Nvidia GT 555M Optimus video card.

    glmark2 on Intel:

    ┌──[just]@[alexrep]:~$
    └─> glmark2
    ===...
    	glmark2 2017.07
    ===...
    	OpenGL Information
    	GL_VENDOR:     Intel Open Source Technology Center
    	GL_RENDERER:   Mesa DRI Intel(R) Sandybridge Mobile
    	GL_VERSION:    3.0 Mesa 17.1.6
    ===...
    ...
    ===...
    	glmark2 Score: 974
    ===...
    

    glmark2 on Nvidia, with the scene copied back to Intel so we can see it:

    ┌──[just]@[alexrep]:~$
    └─> optirun glmark2
    ===...
    	glmark2 2017.07
    ===...
    	OpenGL Information
    	GL_VENDOR:     NVIDIA Corporation
    	GL_RENDERER:   GeForce GT 555M/PCIe/SSE2
    	GL_VERSION:    4.5.0 NVIDIA 384.59
    ===...
    ...
    ===...
    	glmark2 Score: 323
    ===...
    

    What? 323 on Nvidia against 974 on Intel? Is Nvidia 3 times slower than Intel? No! It’s Bmb that is 3 times slower than Intel - because it copies the rendered images too slowly. Nvidia itself is fast.

    Can we exclude the step of copying the rendered images back to Intel and obtain the true result - the speed at which the scene is actually rendered on Nvidia? Yes, we can: disable the bridge (-b none) and point the program at the X display Bumblebee starts for the Nvidia card (:8 by default). Of course, we won’t be able to observe the rendering on screen - Intel won’t receive the images - but we’ll get the true Nvidia rendering score:

    ┌──[just]@[alexrep]:~$
    └─> optirun -b none env DISPLAY=:8 glmark2
    ===...
    	glmark2 2017.07
    ===...
    	OpenGL Information
    	GL_VENDOR:     NVIDIA Corporation
    	GL_RENDERER:   GeForce GT 555M/PCIe/SSE2
    	GL_VERSION:    4.5.0 NVIDIA 384.59
    ===...
    ...
    ===...
    	glmark2 Score: 3132
    ===...
    

    Nvidia gets 3132 points, against only 974 for Intel. Did I already tell you that Nvidia is 3 times faster than Intel? :)
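
    A side note, not part of the tests above: the size of the copy penalty also depends on which bridge does the readback. Besides the default VirtualGL bridge, Bumblebee can use the primus bridge, which usually copies frames back more efficiently - worth comparing if the primus package is installed:

    optirun -b primus glmark2    # explicitly select the primus bridge
    primusrun glmark2            # the wrapper provided by the primus package itself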

    From glmark2 to glxspheres64 now.

  • Bumblebee benchmarks: Nvidia is faster than Intel

    A simplistic explanation of a complex thing.

    Warning: long text ahead

    Part 3. The practice. Glxspheres64.

    glxspheres64 offers a reasonably complex graphics scene. It’s heavy enough that what matters is Nvidia’s rendering speed, while Bmb’s copying speed can be neglected.

    glxspheres64 on Intel:

    ┌──[just]@[alexrep]:~$
    └─> glxspheres64
    Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
    Visual ID of window: 0xa8
    Context is Direct
    OpenGL Renderer: Mesa DRI Intel(R) Sandybridge Mobile
    61.319614 frames/sec - 68.432690 Mpixels/sec
    59.787537 frames/sec - 66.722892 Mpixels/sec
    59.765153 frames/sec - 66.697911 Mpixels/sec
    59.751729 frames/sec - 66.682930 Mpixels/sec
    59.885955 frames/sec - 66.832726 Mpixels/sec
    ^C
    

    On Intel we get the standard 60 frames/sec.
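
    Worth noting: that flat 60 frames/sec on Intel is the display’s vertical sync, which Mesa honours by default. If you want the Intel GPU’s unthrottled rate, most Mesa drivers let you disable vsync for a single run via the vblank_mode environment variable:

    vblank_mode=0 glxspheres64    # Mesa only: skip waiting for vblank in this run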

    glxspheres64 on Nvidia, with the images copied back to Intel by Bmb:

    ┌──[just]@[alexrep]:~$
    └─> optirun glxspheres64
    Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
    Visual ID of window: 0x21
    Context is Direct
    OpenGL Renderer: GeForce GT 555M/PCIe/SSE2
    171.474796 frames/sec - 191.365872 Mpixels/sec
    173.994172 frames/sec - 194.177496 Mpixels/sec
    174.020400 frames/sec - 194.206766 Mpixels/sec
    169.354501 frames/sec - 188.999623 Mpixels/sec
    172.351104 frames/sec - 192.343832 Mpixels/sec
    174.101078 frames/sec - 194.296804 Mpixels/sec
    ^C[ 3879.843723] [WARN]Received Interrupt signal.
    

    170 frames/sec on Nvidia is roughly 3 times faster than 60 frames/sec on Intel. Not bad. By the way, why all this race for stratospheric frame rates? The human eye hardly distinguishes frame rates above 60, which is why 60 Hz was chosen as the standard for TVs and displays. But that’s a topic for another thread.

    Does copying the rendered images back from Nvidia to Intel reduce the results for glxspheres64, as it does for glmark2? Sure it does. It reduces the glxspheres64 framerates exactly as it reduces the glmark2 scores.

    Can we skip copying the rendered images back for glxspheres64, as we did for glmark2, to get the true framerates on Nvidia? Sure we can. Again, we won’t be able to observe the rendered images on screen, as they won’t be copied back to Intel, but we’ll get the real frame rates on pure Nvidia:

    ┌──[just]@[alexrep]:~$
    └─> optirun -b none env DISPLAY=:8 glxspheres64
    Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
    Visual ID of window: 0x27
    Context is Direct
    OpenGL Renderer: GeForce GT 555M/PCIe/SSE2
    2034.673040 frames/sec - 2270.695113 Mpixels/sec
    2033.021833 frames/sec - 2268.852366 Mpixels/sec
    2031.565042 frames/sec - 2267.226587 Mpixels/sec
    2030.404151 frames/sec - 2265.931032 Mpixels/sec
    2031.252044 frames/sec - 2266.877282 Mpixels/sec
    2028.682951 frames/sec - 2264.010173 Mpixels/sec
    ^C[  164.133057] [WARN]Received Interrupt signal.
    

    That’s the truth. Intel renders the images at 60 frames/sec, while Nvidia renders them at 2030 frames/sec - not 3 but 30 (thirty) times faster than Intel. It’s a pity that Bumblebee cuts this excellent result down by a factor of 12.

    Thanks for reading. I won’t post a single word in the next 3 months.

  • Thanks a million!

    I’ve really learned a lot from your posts… and I knew you would solve the mystery.

    Thanks again ;)

  • @just That’s a very interesting read! Thanks for sharing your knowledge :-)

    So, now we can start the 60 FPS limit debate! It’s bogus!!!

  • @karasu said in Bumblebee Nvidia GPU is slower than Intel IGPU?:

    I’ve really learned a lot from your posts…

    The highest possible reward. I’m very thankful.
