I ran quite a few renders in Blender v2.93 and quickly learned the best performance options for my X58 & RTX 3080 combo. I split some of the benchmarks into “GPU Only” & “GPU + CPU” results.

Blender supports GPU rendering on both AMD and Nvidia hardware. Nvidia offers two backends: CUDA and, of course, OptiX. OptiX is Nvidia’s ray-tracing API and is designed around the RTX GPUs, whose dedicated RT Cores can be utilized during GPU rendering. CUDA is still available and useful, but we will see how much better OptiX performs by comparison. CUDA is also required in order to use OptiX, since OptiX is built on top of the CUDA stack: CUDA feeds the data and OptiX computes it on the RT hardware.

Besides GPU rendering there is also CPU-only (homogeneous) rendering, which I am obviously going to avoid with such an old platform and CPU architecture; using only the CPU for rendering isn’t recommended here. However, we also have Heterogeneous Rendering, which allows both the CPU & GPU to be utilized, with Blender splitting the workload between the two. Newer CPU architectures and platforms benefit from this type of rendering much more than the X58-based CPUs, but I will show various settings with Heterogeneous Rendering as well.
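For anyone who wants to flip between these backends without digging through the preferences UI, the same switch can be made from Blender’s Python console. This is a minimal sketch assuming the Blender 2.93 `bpy` API (it runs inside Blender’s embedded interpreter, not standalone Python):

```python
# Sketch: selecting the Cycles compute backend in Blender 2.93's Python console.
import bpy

prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = "OPTIX"  # or "CUDA"; OptiX rides on top of CUDA
prefs.get_devices()                  # refresh the detected device list

# Tick the devices to use; enabling a CPU entry alongside the GPU
# gives the heterogeneous "GPU + CPU" mode benchmarked below.
for device in prefs.devices:
    device.use = True

bpy.context.scene.cycles.device = "GPU"
```

Note that in 2.93 the tile size lives under `bpy.context.scene.render.tile_x` / `tile_y`, which is how the 64x64 / 128x128 / 256x256 runs below were configured.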
Blender - Xbox Controller
While I was deciding which Blender images/renders I would use, I came across this cool Xbox Controller that was created, animated and blended within Blender. It was very cool and appeared decently demanding. The X58 + RTX 3080 had no issues in Blender’s Viewport, which topped out at a constant 144 FPS to match my monitor’s refresh rate (1080p is what I was using at the time for the highest refresh rates during my benchmarks). Early in my benchmark tests I used a mixture of tile settings: 64x64, 128x128 and 256x256. I quickly learned that 256x256 was my best setting for rendering. The tile size determines the dimensions of each chunk rendered for any particular image. Blender internally breaks the image down into pieces and renders each “piece” individually; just think of puzzle pieces being placed. Smaller tile sizes (32x32 or 64x64) are better for CPUs and larger tiles (256x256) are better for GPUs. So when using Heterogeneous Rendering (CPU & GPU) it’s best to find a nice balance between the two. Since the RTX 3080 is a beast at rendering, I learned quickly that it has no issues with 256x256.
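To make the puzzle-piece analogy concrete, here is a quick back-of-the-envelope calculation of how many tiles a 1080p frame splits into at each of the three tile sizes used in these benchmarks (the `tile_count` helper is just an illustration, not a Blender API):

```python
import math

def tile_count(width: int, height: int, tile: int) -> int:
    """Number of square tiles needed to cover a frame of the given size."""
    return math.ceil(width / tile) * math.ceil(height / tile)

# A 1920x1080 frame at the three benchmarked tile sizes:
for tile in (64, 128, 256):
    print(f"{tile}x{tile}: {tile_count(1920, 1080, tile)} tiles")
```

At 64x64 the frame becomes 510 tiles, at 128x128 it is 135, and at 256x256 only 40, which is why the larger tiles keep a fast GPU busy with less per-tile scheduling overhead.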
Xbox Controller – RTX 3080 GPU Rendering
Looking at the results we can quickly see that OptiX is faster than CUDA. We also see that using CUDA with 128x128 tiles is slightly faster than 256x256, by 3.66%: CUDA 128x shows roughly 58 seconds while CUDA 256x comes in at 1 minute. Very minor, but CUDA 128x wins here. The 64x64 tiles show the worst result, coming in at 1 minute, 3 seconds and 85 milliseconds. Nvidia’s OptiX, on the other hand, shows improvements over CUDA once the RT Cores are utilized. In this case 256x256 was the best option and 64x64 was the worst. OptiX was roughly 5 to 7 seconds faster than CUDA in this specific Blender Xbox Controller render. The results seem minor, but it adds up when you have to render several images for an animation over time. There are many ways to speed up rendering, but during my benchmark testing I used the default settings and only changed the tile settings, to give a fair and balanced view of the results. The fact that I don’t use Blender nearly as much nowadays also kept me away from modifying a lot of the settings. With OptiX, 128x128 & 256x256 both rendered in roughly the same time, coming in at roughly 53 seconds.
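For readers who want to reproduce the percentage comparisons with their own runs, the math is just time saved over the slower run. A small sketch (the `time_saved_pct` helper is my own name; feeding it the rounded 60 s / 58 s figures gives roughly 3.3%, while the 3.66% quoted above comes from the exact millisecond timings):

```python
def time_saved_pct(slow_s: float, fast_s: float) -> float:
    """Percent of render time saved by the faster configuration."""
    return (slow_s - fast_s) / slow_s * 100

# Rounded CUDA times from this render: 256x256 ~60 s, 128x128 ~58 s.
print(round(time_saved_pct(60.0, 58.0), 2))
```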
Xbox Controller – RTX 3080 GPU & CPU Rendering
Now we will take a look at Heterogeneous Rendering, which allows both the CPU & GPU to be utilized. My X5660 was clocked at 4.6 GHz and the RAM at 1600 MHz (DDR3 RDIMM). Not the most impressive specs, but I can’t complain. I ran the benchmarks using various options, “CUDA + CPU” and “OptiX + CPU”, with different tile settings (64x64, 128x128 & 256x256). It was clear that OptiX was the better choice just by looking at the 64x64 results: “CUDA + CPU” took nearly 1 minute and 12 seconds while “OptiX + CPU” took only 1 minute and 5 seconds. Sadly I didn’t run more “OptiX + CPU” benchmarks using larger tiles in this specific benchmark. Comparing the “GPU + CPU” results to the “GPU Only” results shows that there are situations where adding the CPU can be beneficial, such as “CUDA 128x128” vs “CUDA + CPU 256x256”; the “CUDA + CPU” run was roughly 7 seconds faster than simply using the GPU (CUDA) by itself. Using only the GPU with OptiX performed the best out of all of the tests, coming in at 53 seconds.
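Pulling the explicitly quoted times together makes the ranking easy to see. A small sketch using the rounded figures above (the exact runs included milliseconds, so treat these as approximate):

```python
# Rounded render times (seconds) quoted in the heterogeneous comparison above.
results = {
    "CUDA + CPU, 64x64":  72,  # ~1 min 12 s
    "OptiX + CPU, 64x64": 65,  # ~1 min 5 s
    "OptiX GPU only":     53,  # fastest overall
}

slowest = max(results.values())
for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
    saved = (slowest - secs) / slowest * 100
    print(f"{name}: {secs} s ({saved:.1f}% faster than the slowest run)")
```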