Choosing the right professional GPU isn’t easy. For our most comprehensive group test of graphics cards yet, Jason Lewis puts 14 of Nvidia and AMD’s current contenders through a battery of real-world tests.
Once again, it’s time for our annual shoot-out of professional GPUs. Back in 2010, my previous round-up featured AMD’s still-current line-up of ATI FirePro cards, as well as a few cards from Nvidia’s last-generation FX products. This time around, I will be looking at Nvidia’s current crop of Quadro cards, based on its Fermi architecture, as well as two new contenders from AMD.
Both product line-ups are aimed at the high-end CAD and DCC markets. They are targeted at graphics professionals who rely not only on speed, but also on rock-solid stability and support.
Professional vs consumer cards
So what is the difference between a consumer-grade graphics accelerator and its professional counterpart? From a hardware standpoint, the answer is “not much” – certainly not as much as in the late 1990s and early 2000s when companies such as 3Dlabs, Intergraph and ELSA were building hardware specifically aimed at professional users.
These days, most pro cards share hardware with their consumer counterparts, although the chips are usually hand-picked from the highest-quality parts of a production run. Also, they carry a lot more RAM than their consumer counterparts – which is actually very important, as I will discuss later on in this article.
However, the biggest differences between professional and consumer cards are their driver set and software support. While consumer hardware is tuned more towards fill rate and shader calculations, pro cards are tuned for 3D operations such as geometry transformations and vertex matrices, as well as better performance under GPGPU APIs such as CUDA, OpenCL, and DirectCompute.
Pro cards are also extensively optimized, tested and certified for use with CAD and DCC applications – which, in the case of the cards on test, include 3ds Max, Maya, Softimage, AutoCAD and SolidWorks. This not only increases performance, but offers excellent stability and predictability when compared to their desktop counterparts, particularly when running CAD packages. When I polled other users, the general consensus was that while these applications will work on consumer graphics accelerators, performance with non-professional cards is sub-par, and viewport glitches and anomalies are quite common. These issues are much less frequent with pro cards, and when they are identified, are usually addressed rather quickly, since manufacturers offer much more extensive customer support for their professional products.
This year’s line-up
This review will include benchmarking for the cards in the 2010 review (AMD’s ATI FirePro V3800, V4800, V5800, V7800, V8800; and the previous-generation Nvidia Quadro FX 3800 and FX 5800). In addition, I have included Nvidia’s current lineup of mid-to-high-end cards, the Quadro 2000, 4000, 5000, and 6000, as well as one new mid-range and one new high-end card from AMD: the FirePro V5900 and V7900.
All of the present-generation cards in this test support the most current workstation graphics and compute APIs, including Shader Model 5, OpenGL 4, DirectX 11 and OpenCL 1. In addition, the Nvidia cards support the company’s proprietary CUDA API. The previous-generation cards – the FX 3800 and FX 5800 – support Shader Model 4, OpenGL 3 and DirectX 10.
So let’s take a look at the individual cards on test. The specs for the cards from the 2010 review can be seen there, so here, I will only be detailing the new cards.
There are only two entry-level cards on test, the ATI FirePro V3800 and V4800: both of them legacies of the 2010 review. Nvidia does have new entry-level products: the Quadro 400 and Quadro 600. However, as these are marketed as entry-level CAD solutions rather than for DCC or animation, I have not included them here.
Nvidia
The first new card in this review is the Quadro 2000. It comes with 1GB of GDDR5 memory on a 128-bit memory interface offering a memory bandwidth of 41.6GB/s. The GPU is clocked at 625MHz and has 192 CUDA cores for GPU computing. Outputs include 1 dual-link DVI connector and 2 DisplayPort connectors. It is rated at 62W for power consumption.
AMD
Our first new card from AMD is the FirePro V5900. It has double the memory of its predecessor, the V5800, going from 1GB of GDDR5 memory to 2GB on a 256-bit interface for a memory bandwidth of 64GB/s. The GPU is clocked at 600MHz and has 512 Stream processors for GPU computing. Outputs include 1 dual-link DVI connector and 2 DisplayPort connectors. It is rated at 75W for power consumption.
Other legacy mid-range cards included in this review are the ATI FirePro V5800 and the Nvidia Quadro FX 3800.
Nvidia
We have two high-end cards from Nvidia: the Quadro 4000 and Quadro 5000. The Quadro 4000 comes with 2GB of GDDR5 memory on a 256-bit memory interface offering a memory bandwidth of 89.6GB/s. The GPU is clocked at 475MHz and has 256 CUDA cores for GPU computing. Outputs include 1 dual-link DVI connector and 2 DisplayPort connectors. It is rated at 142W for power consumption.
The Quadro 5000 comes with 2.5GB of GDDR5 memory on a 320-bit memory interface offering a memory bandwidth of 120GB/s. The GPU is clocked at 513MHz and has 352 CUDA cores for GPU computing. Outputs include 1 dual-link DVI connector and 2 DisplayPort connectors. It is rated at 152W for power consumption.
AMD
Our second new AMD card is the FirePro V7900. It sports 2GB of GDDR5 memory on a 256-bit interface for a memory bandwidth of 160GB/s. The GPU is clocked at 725MHz and has 1,280 Stream processors for GPU computing. Outputs include 4 DisplayPort connectors. It is rated at 150W for power consumption.
Legacy cards included in this category are the ATI FirePro V7800 and V8800, and the Nvidia Quadro FX 5800.
Nvidia
At the ultra-high end of Nvidia’s range, we have the Quadro 6000. It comes with 6GB of GDDR5 memory on a 384-bit memory interface offering a memory bandwidth of 144GB/s. The GPU is clocked at 574MHz and has 448 CUDA cores for GPU computing. Outputs include 1 dual-link DVI connector and 2 DisplayPort connectors. It is rated at 204W for power consumption.
AMD
AMD does offer a product in this category: the FirePro V9800. However, I don’t have one available to benchmark, so it isn’t included here.
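To make the bus width and bandwidth figures quoted above easier to compare, here is a minimal Python sketch that works backwards from them to the data rate each memory pin must sustain. It assumes the usual relationship (bandwidth in GB/s = bus width in bits × per-pin rate in Gbit/s ÷ 8); the function name and the table layout are illustrative only, and the input numbers are simply those listed for the new cards in this review.

```python
# Bus width (bits) and memory bandwidth (GB/s) as quoted in this review.
CARDS = {
    "Quadro 2000": (128, 41.6),
    "FirePro V5900": (256, 64.0),
    "Quadro 4000": (256, 89.6),
    "Quadro 5000": (320, 120.0),
    "FirePro V7900": (256, 160.0),
    "Quadro 6000": (384, 144.0),
}


def per_pin_rate_gbps(bus_width_bits, bandwidth_gb_per_s):
    """Implied data rate per memory pin, in Gbit/s.

    Assumes bandwidth (GB/s) = bus width (bits) * per-pin rate (Gbit/s) / 8.
    """
    return bandwidth_gb_per_s * 8 / bus_width_bits


for name, (width, bandwidth) in CARDS.items():
    rate = per_pin_rate_gbps(width, bandwidth)
    print(f"{name}: {width}-bit bus, {bandwidth} GB/s, "
          f"about {rate:.1f} Gbit/s per pin")
```

Read this way, the V7900 reaches its higher bandwidth figure through faster memory rather than a wider bus: it shares a 256-bit interface with the V5900 and the Quadro 4000.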
About the technology
Before we go further, let’s take a look at what some of those specifications really mean. DisplayPort is quickly becoming the output format of choice for professional graphics cards – for a number of good reasons, including physically smaller connectors, better signal integrity and the ability to go beyond the current resolutions for future super-high resolution displays. However, not all of us have monitors with DisplayPort connectors.
This is where DisplayPort to DVI adapters are required. These work just fine, but you may need more of them than are provided with the cards as standard – and while DisplayPort to single-link DVI adapters are relatively inexpensive, the real problem comes for those of you with 30″ monitors that only have dual-link DVI inputs.
In order for a DisplayPort output to get a maximum-resolution 2,560 x 1,600 signal to a DVI-equipped monitor, an ‘active’ DisplayPort to dual-link DVI adapter must be used. These are powered units that draw power from the host computer via a USB connection, so not only are they expensive compared to the single-link adapters (around $100 for the active dual-link DVI converter, as opposed to $25 and $10 for active and passive single-link DVI adapters respectively), but they also require a free USB port for each adapter used.
Despite these inconveniences, the industry’s embrace of the DisplayPort interface is permitting technological advances such as 30-bit colour, and AMD’s Eyefinity technology, which enables a single graphics card to drive three to six monitors.
I discussed Eyefinity briefly in the 2010 review, but I want to go over it again here as I feel it is an important feature of the FirePro products, and one that Nvidia should consider emulating. The V4800, V5800, V5900 and V7800 can drive up to three 30″ monitors at 2,560 x 1,600 resolution via one dual-link DVI connector and two DisplayPort connectors. The V7900 and V8800 can drive four 30″ monitors at 2,560 x 1,600 with four DisplayPort connectors; and the V9800 can drive up to six 30″ monitors at 2,560 x 1,600 through six Mini DisplayPort connectors for a desktop resolution of 7,680 x 3,200 if arranged in the traditional 3 x 2 setup.
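The desktop resolutions quoted for these multi-monitor layouts are simple multiples of the panel resolution. As a quick sketch, assuming identical panels arranged in a regular grid with no bezel compensation (the function name is made up for illustration):

```python
def desktop_resolution(columns, rows, panel=(2560, 1600)):
    """Combined desktop size for a grid of identical panels.

    Assumes every display runs at the same native resolution and that
    the driver adds no bezel compensation.
    """
    panel_width, panel_height = panel
    return columns * panel_width, rows * panel_height


# Six 30-inch panels in a 3 x 2 grid, as described for the V9800.
print(desktop_resolution(3, 2))  # (7680, 3200)

# Two 30-inch panels side by side.
print(desktop_resolution(2, 1))  # (5120, 1600)
```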
While you can have multi-monitor set-ups with Nvidia cards, you need to have multiple cards installed in the workstation to do so as the Quadro cards only support two displays per card. This multi-card solution adds to the system’s power consumption, and greatly increases the system cost.
So do DCC users really need this many monitors? I have been using three- and four-monitor configurations for a while now, and I can tell you that once you try it, you won’t want to go back to just two. Let’s say, for example, that you have a pair of 30″ displays, a 22″ display and a Wacom Cintiq. You plug them all into your FirePro card and… hey, check it out: you’ve got Max or Maya open on one of the 30-inchers, Photoshop on the other, ZBrush or Mudbox running on the Cintiq, and your reference art or a web browser open on the 22-inch display! No more [Alt]-tabbing, and no more stacking windows so that only one or two are visible at the same time!
(Again, you could have this setup with Nvidia hardware, but you would have to pay for an additional card, and have a free PCI Express x16 slot, plus extra graphics card power cables from your power supply if you opt for the higher-end cards.)
If money is no object, you could even install four V9800s in a single system and drive 24 30″ displays – and yes, Windows 7 will support 24 displays! Having the ability to run multiple displays and be able to see everything simultaneously is a very enjoyable and productive way to work, and it is the perfect complement to today’s multi-core workstations.
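To put numbers on the card counts involved, the short sketch below divides the number of displays you want by the per-card limits quoted above and rounds up. The dictionary keys and the function name are illustrative, and the calculation ignores practical limits such as free slots and power connectors.

```python
import math

# Maximum displays per card, as quoted in this review.
DISPLAYS_PER_CARD = {
    "Quadro": 2,
    "FirePro V4800/V5800/V5900/V7800": 3,
    "FirePro V7900/V8800": 4,
    "FirePro V9800": 6,
}


def cards_needed(displays, displays_per_card):
    """Smallest number of cards able to drive the requested displays."""
    return math.ceil(displays / displays_per_card)


# The four-display desk described above.
for family, limit in DISPLAYS_PER_CARD.items():
    print(f"{family}: {cards_needed(4, limit)} card(s) for 4 displays")

# The 24-display extreme: four V9800s.
print(cards_needed(24, DISPLAYS_PER_CARD["FirePro V9800"]))
```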
Before I get into the benchmark results, I also want to talk a little about GPGPU computing. This is the process by which the graphics card’s GPU is used to augment the system’s CPUs to perform general computing tasks. The potential of this technology is exciting, and we are just starting to see applications that make use of it.
This is where the professional graphics cards set themselves apart from their consumer counterparts. Remember that I said earlier that pro cards have more on-board RAM? Well, the more RAM on the card, the more intensive the computations that can be performed. Unless the tasks the GPU is trying to perform fit entirely within the on-board memory, data must be swapped between the RAM on the GPU and that of the workstation itself, making computation much slower.
For example, in the current crop of GPU-accelerated raytracers, the entire 3D scene must fit within the memory of the graphics card in order for the card to be used to help with the rendering process. If the scene is too big, the GPU will just ignore the render, and only the system’s CPUs will be used, resulting in much longer render times. This is where the 2-6GB memory banks of the pro cards come in extremely handy.
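The all-or-nothing behaviour described here can be pictured with a few lines of Python. This is a hypothetical illustration, not the logic of any particular renderer: the function name is invented, the 3 GiB scene size is made up, and real software also has to reserve card memory for frame buffers and the driver, which this sketch ignores.

```python
GIB = 1024 ** 3


def choose_render_device(scene_bytes, gpu_memory_bytes):
    """Pick the GPU only when the whole scene fits in on-board memory.

    Hypothetical rule of thumb: no partial offloading, and no allowance
    for memory the driver or frame buffers would also need.
    """
    return "gpu" if scene_bytes <= gpu_memory_bytes else "cpu"


scene = 3 * GIB  # made-up scene footprint, for illustration only

print(choose_render_device(scene, 2 * GIB))  # a 2GB card: falls back to "cpu"
print(choose_render_device(scene, 6 * GIB))  # a 6GB card: stays on "gpu"
```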
There are two major APIs for GPU computing: CUDA and OpenCL. While OpenCL is an open standard, and supported by both Nvidia and AMD, Nvidia has something of a head start with CUDA, its proprietary technology.
Nvidia was one of the first hardware manufacturers to really push for GPU computing and has fully embraced the technology since 2006, when it introduced the G80 – its first unified shader architecture – on the 8800 GTX. This early lead has resulted in a wider range of software applications that support CUDA as opposed to OpenCL. However, I would expect this to change as OpenCL matures, as developers would much rather develop for one industry-standard API than multiple proprietary APIs.
When it comes to rendering, there are several applications that can use the GPU to speed things up, pretty evenly divided between CUDA and OpenCL. To name but a few, mental images’ iray, Random Control’s Arion and Refractive Software’s Octane Render are all CUDA-only renderers; while Glare Technologies’ Indigo Renderer, Chaos Group’s V-Ray RT and Art And Animation Studio’s FurryBall renderer all use OpenCL. GPU acceleration is also coming to the open-source LuxRender and cebas’s finalRender 4 in the near future, and we believe both will be OpenCL applications as well.
Currently available to download for free is StudioGPU’s MachStudio Pro, which ditches both CUDA and OpenCL in favor of Microsoft’s DirectX API. It is quite an impressive piece of software, and the fact that it is free makes it truly outstanding. Editor’s note: since this review was written, StudioGPU has closed. However, you can still download MachStudio Pro on CNET.
Despite this range of options, the CG industry is currently divided about the usability of GPU acceleration. Some studios have embraced the technology; others feel that GPU renderers cannot live up to their CPU-based counterparts in terms of output quality. A common compromise is to use GPU-based rendering solely for pre-viz or special cases, reserving the final passes for the CPU. A case in point is Sony Pictures Imageworks, which uses its in-house GPU-based renderer, Splat, for effects; and the CPU-based Arnold for beauty passes.
Let’s get to the meat of the testing. For my test system, I am still using the trusty Z800 workstation, generously provided by HP. It sports a pair of six-core 32nm Xeon X5680 CPUs running at 3.33GHz. Running Windows 7 64-bit, and packed with 18GB of DDR3 memory and a 15,000 RPM Seagate SAS drive, its horsepower really enabled me to push the cards during testing.
Monitor testing was performed with a pair of 30″ displays, each at their native resolution of 2,560 x 1,600, for a total desktop resolution of 5,120 x 1,600.
For software, I was using Autodesk 3ds Max 2012, Maya 2012, Softimage 2012 and Mudbox 2012; NewTek’s LightWave 10; a multi-application benchmark that also included Luxology’s modo 501 and Maxon’s Cinema 4D; Maxon’s Cinebench synthetic benchmark; The Foundry’s Mari; mental images’ iray; Art And Animation Studio’s FurryBall renderer; and StudioGPU’s Mach Studio Pro.
Unless otherwise stated, the benchmark scores below represent the frame rates achieved when running through a range of standard navigation and manipulation operations on the scenes shown (geometry counts and texture details are shown on the images), and are an average of the lowest and highest figures observed during testing. Sadly, all of these test scenes used proprietary assets and are not available for download.
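For clarity, the scoring rule described above (the midpoint of the lowest and highest frame rates seen) can be written out as follows. The sample readings are invented purely for illustration and are not results from this review.

```python
def benchmark_score(frame_rates):
    """Average of the lowest and highest frame rates observed."""
    return (min(frame_rates) + max(frame_rates)) / 2


# Invented readings from one hypothetical test run, in frames per second.
samples = [18.0, 24.5, 31.0, 22.0]

print(benchmark_score(samples))     # 24.5
print(sum(samples) / len(samples))  # 23.875, the ordinary mean, for comparison
```

Because only the two extremes count, a single unusually slow or fast moment moves the score more than it would move an ordinary mean.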
Maya is one of Autodesk’s three major 3D modeling, animation and rendering packages. Its display technology is built upon the OpenGL API and is heavily optimized for it.
Overall, the FirePro cards take a slight lead over the Quadro cards here, with the surprise being the two new FirePro cards – the V5900 and V7900 – which outpace both the other FirePros and most of the Nvidia cards, bar the Quadro 6000. (I was told during my product briefing on the V5900 and V7900 that their drivers have a feature called ‘geometry boost’: a set of special optimizations for Maya that enable the cards to push polygons around faster. It seems to work.)
Averaging the scores from the three tests, the cards place in the following order, from fastest to slowest:
Quadro FX 5800
Quadro FX 3800
3ds Max 2012
3ds Max is another of Autodesk’s big three modeling, animation and rendering applications. It is the only application in this test that gives the user a choice of three display modes: the Nitrous viewport (a custom Autodesk-tuned version of DirectX), DirectX 9 and 10, and OpenGL. (If Nvidia and AMD port their custom 3ds Max performance drivers over to Max 2012, there will even be a fourth option.) For this review, all testing was done with the new Nitrous viewport.
Here, the Nvidia Quadro cards command a significant lead over the FirePro cards in terms of performance, with the exception of the previous-generation Quadro FX cards. Within the Quadro Fermi Family, performance lines up pretty much in order of price, with the 6000 taking first place, the 5000 second, 4000 third, and the 2000 fourth; followed by the FirePro V7900 and V5900; then the rest of the FirePro line-up in descending order of price, down to the V4800; the Quadro FX series cards; and lastly, the entry-level FirePro V3800.
The last of Autodesk’s three major 3D applications is Softimage. Like Maya, it is built upon OpenGL. However, compared to other 3D applications, its viewports run very fast without any kind of degradation.
With Softimage, we have another win for the FirePro cards, with the V7900 taking the lead, and the rest of the cards in this order:
Quadro FX 5800
Quadro FX 3800
Like Maya and Softimage, Mudbox, Autodesk’s sculpting package, uses OpenGL as its display technology of choice. However, unlike traditional DCC apps, Mudbox works with very high poly counts (an ‘average’ scene is 8 million polygons or higher, whereas 2-5 million would count as complex for a typical app). As a result, it likes a lot of RAM on the video card, so the professional cards really set themselves apart here.
Some of you may ask why I have not included any ZBrush benchmarks here. The answer is simple: Mudbox’s viewport performance is dependent on the graphics card. ZBrush, on the other hand, uses Pixologic’s proprietary CPU-based technology to render the viewports, so the installed graphics card really has very little impact on performance.
For the Mudbox tests, each card was evaluated in two different ways. The first is a measure of overall viewport performance while panning, rotating or zooming the model, while the second measures the software’s response while sculpting.
The benchmark results were taken when the entire model was displayed in the viewport, as Mudbox does some clever geometry culling when you zoom in to the model, disregarding the off-screen polygons to improve viewport frame rate.
Again we have a change of dominance, as the Nvidia Quadro 6000 to 4000 cards take the lead with the FirePro V7900 and V5900 following; then the FirePro V8800, V7800 and Quadro FX 5800 packed closely together; the Quadro 2000 and FX 3800; and the FirePro V5800 to V3800 at the bottom of the list.
(Presumably, the multi-layered 33.6 million polygon model was too much for the 512MB and 1GB cards to handle, as stepping down a subdivision level to roughly 8.5 million polygons – or two levels to roughly 2 million polygons in the case of the FirePro V3800 – greatly increases performance.)
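The polygon counts quoted for the lower subdivision levels follow from simple arithmetic, assuming each subdivision step splits every quad into four, as is typical for sculpting meshes. The helper name below is illustrative.

```python
def polygons_at_level(top_level_polygons, levels_down):
    """Polygon count after stepping down a number of subdivision levels.

    Assumes each subdivision step quadruples the polygon count, so each
    step down divides it by four.
    """
    return top_level_polygons / 4 ** levels_down


full_model = 33.6e6  # the test model at its highest level

for levels in range(3):
    millions = polygons_at_level(full_model, levels) / 1e6
    print(f"{levels} level(s) down: about {millions:.1f} million polygons")
```

One level down gives about 8.4 million polygons and two levels about 2.1 million, in line with the rounded figures above.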
LightWave is one of the oldest 3D applications still on the market today, and is used widely in broadcast visual effects. Once again, OpenGL is its API of choice.
And again, we ping-pong back to the FirePro cards as the overall winners here. The order is as follows:
FirePro V8800, V7800 (tied)
Quadro 5000, FirePro V5800 (tied)
Quadro 4000, FX 5800, FirePro V4800 (tied)
Quadro FX 3800
Mari is a new 3D texture-painting program similar to Maxon’s BodyPaint 3D, originally developed in house at Weta and now marketed by The Foundry. It is designed to do 3D texture painting on models with high poly counts, and work with extremely high-resolution textures. (It can work with maps of up to 32,768 x 32,768 on models with several million polygons.)
This is where the Quadro 6000 and 5000 come in handy: while the minimum recommended spec is 1GB of on-board RAM, you will quickly fill up a 1GB or 1.5GB card’s memory once you start using the extremely high texture resolutions that Mari offers.
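A back-of-envelope calculation shows why maps of this size are so demanding. The sketch below assumes an uncompressed texture with four 8-bit channels (RGBA) held entirely in memory; that is an assumption made for illustration, and says nothing about how Mari itself stores or streams its texture data.

```python
GIB = 1024 ** 3


def texture_bytes(width, height, channels=4, bytes_per_channel=1):
    """Raw size of an uncompressed texture, ignoring mipmaps and tiling."""
    return width * height * channels * bytes_per_channel


for size in (4096, 8192, 32768):
    gib = texture_bytes(size, size) / GIB
    print(f"{size} x {size} RGBA, 8 bits per channel: {gib:.4g} GiB")
```

Under those assumptions, a single 32,768 x 32,768 map comes to 4 GiB, which is more than the total memory on every card in this test except the Quadro 6000.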
Right now, Mari only runs on Nvidia hardware, although I have been told that The Foundry are working on this.
Here the Quadro 6000 takes the lead, followed closely by the Quadro 5000, then the FX 5800, the 4000, and lastly, the FX 3800. To my surprise, the Quadro 2000 is not officially supported, and even though Mari will still let you run the application with the Quadro 2000 installed, it immediately crashed when I tried to open a scene file.
For reference, I have also conducted viewport benchmarks for a single model across multiple applications and multiple display modes. The same OBJ file was imported into a fresh scene within each 3D application to determine performance differences between graphics cards, and also between various applications.
The model is 7.6 million polygons, and is displayed in the following modes:
3ds Max: Nitrous viewport, realistic shading mode
Maya: Viewport 2.0, shaded and textured with ambient occlusion enabled
Softimage: shaded and textured mode
modo: Advanced OpenGL, shaded mode
Cinema 4D: Enhanced OpenGL, shaded mode
LightWave: shaded and textured mode.
Please note: this benchmark is not intended as a performance comparison of the software applications themselves – only of the hardware on test. This article discusses the benchmark in more detail.
Anyone who has read any of my previous reviews will know that I am not a big fan of synthetic benchmarks. This is not the fault of the engineers who write them: it’s just that there are too many variables for them to be able to predict accurately how any piece of hardware will perform in all types of production environments.
However, I have had some requests to include Cinebench scores in my reviews – so congratulations, Cinebench: you have the honor of being the only synthetic benchmark included in these tests.
Interestingly, the Quadro 5000 takes the crown with Cinebench. It is curious that it beats out the 6000 despite having both less memory and a slower core clock speed. The placing is as follows:
Quadro FX 5800
Quadro FX 3800
Next let’s take a look at a handful of GPU computing benchmarks. (There will be more of these to come in future video card reviews, as more GPU-accelerated renderers become available.)
Iray is a progressive unbiased renderer developed by mental images, and included in 3ds Max 2012. It supports CUDA only, and therefore only uses the GPU on Nvidia cards: on AMD cards, it defaults to CPU mode.
You can see here the massive increase in performance the Quadro cards offer over CPU-only rendering. Adding a Quadro 6000 cuts the render time down to a third of that for the CPU alone; the 4000 does it in about half the time; and the 5000 falls roughly in the middle. Given that the test system sports 12 3.33GHz CPU cores, this is quite an impressive result.
Art And Animation Studio’s FurryBall renderer is an OpenCL-based renderer that calculates near-final-quality images in the Maya viewport. Although I have only just started to look at this piece of software, my initial impressions are quite positive and I will be looking to go into more depth with it for future reviews. The benchmarking was done with the developer’s free sample scene.
As with other GPU-accelerated renderers, the amount of memory available to the GPU plays a major role in performance, as illustrated by the commanding lead the 6GB Quadro 6000 takes over the 4GB Quadro FX 5800 in second place. Third, we have the FirePro V8800; fourth is the FirePro V7800; and the Quadro 5000 and FirePro V7900 are tied for fifth. In sixth place we have the Quadro 4000; and in seventh is the Quadro 2000. In eighth, ninth and tenth places, we have the FirePro V5900, V5800 and Quadro FX 3800; with the FirePro V4800 in eleventh. (The software crashed when running on the FirePro V3800, so it gets no score here.)
MachStudio Pro is a unique piece of software from a relative newcomer to the CG industry, StudioGPU. It is a standalone scene-assembly application that was one of the first on the market to leverage the GPU to perform beauty-pass rendering.
However, unlike the few GPU-accelerated raytracers out there, MachStudio Pro does not leverage OpenCL, DirectCompute or CUDA, so it is not doing GPGPU-compute tasks in the traditional sense. Instead it uses technology based on advanced DirectX-based pixel shaders to achieve its final render output.
Here we have the Quadro 6000, FirePro V8800, V7900 and V7800 all tied for first place. Next are the Quadro 5000, 4000 and FX 5800, all tied for second. In third place we have another tie between the Quadro 2000, FirePro V5900 and V5800. Lastly, there is – you guessed it – another tie between the Quadro FX 3800, FirePro V4800 and V3800.
The overall verdict
The relative performance of the cards varies greatly from test to test, making it impossible to declare any of them a clear winner – or even to declare a winner between Nvidia and AMD.
However, the closest thing we have to a winner would be the Quadro 6000. It is a monster of a card, equipped with more memory than most computers shipped with just a couple of years ago. As a result, it was especially dominant in the GPU rendering tests, and it will be interesting to see how it flexes its muscles when more such benchmarking is done in the future. However, with this massive performance comes a massive price tag, and many will find its $4,000 price point a prohibitive factor.
Overall, Nvidia’s current Quadro line-up performs very well. Performance is significantly higher than that of its previous Quadro FX products, and the Nvidia cards are the clear winners when you look at 3ds Max, Mudbox, modo and Cinema 4D. While the Quadro 6000 is the fastest card in Nvidia’s line-up, I would personally recommend the Quadro 5000 if you are focused more on GPU computing tasks; or the Quadro 4000 for more cost-effective application performance.
However, while I am impressed with the Quadro line-up’s performance, I would like to see them support more than two monitors in future.
As for AMD’s ATI FirePro line-up, things look good here as well. They take a slight lead with Maya, Softimage and LightWave; and while the Quadro 6000 may take the overall performance crown, I would personally recommend the FirePro V7900 as the overall best value card in the professional graphics sector if you are not going to leverage GPU computing heavily. It competes with the Quadro 4000 in terms of price, but in many cases beats out the Quadro 4000 and 5000, and in a few tests, it is nipping at the heels of the massive Quadro 6000. It also beats out its near twin, the V7800, by a pretty significant margin, and beats out its big brother, the V8800, as well.
Overall, things are looking quite good in the professional graphics sector, with both Nvidia’s Quadros and AMD’s FirePros making a strong showing here. Neither is clearly dominant, so your buying decision will boil down to two factors: which applications you will be using, and cost.
The Nvidia cards take the performance crown in more applications than the FirePros do, but are more expensive than AMD’s cards. The FirePros are cheaper, but outperform the Quadros in fewer applications and only support OpenCL, limiting the range of GPU-accelerated render engines with which they can be used.
One final note: if Nvidia and AMD follow their previous release patterns, we will probably be seeing new pro cards later this year. If so, look for an updated review soon after.
I’d like to thank several vendors and individuals for their contributions to this article.
Update: Thanks for all your feedback on this review. Rather than answer your questions in brief, we thought we’d give them the space they deserve. Click this link to read Jason’s new FAQ about our hardware reviews policy.