Monday, December 6th, 2010 Article by Jason Lewis

Review: AMD’s ATI FirePro professional GPUs

The video card or Graphics Processing Unit (GPU) is one of the most complex pieces of hardware in modern PCs. It has a central logic processor, its own memory banks and various I/O controllers, making it more like a miniature computer within a computer. Some would even argue that it is a more powerful platform for 3D calculations than the CPU itself – but that’s a debate for another time. Today, we are here to look at AMD’s current lineup of professional 3D graphics accelerators, the ATI FirePro series.

The world of 3D GPUs has, since its inception, consisted of two main categories: the consumer cards (also known as the desktop sector, or gaming cards) and the professional products. While there used to be several players in the latter category, including 3Dlabs, Matrox and ELSA, the industry is now dominated by two companies: AMD (following its acquisition of ATI) and Nvidia.

The ATI FirePro series is part of AMD’s professional line, aimed at the high-end CAD and DCC markets. It is targeted at graphics professionals who rely not only on speed, but also on rock-solid stability and support.

Professional vs consumer cards

I’m often asked what the difference between desktop graphics accelerators and their professional counterparts is. From a hardware standpoint, the answer is “not much” – certainly not as much as in the late 1990s and early 2000s when 3Dlabs and ELSA were building hardware specifically aimed at professional users.

These days, most pro cards share hardware with their consumer counterparts, although the chips are usually hand-picked from the highest-quality parts of a production run. Also, they carry a lot more RAM than their consumer counterparts – which is actually very important, as I will discuss later on in this article.

However, the biggest difference between professional and consumer cards is their driver set and software support. While consumer hardware is tuned more towards fill rate and shader calculations, pro cards are tuned for 3D operations such as geometry transformations and vertex matrices, as well as better performance under GPGPU APIs such as OpenCL and DirectCompute. Pro cards are also extensively optimized, tested and certified for use with CAD and DCC applications. In addition, the manufacturers offer much more extensive customer support for their professional products than the equivalent consumer cards.

The driver set for the ATI FirePro cards includes extensive optimizations for popular DCC and CAD applications, including 3ds Max, Maya, Softimage, AutoCAD and SolidWorks. These not only increase performance, but also make the pro cards more stable and predictable than their desktop counterparts, particularly when running CAD packages. When I polled other users, the general consensus was that while these applications will work on consumer graphics accelerators, performance with those non-professional cards is sub-par, and viewport glitches and anomalies are quite common: issues that simply do not exist with pro cards.

In addition, AMD has recently released custom display drivers for Autodesk’s AutoCAD and 3ds Max 2010 and 2011 – something I will discuss further in the benchmarking section of this article.

The cards on test

This April, AMD refreshed its ATI FirePro lineup with its newest series of GPUs, supporting all the newest 3D and GPGPU APIs: DirectX 11, OpenGL 4 and OpenCL 1. The five new cards we will be looking at here are the entry-level ATI FirePro V3800 and V4800, the mid-range V5800, and the high-end V7800 and V8800.

Entry-level: V3800 and V4800

V3800

The ATI FirePro V3800 is AMD’s entry point into its professional product line-up, and replaces 2008’s V3750. The biggest advantage the V3800 has over the V3750 is an extra 256MB of RAM, bringing the total to 512MB of DDR3 RAM.

AMD’s specs for this card include: 400 Stream processors for GPGPU computing, 14.4 GB/s memory bandwidth, less than 50W power consumption, and lastly, the ability to drive two 30″ displays with one dual-link DVI connector and one DisplayPort connector.

V4800

The V4800 is a step up from the V3800, but is still considered an entry-level card. It uses a slightly more advanced derivative of the Redwood GPU found on the V3800, with the same 400 Stream processors, so the computing power of the two cards is very similar.

What sets the V4800 apart from the V3800 is its full 1GB of RAM, compared to the V3800’s 512MB. The V4800 also uses GDDR5 memory instead of the slower DDR3 used on the V3800. This gives the V4800 a tremendous advantage in memory bandwidth over the V3800.

Again, here are AMD’s specs for the V4800: 400 Stream processors for GPGPU computing, 57.6 GB/s memory bandwidth, less than 75W power consumption, and lastly, the ability to drive three 30″ displays with one dual-link DVI connector, and two DisplayPort connectors.

Mid-range: V5800

V5800

AMD’s mid-range line-up consists of a single product: the ATI FirePro V5800. The V5800 sports similar specs to the V4800: 1 GB of GDDR5 memory, sub-75W power draw, and the ability to drive three 30″ monitors with the same output configuration.

Where the V5800 differs is its core hardware. Using what AMD dubs the Juniper XT GPU, it has twice as many Stream processors as the V3800 and V4800 (800 on the V5800, 400 on the V3800 and V4800) and a higher clock speed.

Again, AMD’s specs for the V5800 are as follows: 800 Stream processors for GPGPU computing, 64 GB/s memory bandwidth, less than 75W power consumption, and the ability to drive three 30″ displays with one dual-link DVI connector and two DisplayPort connectors.

High-end: V7800 and V8800

V7800

With the V7800, we enter the high end of AMD’s FirePro range – and the cards take a corresponding step up in power. The V7800 is interesting as it is a single-slot card, unlike most high-end GPUs, which use the more popular double-slot configuration to accommodate the card’s oversized heat sink. Having a single-slot layout is beneficial for those who wish to stack multiple cards to drive many displays, or for more GPGPU compute power. It is also beneficial for those who use systems built in smaller cases.

With 2GB of GDDR5 memory, and a GPU based on AMD’s top-of-the-line Cypress architecture, the performance potential of the V7800 is quite high. Here is a quick run-down of its specs: 1,440 Stream processors for GPGPU computing, 128 GB/s memory bandwidth, 138W power consumption, and the ability to drive three 30″ displays with one dual-link DVI connector and two DisplayPort connectors.

V8800

The second product in AMD’s high-end line-up is the V8800. Like the V7800, the V8800 sports 2GB of GDDR5 memory and a derivative of the Cypress GPU, albeit a faster and more powerful one which demands a dual-slot configuration. The V8800 also drops the dual-link DVI and goes with four DisplayPort outputs, giving it the ability to drive up to four 30″ displays.

Specs for the V8800 are as follows: 1,600 Stream processors for GPGPU computing, 147.2 GB/s memory bandwidth, 208W power consumption, and the ability to drive four 30″ displays with four DisplayPort connectors.
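To make the figures for the five cards on test easier to compare, they can be gathered into a single table. The Python sketch below uses only the numbers AMD quotes above; the `Card` class, its field names and the `bandwidth_relative_to` helper are my own illustrative choices, and the "less than" power figures for the three lower cards are stored as upper bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    stream_processors: int
    bandwidth_gbs: float  # memory bandwidth in GB/s
    ram_mb: int           # on-board RAM in MB
    max_power_w: int      # upper bound where AMD quotes "less than"
    displays: int         # maximum number of 30-inch displays


# Figures as quoted in the spec run-downs above.
CARDS = [
    Card("V3800", 400, 14.4, 512, 50, 2),
    Card("V4800", 400, 57.6, 1024, 75, 3),
    Card("V5800", 800, 64.0, 1024, 75, 3),
    Card("V7800", 1440, 128.0, 2048, 138, 3),
    Card("V8800", 1600, 147.2, 2048, 208, 4),
]


def bandwidth_relative_to(base_name: str) -> dict:
    """Return each card's memory bandwidth as a multiple of the base card's."""
    base = next(c for c in CARDS if c.name == base_name)
    return {c.name: round(c.bandwidth_gbs / base.bandwidth_gbs, 2) for c in CARDS}


if __name__ == "__main__":
    for name, ratio in bandwidth_relative_to("V3800").items():
        print(f"{name}: {ratio}x the memory bandwidth of the V3800")
```

On these figures, the V4800 has four times the memory bandwidth of the V3800 despite having the same number of Stream processors, which is the gap that the move from DDR3 to GDDR5 opens up.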

Ultra-high-end: V9800

V9800

Lastly, I want to mention the ATI FirePro V9800. The V9800 is AMD’s most recent offering, falling into the ultra-high end of its range. GPU-wise, it is essentially a V8800 with double the RAM and two extra DisplayPort outputs allowing it to drive up to six 30″ displays.

The GPU is the same as the one used on the V8800, so I would expect viewport and display performance to fall in pretty close to the V8800, with maybe a slight edge due to the V9800’s 4GB RAM – something I believe will make the V9800 a potent GPU-compute performer once more extensive OpenCL applications become available. At time of writing, I do not have a V9800 to test, but I will return to it in a separate article.

Display connectors

DisplayPort is quickly becoming the connector of choice for professional graphics cards for a number of good reasons, including physically smaller connectors, better signal integrity and the ability to go beyond the current resolutions for future super-high resolution displays. The only real downside to the industry’s embrace of DisplayPort is for those of us using monitors with no DisplayPort inputs.

This is where DisplayPort to DVI adapters are required. These work just fine, but you may need more of them than are provided with the cards as standard. For example, the V8800 ships with two DisplayPort to single-link DVI adapters: if you want to use four monitors with no DP inputs, you will need to purchase two more. The V7800, V5800 and V4800 all ship with just one adapter, and the V3800 does not ship with any.
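The adapter arithmetic can be sketched in a few lines of Python. The output counts and bundled adapter counts come from this review; the `adapters_to_buy` function and the dictionary layout are hypothetical, and the sketch assumes that every monitor is DVI-only, that the card's native DVI output (where it has one) is used first, and that only single-link adapters are needed.

```python
# Native DVI outputs, DisplayPort outputs and bundled DisplayPort to
# single-link DVI adapters per card, as described in this review.
CARD_OUTPUTS = {
    "V3800": {"dvi": 1, "dp": 1, "bundled_adapters": 0},
    "V4800": {"dvi": 1, "dp": 2, "bundled_adapters": 1},
    "V5800": {"dvi": 1, "dp": 2, "bundled_adapters": 1},
    "V7800": {"dvi": 1, "dp": 2, "bundled_adapters": 1},
    "V8800": {"dvi": 0, "dp": 4, "bundled_adapters": 2},
}


def adapters_to_buy(card: str, dvi_only_monitors: int) -> int:
    """Extra adapters needed to connect the given number of DVI-only monitors."""
    outputs = CARD_OUTPUTS[card]
    if dvi_only_monitors > outputs["dvi"] + outputs["dp"]:
        raise ValueError(f"{card} cannot drive {dvi_only_monitors} displays")
    # Monitors left over after the native DVI output is taken must go on DisplayPort.
    on_displayport = max(0, dvi_only_monitors - outputs["dvi"])
    return max(0, on_displayport - outputs["bundled_adapters"])


if __name__ == "__main__":
    print(adapters_to_buy("V8800", 4))  # four DVI-only monitors on a V8800
```

For the V8800 example above, the function returns two, matching the two extra adapters you would need to buy.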

While DisplayPort to single-link DVI adapters are relatively inexpensive, the real problem comes for those of you with 30″ monitors that only have dual-link DVI inputs. In order for a DisplayPort output to get a maximum-resolution 2,560 x 1,600 signal to a DVI-equipped monitor, an ‘active’ DisplayPort to dual-link DVI adapter must be used. These are powered units that draw power from the host computer via a USB connection, so not only are they quite expensive compared to the single-link adapters, but they also require a free USB port for each adapter used.
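A rough calculation shows why a 30″ panel needs dual-link DVI. A single DVI link tops out at a 165 MHz pixel clock. The Python sketch below compares that limit with the rate of visible pixels alone; the 60Hz refresh rate is my assumption, and blanking intervals are ignored, so the result is a lower bound on the real pixel clock rather than an exact figure. The function names are illustrative.

```python
SINGLE_LINK_DVI_MHZ = 165.0  # maximum pixel clock of a single DVI link


def active_pixel_rate_mhz(width: int, height: int, refresh_hz: int = 60) -> float:
    """Visible pixels per second, in millions (blanking intervals ignored)."""
    return width * height * refresh_hz / 1e6


def needs_dual_link(width: int, height: int, refresh_hz: int = 60) -> bool:
    """True if even the visible pixels alone exceed the single-link limit."""
    return active_pixel_rate_mhz(width, height, refresh_hz) > SINGLE_LINK_DVI_MHZ


if __name__ == "__main__":
    for width, height in [(2560, 1600), (1920, 1200), (1600, 1200)]:
        rate = active_pixel_rate_mhz(width, height)
        print(f"{width} x {height}: {rate:.2f} MHz, dual-link: {needs_dual_link(width, height)}")
```

At 2,560 x 1,600, the visible pixels alone come to roughly 246 million per second, well past what one link can carry, whereas the 1,920 x 1,200 and 1,600 x 1,200 displays stay under the limit on this simplified measure.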

This is rather unfortunate, since 30″ displays with DisplayPort connectors have only recently started to emerge – but sacrifices always have to be made in the name of progress, and AMD’s embrace of DisplayPort via its Eyefinity technology is definite progress towards the future for professional workstations.

Eyefinity technology

As I mentioned before, each card has the ability to drive multiple monitors: three in the case of the V4800, V5800 and V7800, four in the case of the V8800, and six for the V9800. This is all part of AMD’s Eyefinity technology. It is aimed at increasing productivity by enabling a single workstation to drive a large number of displays. AMD markets Eyefinity heavily towards large-scale CAD, scientific and presentation applications, but as CG Channel is geared mainly towards the DCC user, this is the angle from which I am going to approach this subject.

So do DCC users really need this many monitors? I have been using three and four-monitor configurations for a while now, and I can tell you that once you try it, you won’t want to go back to just two. Let’s say, for example, that you have a pair of 30″ displays, a 22″ display and a Wacom Cintiq. You plug them all into your V8800 or V9800 and… hey, check it out: you’ve got Max or Maya open on one of the 30-inchers, Photoshop on the other, ZBrush or Mudbox running on the Cintiq, and your reference art or a web browser open on the 22-inch display! No more [Alt]-tabbing, and no more stacking windows so that only one or two are visible at the same time!

If you are really masochistic, you could even install four V9800s in a single system and drive 24 30″ displays – and yes, Windows 7 will support 24 displays! Having the ability to run multiple displays and be able to see everything simultaneously is a very enjoyable and productive way to work, and it is the perfect complement to today’s multi-core workstations.

GPGPU computing

Before I get into the benchmark results, I want to talk a little about GPGPU computing. This is the process by which the graphics card’s GPU is used to augment the system’s CPUs to perform general computing tasks. The potential of this technology is exciting, and we are just starting to see applications that make use of it.

This is where the professional graphics cards set themselves apart from their consumer counterparts. Remember that I said earlier that pro cards have more on-board RAM? Well, the more RAM on the card, the more intensive the computations that can be performed. Unless the tasks the GPU is trying to perform fit entirely within the on-board memory, data must be swapped between the RAM on the GPU and that of the workstation itself, making computation much slower.

For example, in the current crop of GPU-accelerated raytracers, the entire 3D scene must fit within the memory of the graphics card in order for the card to be used to help with the rendering process. If the scene is too big, the GPU will just ignore the render, and only the system’s CPUs will be used, resulting in much longer render times. This is where the FirePros’ 1, 2 and 4GB capacities come in handy.
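The all-or-nothing behavior described above can be sketched as a simple check. This Python snippet is a simplified illustration, not how any particular renderer is implemented: the `pick_render_device` function is hypothetical, the 1,500MB scene size is an invented example, and it treats all of the card's RAM as available to the scene, which is optimistic.

```python
def pick_render_device(scene_mb: float, card_ram_mb: float) -> str:
    """Return "gpu" if the whole scene fits in on-board memory, else "cpu"."""
    return "gpu" if scene_mb <= card_ram_mb else "cpu"


if __name__ == "__main__":
    scene_mb = 1500  # hypothetical scene size, for illustration only
    # The 1, 2 and 4GB FirePro capacities mentioned in this review.
    for ram_mb in (1024, 2048, 4096):
        device = pick_render_device(scene_mb, ram_mb)
        print(f"{ram_mb}MB card: render on {device}")
```

With this example scene, a 1GB card would leave the render to the CPUs, while the 2GB and 4GB cards could take it on.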

I have no doubt that once applications that utilize OpenCL and DirectCompute (the GPGPU APIs that AMD supports) start to emerge, the FirePro line will have some serious horsepower to offer them. But at the moment, this is where Nvidia has the jump on AMD. While I have always found that, in general, AMD’s graphics cards offer better display and viewport performance in DCC applications, Nvidia has recently been pushing hard towards GPGPU computation with its proprietary CUDA SDK. As a result, while new CUDA-enabled applications are beginning to emerge, there are far fewer utilizing OpenCL.

For example, mental images’ iray and RandomControl’s Arion are both GPU-accelerated raytracers optimized only for CUDA-based systems. To my knowledge, there is currently no similar application that would be accelerated on an AMD GPU. (And if you’re thinking, “Hey, what about MachStudio Pro?” while MSP is a GPU-accelerated rendering package, the current version does not do raytracing or global illumination. It works solely off DirectX shaders.)

The CG industry is currently divided about the usability of GPU-accelerated raytracers. Some have embraced this technology as the future of rendering, while others simply do not feel that GPU renderers can live up to their CPU-based counterparts in terms of output quality. The latter viewpoint is evident in the fact that GPU renderers are currently mainly used for previs and animatic work, while the final-quality rendering is left to the software renderers.

I would have liked to include some GPGPU benchmarks in this review, but until we have appropriate software in which to perform the tests, it seems better to hold off. I have been told by AMD executives that there is a lot of exciting stuff in the works that will leverage the GPU-compute power of the FirePro cards, and all I can say is, hurry up guys: we need to put all those Stream processors to good use!

Testing procedure

All testing was done on an HP Z800 workstation sporting a pair of 6-core 32nm Xeon X5680 CPUs running at 3.33 GHz. Packed with 18GB of DDR3 memory and a 10,000 RPM Seagate SAS drive, this machine has plenty of horsepower to ensure that there are no bottlenecks on the graphics subsystems that might impede the benchmarking. (Those interested in this beast of a machine can find a review here.)

The system runs Windows 7 64-bit and testing was performed with several combinations of the following displays: an HP LP3065 30″ monitor at 2,560 x 1,600 resolution, a Gateway 24″ monitor at 1,920 x 1,200, and two Dell 2001FP displays at 1,600 x 1,200.

Benchmark scores

3ds Max 2011

3ds Max is a unique piece of software in that it has three different display modes to choose from: DirectX, OpenGL and AMD’s recently released performance driver. I have found that overall, Max runs best in DirectX mode, as its performance lead over OpenGL is significant.

As for AMD’s performance driver, it is quite fast when it works properly, but caused crashes during the tests. It seems to work well as long as you are not using any of Max’s advanced viewport shading features. In addition, materials from third-party renderers other than mental ray do not display properly. I have recently been informed that AMD’s engineers are aware of these issues, and that they will be corrected with the next release.

As you can see from the scores above, the V8800 takes the crown here, followed closely by the V7800. The previous-generation V8750 takes the number three spot with the current V5800 closely following it, meaning that the current mid-range card is nipping at the heels of the previous generation’s ultra-high-end model.

Now here’s where things get interesting. In fifth and sixth positions, we have the entry-level V4800 and Nvidia’s previous-generation 4GB monster, the Quadro FX 5800, competing at nearly the same level, with each card bouncing back and forth between the two positions.

The same holds true for seventh and eighth place, with the current entry-level FirePro, the V3800, running neck-and-neck with Nvidia’s previous-generation mid-range card, the Quadro FX 3800.

Maya 2011

Maya differs from 3ds Max in that it is built around just one graphics API, OpenGL, and it is highly optimized for it.

The results seen here for the Maya benchmarks are similar to those seen with the 3ds Max tests: the V8800 takes first place, closely followed by the V7800. Again, the V5800 and the V8750 trade blows for the number three and four spots. The Nvidia Quadro FX 5800 firmly takes fifth position, with the V4800 not too far behind. The Quadro FX 3800 takes seventh in two of the three tests, with the V3800 averaging eighth place.

Softimage 2011

Like Maya, Softimage is designed around OpenGL. However, unlike Maya and 3ds Max, Softimage’s viewport runs extremely fast, as you can see in the benchmarks below. So far, I have only put Softimage through its paces with untextured scenes (only smooth-shaded polygons: no textures or advanced shaders). Despite this, I would not expect viewport performance to degrade much when textures or materials are applied.

The ranking here is broadly in line with the previous benchmarks, with the V8800 taking the number one spot, the V7800 coming in second, and the V8750 taking a decisive third position. However, in this test, the V4800 takes the number four spot in both benchmarks and the V5800 comes in at number five. The Quadro FX 5800, Quadro FX 3800, and the FirePro V3800 fight it out for the sixth, seventh and eighth spots.

Mudbox 2011

Like Maya and Softimage, Autodesk’s sculpting package uses OpenGL as its display technology of choice. With Mudbox, the professional graphics cards really set themselves apart from the consumer cards, as the software likes having lots of RAM on the video card.

Unlike traditional DCC apps, Mudbox works with very high poly counts. Average scenes can run anywhere from 8 million polygons up to 135 million, the highest I have ever seen. In contrast, standard DCC apps usually run to 2-5 million polygons for complex scene files.

Some of you may ask why I have not included any ZBrush benchmarks here. The answer is simple: Mudbox’s viewport performance is dependent on the graphics card. ZBrush, on the other hand, uses Pixologic’s proprietary CPU-based technology to render the viewports, so the installed graphics card really has no impact on performance. Since we are testing graphics cards, there is little point in running benchmarks for software that doesn’t tax the graphics subsystem.

For the Mudbox tests, each card was evaluated in two different ways. The first is a measure of overall viewport performance while panning, rotating or zooming the model, while the second measures the software’s response while sculpting.

Again, the V8800 leads the V7800, although the difference in performance is relatively small. But unlike the previous tests, the Quadro FX 5800 comes in third place in viewport performance, and actually outperforms both the V7800 and V8800 at editing. The V8750 comes a clear fourth; the V4800, V5800 and Quadro FX 3800 share fifth place; and the V3800 is left in eighth, due to its lower viewport performance.

MachStudio Pro

MachStudio Pro is a unique piece of software from a relative newcomer to the CG industry, StudioGPU. It is a standalone scene-assembly application that was one of the first on the market to leverage the GPU to perform final beauty-pass rendering.

However, unlike the few GPU-accelerated raytracers out there, MachStudio Pro does not leverage OpenCL, DirectCompute or CUDA to perform these tasks, so it is not doing GPGPU-compute tasks in the traditional sense. Instead it uses technology based on advanced DirectX-based pixel shaders to achieve its final render output, and it does it in near real time. The software is quite promising, and I will be looking at the newest version here in the near future.

Once again, we have the V8800 and V7800 in the lead, tying for first place. The V8750 comes in third. Nvidia’s Quadro FX 5800 moves up the ladder to the number four spot, with the FirePro V5800 and the Quadro FX 3800 roughly tying for fifth place. The FirePro V4800 takes seventh, and the V3800 brings up the rear at number eight.

Cinebench 11.5

Anyone who has read any of my previous reviews here on CG Channel knows that I am not a big fan of synthetic benchmarks, as they offer no real insight as to how a particular hardware set-up will perform in production. This is not the fault of the engineers who write these benchmarks: it’s just that there are too many variables to account for to be able to predict accurately how any piece of hardware will perform in all types of production.

Having said that, I have had some requests to include Cinebench tests in my various reviews – so congratulations, Cinebench: you have the honor of being the only synthetic benchmark included in these tests.

The benchmark scores above are self-explanatory: it is really only necessary to note that the Cinebench results are broadly in line with those of the previous application-specific tests.

Pricing (as of 6 December 2010)

Overall verdict

Before I sign off, I want to give my overall impressions of each of the new FirePro cards. First off, we have the mighty V8800. This is quite the monster, and I have found very few situations where I felt I needed more speed out of the graphics hardware. The most appealing aspect of the V8800 is the ability to drive four displays from a single card, with little to no slowdown when running multiple 3D apps across the four monitors.

The V8800 even handles gaming quite well. Let’s face it, we all need a break from our work every so often, so I would fire up the occasional round of StarCraft II, and on the 30″ display, with all settings maxed out except for full-screen AA, I was getting over 30fps more than 90% of the time.

Next, we have the V7800. I was very impressed with the V7800: in the benchmarks above, it came in only slightly behind the V8800, and in overall system performance, it really didn’t seem to be any slower than its big brother. The biggest difference is that the V8800 can drive four monitors, while the V7800 can only run three.

However, I imagine that when we start to see some production-ready OpenCL and DirectCompute apps, the performance differences between the V7800 and V8800 will become more apparent: the V8800 has more Stream processors than the V7800, which should give it an edge with GPGPU applications.

Moving on down the list, we have the V5800. I think the V5800 is the card that will appeal to most DCC professionals, including modelers, texture artists and animators. The card performs quite well, as long as the content you are working on does not exceed the 1GB of on-board RAM, but once it does, performance degrades significantly. However, unless you are assembling large scenes with many high-resolution assets, or working with super-dense Mudbox meshes, 1GB should be more than adequate. Add the ability to drive three monitors, and you’ve got a great balance of performance, features and price.

The only real unknown here is the V5800’s GPGPU performance. While it will undoubtedly be less powerful than the V8800 and V7800, just how much so is uncertain without any real OpenCL or DirectCompute benchmarks. This is something we will have to re-examine later down the line.

Next, we come to the V4800. The V4800 is an interesting card as it performs almost as well as (and in one case better than) the V5800 in standard DCC applications. Where performance starts to fall off is when viewport shaders are used heavily, or when sculpting high-density meshes in Mudbox.

Also of note is the fact that when running a three-monitor set-up, the V4800 slows down noticeably when running more than one 3D application: something the V5800 does not suffer from until the card’s 1GB of memory is exceeded.

Last, we have the entry-level V3800. From a performance standpoint, the V3800 does a decent job keeping up with its bigger siblings when working on light-to-moderate scenes. However, its performance does drop significantly when heavier scenes are loaded, and when shader-intensive tasks are being performed. Its 512MB of RAM also limits it when using digital sculpting tools as even light-to-moderate scenes can easily exceed 512MB, and once that happens, performance degrades significantly.

I can see the V3800 being a good entry-level AutoCAD or SolidWorks card as CAD projects tend to not use the high-resolution textures and complex pixel shaders that DCC apps do, making the V3800’s smaller memory pool and lower-power GPU less of an issue.

In conclusion

AMD has quite the powerful line-up of professional graphics cards, from high-end monsters all the way down to its entry-level contenders. I have always found its graphics offerings run extremely well and stably with DCC applications and, in my opinion, the ability to run three or more monitors on a single card through Eyefinity technology gives AMD a superior productivity tool.

The current line-up of FirePros offers excellent performance and a set of features that makes the cards stand out from their competitors. The only thing missing is more OpenCL and DirectCompute-ready applications so that we can see how AMD’s FirePros stack up to Nvidia’s Fermi architecture in the GPGPU arena.

And for those of you looking for a detailed comparison between the current FirePro products and Nvidia’s Fermi-based Quadro cards, stay tuned: it’s coming soon!

For more information on AMD’s line of professional products, visit the company’s website.

Acknowledgements

I’d like to give a special thanks to several vendors and individuals for their contributions to this article:

Vendors:

HP, Autodesk, Evermotion

Individuals:

John Swinimer and Evan Groenke of AMD; Sierra Lovelace of Bite Communications; Dan Platt; Stephan Dube