HP’s new top-of-the-range Z820 workstation offers four more CPU cores than its predecessor. But as Jason Lewis’s benchmarks reveal, whether that extra power will benefit you depends on the kind of CG work you do.
With summer over, what better way for us CG professionals and enthusiasts to head into fall than with a look at an extreme high-end desktop workstation that will (warning, mild spoiler ahead) chew through some of the most demanding CG projects that you can throw at it?
Today, we are looking at HP’s top-of-the-line Z820 workstation. I have been benchmarking HP workstations since the company introduced its Z-series systems; the new Z820 belongs to the third generation.
Why not review a tablet?
Before we get into this review, I would like to address a question that I am often asked. With tablet and smartphone adoption growing rapidly, I’m sure that many of you have seen articles yelling that the era of the desktop PC is over. With all due respect to my fellow journalists, this is not so.
Yes, we are seeing a major transition in the consumer space from the traditional desktop and laptop systems to tablets and smartphones as primary content consumption devices – but the key word there is ‘consumption’.
There are still many computing tasks that require traditional high-performance hardware, including server or data center computing, gaming and content creation. In addition to writing for CG Channel, I work as a CG artist for one of the largest game publishers in the world, and we have no plans to replace any of our high-performance workstations and large desktop displays with iPads or Galaxy Tabs any time soon.
With current-generation hardware, CG work would simply become too slow and cumbersome on anything but desktop systems or high-end laptop systems, and large desktop displays are equally essential. So fear not: all those millions of iOS and Android users still need workstation users to create the content they are consuming at such a frantic pace.
The increasing CPU core count
Back in 2009, the first Z800 sported a pair of quad-core Xeon CPUs running at 3.2GHz, and 18GB of DDR3 memory; while the refreshed Z800, released a year later, featured a pair of six-core CPUs, running at 3.33GHz. The configuration of the Z820 we will be looking at offers a pair of eight-core CPUs running at 3.1GHz, and 32GB of DDR3 memory.
As you can see, with each iteration, HP has been primarily pushing two aspects of these high-end systems: memory bandwidth and raw CPU processing power. Multi-core processors revolutionized desktop computing when they began to see mainstream adoption midway through the last decade. Beginning as dual-core chips, the core count rapidly increased, and now high-end models offer eight or even twelve cores on a single CPU.
While the effect of adding more cores is significant in the consumer space, it is in professional workstations that we see the biggest performance gains. Prior to multi-core CPUs, the only way to get more than one CPU in a machine was to use several single-core CPUs. But graphics software is highly optimized for multiple CPUs, and has been for some time. Back in the late 1990s and early 2000s, rendering software was one of the few markets to take advantage of then-rare two or four-CPU machines.
Unfortunately, outside of 3D, simulation and rendering, not a lot of applications are designed to make efficient use of more than four cores. In addition, six-core chips are already pushing the thermal boundaries of CPU architecture, and even the transition from 32nm to 22nm parts may not help much. As overclockers are learning, with some of the new 22nm parts, the die is so small that you simply cannot get enough of a contact surface between it and the spreader to adequately dissipate the heat it generates. Unless Intel and AMD can come up with a more efficient way to channel heat, I believe we may see CPUs top out at 12 or 16 cores.
In previous workstation reviews, we have seen how performance of CG applications scales almost linearly with CPU core count, with only slight diminishing returns observable from 4 to 12 cores. In this review, we will see if the Z820, with its 16 CPU cores, can continue the trend.
But before we get into the benchmarking, let’s take a look at the system itself. From the outside, you would be hard pressed to tell the difference between the Z820 and the Z800 it replaces. The dimensions are the same (17.5 x 8.0 x 20.7 inches, or 44.4 x 20.3 x 52.5 cm), and it still sports the same black, ribbed front, with high-quality brushed aluminum side panels.
Aside from the small Z820 label in the upper right-hand corner of the front panel, the only external difference is that the Z800 had a slot-load DVD drive, while the Z820 has traditional bay-mounted optical media drives: in the case of our test system, a Blu-ray drive.
As with the Z800, the external connectivity is very good. On the back panel, there are four USB 2.0 ports, two USB 3.0 ports, a FireWire 1394a port, two Gigabit Ethernet ports, a serial port, audio jacks and PS/2 mouse and keyboard ports. On the front, there are two USB 3.0 ports, a USB 2.0 port, a FireWire 1394a port and mic and headphone jacks.
Inside the case, the layout of the internal components remains very similar to that of the Z800. The large power supply runs across the top of the system, while the drive bay assembly occupies the lower front. This consists of four 3.5″ drive bays, each with its own quick release pull-out tray and dedicated fan. There are also three 5.25″ bays that can take DVD or Blu-ray drives, a multi-card reader or, with an adaptor, more 3.5″ internal hard drives. PCI Express slots for graphics and other expansion devices sit behind the hard drive trays, and the center of the case holds the CPUs and RAM.
The biggest internal difference between the Z820 and the Z800 is that while both systems have a large plastic shroud that covers the CPUs and memory, the Z820’s can be removed in one piece, meaning that – as with the rest of its tool-less assembly system – key components can be swapped out without the need for a screwdriver.
Overall, the internal layout of the Z820 is clean and uncluttered, and provides good airflow for cooling purposes. It represents an evolution of the Z800, not a drastic redesign: but if it ain’t broke, why fix it?
As I mentioned earlier, the Z820 significantly improves on the Z800 when it comes to processing power and memory bandwidth. Our previous-generation test machine was equipped with two Xeon X5680 six-core CPUs based on Intel’s previous-generation Westmere-EP architecture, running at a clock speed of 3.33GHz. Each processor had access to three memory channels with two memory slots each, for a total of 12 DIMM slots and a maximum of 192GB of RAM.
The new test system came equipped with two of Intel’s new Xeon E5-2687W eight-core CPUs, based on the 32nm Sandy Bridge architecture. They run at a clock speed of 3.1GHz (3.8GHz in turbo mode when running lightly threaded applications) and support Hyper-threading, so a 16-core system can process up to 32 threads.
Things get quite a bump in the memory department as well. The Intel C602 chipset motherboard supports four memory channels per processor with 2 DIMM slots per channel for a grand total of 16 memory slots and up to 512GB of memory. Our test system wasn’t quite as powerful, running 32GB of DDR3 memory arranged in eight 4GB DIMMs.
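Both slot counts fall straight out of the channel layout: sockets multiplied by memory channels per processor multiplied by DIMM slots per channel. As a rough sketch of that arithmetic in Python (the function and variable names are my own, purely for illustration), the following also works out the DIMM size each board would need in order to reach its quoted maximum:

```python
def dimm_slots(sockets, channels_per_cpu, dimms_per_channel):
    """Total DIMM slots on a multi-socket board."""
    return sockets * channels_per_cpu * dimms_per_channel


# Figures quoted in this review for the two dual-socket systems.
z800_slots = dimm_slots(sockets=2, channels_per_cpu=3, dimms_per_channel=2)
z820_slots = dimm_slots(sockets=2, channels_per_cpu=4, dimms_per_channel=2)

# DIMM size (in GB) implied by each board's quoted maximum capacity.
z800_dimm_gb = 192 // z800_slots
z820_dimm_gb = 512 // z820_slots

print("Z800:", z800_slots, "slots,", z800_dimm_gb, "GB per DIMM at maximum")
print("Z820:", z820_slots, "slots,", z820_dimm_gb, "GB per DIMM at maximum")
```

On those figures, filling the Z800 to 192GB takes 16GB modules, while reaching 512GB on the Z820 takes 32GB modules in every slot.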
In addition to the 16 RAM slots, the C602-based motherboard provides a vast array of PCI Express slots: three Gen3 x16 slots, one Gen3 x8 in a x16 slot, one Gen3 x4 in a x8 slot, one Gen2 x4 in a x8 slot and a legacy PCI slot. For internal storage, the board offers a two-channel 6Gb/s SATA drive controller, a four-channel 3Gb/s SATA controller and an eight-channel 6Gb/s SAS controller. Expandability is one of the key selling points of the Z820 over its smaller sibling, the Z620, which is also a dual-socket workstation, but only offers six slots for add-on cards, and five drive bays to the Z820’s seven.
In addition to the Z820’s two eight-core Intel Xeon E5-2687W CPUs, running at a clock speed of 3.1GHz, and 32GB of DDR3 memory, our test system used a Nvidia Quadro 5000 GPU. Storage consisted of a 300GB 15,000 RPM SAS drive for the main system drive, and a 1TB 7,200 RPM SATA drive for the storage drive.
For comparison, we also tested the following systems:
A previous-generation HP Z800 with two six-core Xeon X5680 CPUs running at 3.33GHz, 18GB of RAM, and a Nvidia Quadro 6000 GPU.
An entry-level HP Z210 with a quad-core Xeon E3-1270 CPU running at 3.4 GHz, 8GB of RAM and an AMD FirePro V7900 GPU.
A coach-built system with an Intel Core i7-980 six-core CPU running at 3.3GHz, 12GB of RAM, and a Nvidia Quadro 4000 GPU.
A coach-built system with an Intel Core 2 Quad Q9550 quad-core CPU running at 3.0GHz, 8GB of RAM, and an AMD FirePro V8800 graphics card.
The CPU in the latter dates from 2008, but it’s still a very capable system for running high-end DCC applications, and illustrates just how far processing has come in the past four years.
Please also note that the GPUs used in the Z210 and Core i7-980 test machines are different to those used in previous reviews. This has a significant impact on benchmark scores.
For benchmarking, we used the following standard suite of DCC and rendering applications:
3ds Max 2012, Maya 2012, Softimage 2012, Mudbox 2012, ZBrush 4, Premiere Pro CS5.5, Fusion 6.2 LE, RealFlow 2012
mental ray 3.9, V-Ray 2.0, Brazil r/s 2.0, Maxwell Render 2.5
iray (3ds Max 2013)
All of our test systems were running Windows 7 Professional 64-bit with all the latest service packs and updates, and the benchmarks were recorded on a HP LP3065 30″ LCD display, running at its native resolution of 2,560 x 1,600.
3ds Max 2012
First up, we have the 2012 version of Autodesk’s 3ds Max modeling, rendering and animation software. Together with Maya, it probably makes up a little over 80% of the DCC market: an estimate based both on information from Autodesk and my own anecdotal experience.
Autodesk is currently running the Excalibur (XBR) project: an ongoing overhaul of 3ds Max’s core code that began with the 2010 release. One of the aims is to incorporate multi-threading throughout the software, not just for rendering, so we were interested to test the benefit of stepping up from 12 to 16 CPU cores.
The following benchmarks show average viewport frame rates for rotating, panning, vertex and face editing for each model displayed. All were performed with the Nitrous viewport’s Realistic shading mode.
As you can see, viewport performance relies more heavily on the graphics card than it does on the CPUs: on the Audi A5 scene, the Z820 with its Quadro 5000 takes second place behind the Z800 and its Quadro 6000; and on the Steampunk Tank scene, it takes third place behind both the Z800 and the Z210 and its FirePro V7900.
Subjectively, however, the Z820 does seem more responsive when doing any kind of sub-object manipulation, or applying modifiers that involve updates to the geometry (Bend, Twist, Taper, FFDs and so on) on objects with high polygon counts or in scenes with lots of objects.
Like 3ds Max, Maya is one of the most widely used 3D applications in film, television and videogame development. Unlike 3ds Max, its core does not seem to be multi-threaded: only the rendering, dynamics and simulations. (This conclusion is based on observing CPU load monitors while working with the software: if anyone has information to the contrary, I’d be interested to hear it.)
Despite this, the Z820’s CPUs should still offer a performance boost: although the clock speed of its Xeon E5-2687Ws is roughly 230MHz lower than that of the Z800’s Xeon X5680s, tests conducted by other hardware sites suggest that its Sandy Bridge architecture offers a performance increase of 10-20% over the older Westmere-EP architecture.
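Whether a lower clock speed and a faster architecture net out as a gain is simple arithmetic. The sketch below combines the two clock speeds quoted in this review (3.33GHz and 3.1GHz) with the 10-20% architectural range. It assumes that per-core performance scales linearly with both clock speed and the architectural gain, which is a simplification: real applications will not follow it exactly.

```python
Z800_CLOCK_GHZ = 3.33  # Xeon X5680, as quoted in this review
Z820_CLOCK_GHZ = 3.1   # Xeon E5-2687W, as quoted in this review


def relative_per_core_speed(arch_gain):
    """Z820 per-core speed relative to the Z800 (1.0 means parity).

    Assumes speed scales linearly with clock and architectural gain.
    """
    clock_ratio = Z820_CLOCK_GHZ / Z800_CLOCK_GHZ
    return clock_ratio * (1.0 + arch_gain)


low = relative_per_core_speed(0.10)   # pessimistic end of the range
high = relative_per_core_speed(0.20)  # optimistic end of the range

print(f"Estimated per-core gain: {(low - 1) * 100:.1f}% to {(high - 1) * 100:.1f}%")
```

Under those assumptions, the Z820 should come out roughly 2% to 12% ahead per core in lightly threaded work, before its extra cores are counted at all.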
As with the 3ds Max benchmark, the Maya benchmark is also comprised of averaged viewport frame rates for rotating, panning, vertex and face editing for each of the models shown.
Once again, viewport performance is much more reliant on the graphics card than the rest of the system: the Z820 places third or fourth in these tests, sometimes even coming in behind the weakest system, the Q9550. (This is probably due to the Q9550’s FirePro card: if you recall last year’s professional GPU shootout, you will remember that Maya tends to favor AMD cards over Nvidia cards.)
As with 3ds Max, manipulating high-poly models ‘feels’ smoother on the Z820 than any of the other test systems, but there aren’t any hard numbers to back that up.
Maya 2012 cloth dynamics
Next we have a cloth simulation done with Maya 2012’s cloth dynamics tools. The simulation is a basic flag with a Gravity and Wind force applied to it, simulated over 140 frames of animation.
Here, the Z820 takes the top spot. I suspect the biggest contributor here is the new Xeon’s Sandy Bridge architecture, as the simulation does not seem to stress all of the available CPU cores in the Z820 system.
Softimage is the third DCC application in Autodesk’s line-up. It is a full-featured 3D modeling, animation and rendering package. Again, the benchmark consists of averaged frame rates for viewport rotating, panning, vertex and face editing for the scene shown.
Like Maya, Softimage seems to favor AMD graphics cards over Nvidia cards. The Z820 comes in fourth, while the Z800 and its powerful Quadro 6000 only manages third place, behind the Q9550 and its FirePro V8800, and the Z210 and its FirePro V7900.
The last piece of Autodesk software used for our benchmarks is Mudbox 2012. Unlike 3ds Max, Maya, and Softimage, Mudbox is a digital sculpting program similar to ZBrush. Sculpting applications tend to be optimised to display much higher polygon counts than their traditional DCC cousins. With this benchmark, we will look at simple viewport performance, as well as sculpting performance.
Mudbox yields both expected and unexpected results. The viewport manipulation benchmark is as expected, with the Z820’s Quadro 5000 taking second place behind the Z800 and its Quadro 6000; but with the sculpting test, the Z820’s Xeon CPUs really show their muscle.
ZBrush, like Mudbox, is a digital sculpting application. But whereas Mudbox taxes both the CPU and GPU, ZBrush only uses the CPU to render geometry to screen. Unfortunately, there is no way to view or record frame rate in the ZBrush viewport while sculpting an object (at least, not one that I am aware of: if you know of one, please post it in the comments to the review). Fraps will not give a readout in ZBrush, possibly because ZBrush uses its own rendering system rather than OpenGL or DirectX, so Fraps doesn’t see it as a 3D application.
All I can really do here is convey my subjective experience of using ZBrush – which is that it works well on almost any modern system featuring four or more processor cores and at least 4GB of RAM. (ZBrush is still a 32-bit application, so it cannot access more than 4GB of available RAM: the 32GB in the Z820 is overkill.) It feels great on the Z820, but that’s true of all the systems on test: even the Q9550 feels only slightly slower.
Premiere Pro CS5.5
Premiere Pro is a video editing package from Adobe that is very popular in the professional DCC market. For this test we are encoding a 123-second HD 1080p video clip in the H.264 video format.
Fusion 6.2 LE
Fusion is a node-based compositing application. For this benchmark, we have a moderately complex composition that is 141 frames long, rendered at HD 720 resolution.
The Z820 beats the older Z800, but takes second place to the Z210. Since Fusion doesn’t seem to scale beyond four CPU cores, I suspect that the higher clock speed of the Z210’s Xeon E3-1270 is what enables it to pull ahead here.
Next up is a new benchmark using RealFlow 2012: a hybrid grid/particle-based fluid simulation package that has become popular in commercials, broadcast and movie work. Our benchmark consists of a 700-frame simulation with one emitter and one collision object. The particle count tops out at 1.6 million.
The Z820 takes first place, followed by the Z800, then the Core i7-980 in third, the Z210 fourth, and the Q9550 fifth. RealFlow is a multi-threaded application, but even though the information panel says that it is using 32 threads on the Z820 when Hyper-threading is enabled, observation of the Task Manager suggests that it tops out at six cores in this test. Like other lightly threaded apps, the performance victory for the Z820 can probably be attributed to the new Sandy Bridge architecture found in the new Xeon E5-2687W CPUs.
Developed by Nvidia, iray is one of the most popular GPU-accelerated renderers on the market. It is integrated into 3ds Max 2012 and 2013. This test makes use of Nvidia’s new material plugin, which allows iray to simulate subsurface scattering, metallic flakes and thin film coatings. It is only compatible with 3ds Max 2013, so unlike the 3ds Max benchmark itself, I am using 2013.
Since iray uses the system’s GPU to accelerate rendering, the system with the Quadro 6000 is the one that wins. As you can see above, the Z800 beats the Z820 and its Quadro 5000 narrowly when it is using the Quadro 6000; but if we swap the cards, then the Z820 takes a large lead. The Core i7-980 system comes in a long way behind both; and since iray only supports Nvidia GPUs, we could not test the other two systems at all.
mental ray 3.9
Our first CPU-only rendering benchmark is mental ray 3.9. Owned by Nvidia, it makes up a large percentage of the rendering market for entertainment and visualisation.
Rendering software is one of the few types of applications that is heavily threaded and therefore highly optimized for multiple CPUs or CPU cores. Here the Z820 takes a commanding lead over the previous-generation Z800. We tested a range of scenes (not all shown here), with the Z820 rendering them between 33% faster (the Classroom) and 196% faster (the Hot Rod).
I think it is safe to say that the performance increases here are due entirely to the Z820’s four additional CPU cores and new CPU architecture, not the fact that it has more RAM: none of the mental ray tests exceeded 5GB of RAM usage, and even taking Windows and other software into account, total usage stayed well below the 18GB of the Z800.
V-Ray is a third-party renderer for 3ds Max, Maya, Softimage, SketchUp, Rhino and Cinema 4D. Used primarily for visualisation work, it has recently been making inroads into visual effects, although it does not yet have the same market share there as RenderMan or mental ray. Although V-Ray has a GPU-accelerated preview renderer, we’re using the CPU alone here.
As with the mental ray benchmarks, the Z820 significantly outperforms the Z800. Once more, this performance boost varied from scene to scene, with the lowest speed boost being the Evermotion exterior (47% faster) and the highest being the Steampunk Tank (109% faster). Again, four extra CPU cores and the faster Sandy Bridge architecture give the Z820 a significant performance lead over the Z800, and a simply staggering one over the single-socket systems.
Maxwell Render 2.5
Maxwell Render was one of the first commercial renderers based on unbiased rendering technology. Unbiased renderers have the advantage of a simplified user experience and realistic output, but are significantly slower than their Reyes-based counterparts. Again, it uses the CPU.
Brazil was one of the first GI renderers to become commercially available. Its current owner, Imagination Technologies, recently announced the end of life for the software in its present form, but it remains in our benchmark tests for the time being. Once again, it is CPU-based.
But Brazil r/s has a unique feature: you can set the number of threads the software will spawn manually. Earlier on in this review, I talked about how 3D performance scales almost linearly with number of CPU cores: next, we test that claim. For this benchmark, I have disabled Hyper-threading so that we know a whole CPU core is assigned to each thread spawned.
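For readers who want to run the same kind of thread-count test on their own renders, scaling is usually expressed as speedup (single-thread time divided by multi-thread time) and parallel efficiency (speedup divided by thread count). The Python sketch below shows the calculation. The timings in it are hypothetical placeholders, not results from this review, and the function name is my own.

```python
def scaling_report(render_times):
    """Return (threads, speedup, efficiency) rows from {threads: seconds}.

    Speedup is measured against the single-thread render time.
    """
    baseline = render_times[1]
    rows = []
    for threads in sorted(render_times):
        speedup = baseline / render_times[threads]
        efficiency = speedup / threads
        rows.append((threads, speedup, efficiency))
    return rows


# Hypothetical timings in seconds, NOT measured results from this review.
example_times = {1: 960.0, 4: 250.0, 8: 130.0, 16: 70.0}

for threads, speedup, efficiency in scaling_report(example_times):
    print(f"{threads:2d} threads: {speedup:5.2f}x speedup, {efficiency:.0%} efficiency")
```

Perfectly linear scaling would show 100% efficiency at every thread count; the further the figure drops as threads are added, the stronger the diminishing returns.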
The effect of Hyper-threading
Next, I want to take a look at the effects of Hyper-threading on rendering performance. Hyper-threading is a feature of Intel CPUs that enables one CPU core to handle two threads at the same time. Enabling it essentially makes Windows think that there are twice as many CPU cores in the host system as there really are. Even today, many applications do not tax even a single CPU core fully, so Hyper-threading was designed to enable the system to make use of that remaining processing power. However, rendering packages have traditionally been designed to make full use of all available CPU cores. So is there any benefit to enabling Hyper-threading?
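To make the thread arithmetic concrete, here is a minimal Python sketch of how the operating system's view of the Z820 changes when Hyper-threading is toggled. The function name is hypothetical; `os.cpu_count()` is a standard library call that reports logical processors for whatever machine runs the script, so its result will differ from the Z820 figures.

```python
import os


def logical_processors(sockets, cores_per_socket, hyper_threading):
    """Logical processors the operating system will see."""
    threads_per_core = 2 if hyper_threading else 1
    return sockets * cores_per_socket * threads_per_core


# The Z820 on test: two sockets, eight cores per socket.
z820_ht_on = logical_processors(2, 8, hyper_threading=True)
z820_ht_off = logical_processors(2, 8, hyper_threading=False)
print("Z820 with Hyper-threading on:", z820_ht_on)
print("Z820 with Hyper-threading off:", z820_ht_off)

# os.cpu_count() counts logical processors on the current machine, so it
# includes Hyper-threading threads when enabled. It may return None.
print("This machine reports:", os.cpu_count())
```

A renderer that sizes its thread pool from the logical processor count will therefore spawn twice as many threads with Hyper-threading enabled, even though the number of physical cores is unchanged.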
As you can see, even for rendering, Hyper-threading does indeed offer performance benefits: slight in the case of the mental ray Hot Rod benchmark; moderate with the mental ray Classroom render; and significant with the V-Ray Light Cycles scene.
I have read reviews that suggest that Hyper-threading offers unpredictable results, and in some cases a performance penalty, but these cases seem to be few and far between. In my own personal experience, if you leave Hyper-threading enabled, you will see performance gains in the vast majority of situations.
Those of you who are familiar with my previous reviews will know that I am not a fan of the synthetic benchmarks that so many other reviewers rely on. This is simply due to the fact that the results they generate do not reflect the performance you would find in real production situations. Despite this, I have had requests from readers to include Cinebench results, so it is the one synthetic benchmark I include. You can download it here.
Next, let’s take a look at some other important characteristics. The Z820’s BIOS has seven different fan speed settings, and this is largely what determines its acoustic performance. At the lowest setting, the Z820 hardly makes any noise at all, but the CPUs get very hot when running under full load while rendering, topping out at almost 90°C. I would be uncomfortable running the CPUs for extended periods of time at these temperatures.
Conversely, if you set the fan speed to maximum, the CPUs stay at a chilly 42-45°C under full rendering load, but the system becomes quite loud: too loud for creative work, I would suggest, although this setting could be used in a data center.
Settings 3 and 4 seem to offer the best compromise: I typically use setting 3. At this speed, the noise of the fans is noticeable, but much quieter than many high-end SLI gaming systems; and thermal performance is decent, with the CPUs idling at 43-45°C and topping out at 62-63°C, well below the thermal danger zone of 85-90°C.
It would have been nice if HP had included a utility to adjust fan speeds from within Windows instead of having to reboot to access the BIOS. There are third-party utilities out there that can do this, but none of the ones I’m aware of work with the Z820’s fans.
Next, let’s look at the Z820’s power draw. With the ever-increasing costs of electricity, power usage has become a very real concern when choosing a new workstation. While it is true that throwing more and faster hardware at a system will increase its performance, this also increases its power usage, and when you have an animation studio filled with hundreds of workstations, the difference between a system that draws 100W of power and one that draws 500W can mean thousands of dollars in electricity bills per year. Even for individuals, power usage can make a noticeable difference if you leave your system on all night, or even if you just use it frequently.
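To put rough numbers on that 100W versus 500W comparison, here is a back-of-envelope calculation in Python. The electricity rate, working hours and seat count are illustrative assumptions of mine, not figures from HP or from any studio, so substitute your own.

```python
def annual_energy_cost(watts, hours_per_day, days_per_year, rate_per_kwh):
    """Yearly electricity cost of one system at a constant power draw."""
    kwh = watts / 1000 * hours_per_day * days_per_year
    return kwh * rate_per_kwh


# Assumptions for illustration only: adjust for your own studio.
HOURS_PER_DAY = 8      # one working shift
DAYS_PER_YEAR = 250    # working days
RATE_PER_KWH = 0.12    # US dollars per kWh (hypothetical tariff)
SEATS = 100            # number of workstations

low_draw = annual_energy_cost(100, HOURS_PER_DAY, DAYS_PER_YEAR, RATE_PER_KWH)
high_draw = annual_energy_cost(500, HOURS_PER_DAY, DAYS_PER_YEAR, RATE_PER_KWH)
per_seat = high_draw - low_draw

print(f"Extra cost per seat: ${per_seat:,.2f} per year")
print(f"Extra cost across {SEATS} seats: ${per_seat * SEATS:,.2f} per year")
```

On those assumptions, the gap is a little under $100 per seat per year, which climbs to nearly five figures across 100 seats, and higher still for machines left rendering overnight.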
Our Z820 test system is equipped with a 90% efficient 1125W power supply. Let’s see how it stacks up.
With an extra 14GB of RAM and four extra CPU cores, it is to be expected that the Z820 would draw more power than the Z800. However, the Z820 does draw less power with applications that tax the GPU over the CPUs, since the Z800 is equipped with a power-hungry Quadro 6000.
I would expect to see a decrease in power usage once workstations using Intel’s new 22nm Ivy Bridge architecture become available, but I don’t expect that to be until next year at the earliest.
The last thing I want to talk about here is pricing. Workstations typically use premium components, and with premium components comes a premium price. While the Z820 starts at $2,299, fully loaded (as in the case of our test system) its price can easily hit the five-figure mark.
While the typical PC enthusiast may cringe at that figure, you need to look at who the Z820 system is intended for. Modern films and games have budgets that range from millions to hundreds of millions of dollars. Purchasing a handful of workstations in the $8,000-12,000 range is a tiny fraction of that budget, and with development and shooting schedules becoming shorter every year, the phrase ‘time is money’ can be interpreted quite literally in the DCC market. In many cases, if a visual effects studio runs over deadline, it has to fund the remaining work itself.
High prices keep high-end hardware out of the hands of CG hobbyists, but this has been the case for many years now, and unfortunately, I don’t see it changing any time soon. The Z820 certainly carries a premium price tag – but is it worth that cost? Many professionals, myself included, would say yes.
What should you use a Z820 for?
I have spent the last few months with the HP Z820 system reviewed here, integrating it into my regular work pipeline. Overall, it is the fastest workstation I have ever used. It packs some of the highest-performance components available today into an efficient, well-designed package.
But do you need a workstation with 32GB of RAM and 16 CPU cores capable of executing 32 threads simultaneously? It depends on what kind of tasks you are going to throw at it. If you are looking for a system on which to do asset creation – modeling, texture painting, and so on – or if you are purely an animator or compositor, this dual-socket beast is probably overkill. In these benchmark tests, the speed of the GPU outweighed raw CPU processing power.
Instead, the real benefit of the Z820’s Sandy Bridge CPU architecture and four extra CPU cores comes when rendering. Lighting artists, shader artists, TDs and the people who do final scene layout and composition will all benefit from its high throughput; as will those who work in architectural or product visualisation.
Another good fit for the Z820 is for crunching mocap data. A couple of years ago, I had a hands-on session with a real-time mocap studio similar to the one used for Avatar, which used a pair of Z800s to process the incoming data in real time, so it seems a safe bet that the Z820 would also be well suited for this role.
Artists who specialize in simulations could also benefit from a system like this: even though individual simulations do not fully tax all 16 CPU cores, the extra capacity enables you to run multiple simulations at once, potentially in different software packages.
Similarly, if you are a 3D generalist, the Z820 enables you to maintain a high level of efficiency in your pipeline. You could be working on assets while rendering in the background (with low-priority threads), or have multiple simulations or video streams encoding simultaneously.
Another area in which the Z820 would fit well is in a render farm. Since it has nearly four times the rendering power of a high-end quad-core desktop system, your farm would be much smaller and easier to use – and use a lot less electricity – than if it were comprised of less expensive single-socket four or six-core systems.
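As a rough illustration of what that means for farm size, the Python sketch below counts the nodes needed to hit a target rendering capacity. The target of 100 "quad-core desktop equivalents" is a hypothetical figure of mine, and the Z820's "nearly four times" is rounded up to a flat 4x, so a real farm would need slightly more Z820s than this suggests.

```python
import math


def nodes_needed(target_units, units_per_node):
    """Whole nodes required to reach a target rendering capacity."""
    return math.ceil(target_units / units_per_node)


# Hypothetical target: capacity equal to 100 high-end quad-core desktops.
TARGET_UNITS = 100

# One quad-core desktop counts as 1 unit; a Z820 is rounded to 4 units.
desktop_nodes = nodes_needed(TARGET_UNITS, 1.0)
z820_nodes = nodes_needed(TARGET_UNITS, 4.0)

print("Quad-core desktops needed:", desktop_nodes)
print("Z820 workstations needed:", z820_nodes)
```

Fewer nodes also means fewer operating system installs, network ports and render licences to manage, which is where much of the ease of use comes from.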
Put simply, the Z820 workstation is an impressive system. It combines some of the highest-performance components available today into a well-thought-out case, and has a build quality that is second to none. If you are looking for a workstation to handle some of the most extreme computing tasks in DCC work, look no further.
In my next review, I will be looking at a similarly specified workstation from HP’s main competitor: Dell. Thanks again, and stay tuned!
Jason Lewis has over a decade of experience in the 3D industry. He is currently Senior Background Artist at Electronic Arts and CG Channel’s regular technical reviewer. Contact him at jason [at] cgchannel [dot] com
I would like to thank the following individuals and vendors who contributed to this article: