Graphics Cards

AMD Engineer Comments On Radeon HD 5870 Performance


AMD has unofficially responded to the somewhat disappointing performance of the new Radeon HD 5870. Reproduced below is a post by user "CarrelK", apparently Carrell Killebrew, the lead engineer of the excellent RV770 design. Carrell has had a good relationship with AnandTech lately, as you can see from this story. Without further delay, the post (top of page 17):

If a game ran almost entirely on the GPU, the scaling would be more of what you expect. You can put in a new GPU, but the CPU is no faster, main memory is no faster or bigger, the hard disk is no faster, PCIE is no faster, etc.

The game code itself also limits scaling. For example the texture size can exceed the card's memory footprint, which results in performance sapping texture swaps. Each game introduces different bottlenecks (we can't solve them all).

We do our best to get linear scaling, but the fact is that we address less than a third of the game ecosystem. That we do better than 33% out of a possible 100% improvement is I think a testimony to our engineers.

The card isn't horrible, but giving it 1600 shaders and not enough bandwidth isn't really worthy of praise. You get a decent card, though not one as balanced as the previous ones, and people, me included, were expecting a card as well engineered as the RV770. But I digress; let's have a look at something:


Does it look like the test platform needs a new CPU, faster memory, a faster hard disk or faster PCI-Express slots? No, it doesn't. It can still go up to almost 35 FPS with an additional card. Extra memory? No, CrossFire doesn't double the framebuffer: you have 1GiB per card but it still only counts as 1GiB for the game.
If the Radeon HD 5870 could really deliver double the performance, it should have reached at least 34.8 FPS, which is still less than double what the HD 4890 can deliver, and with less CPU/driver overhead than a second card brings. That kind of slight difference wouldn't surprise me; 40-60% of missing performance does.
What I was expecting to read, but didn't, were the words "the driver is still immature". Well, we probably won't be seeing big boosts until they actually strap 6 GHz GDDR5 to the thing.

The card would be excellent if it had fit the same $199/$299 bracket as before, but it's not so impressive at $259/$379. The price was well known to be higher than $299 before the launch, and that raised expectations about performance. The magic die size target was missed and that upset AMD's plans: it still has to charge more for the bigger die, regardless of the actual performance.


Radeon HD 5870 Reviews



The NDAs are up, reviews are plentiful, and the Radeon HD 5870 leaves me with too few good thoughts.

Summing the reviews up, for the impatient:

You might have already read about the performance: it's quite similar to AMD's internal testing, although deflated by 20%, averaging about 40% better than the GTX 285, GTX 275 and Radeon HD 4890.

If you want a great gaming rig regardless of cost, go for two Radeon HD 5870s in CrossFire (it's a lot more interesting than Triple SLI for lots of reasons). If you are more sensible or don't have the money, wait for the Radeon HD 5850 or go for the cheaper Radeon HD 4890, giving up DX11 support for a while. If any of the new features interest you - and DX11 should - you really have to wait for the Radeon HD 5850, which is expected next week. $129 more for a card that won't deliver 50% more performance isn't a very good deal.

Eyefinity is an interesting technology, exciting even, but the cost, and the quality of cheap monitors, is too big an issue IMO. For enthusiasts, it should move you towards AMD's cards despite what Nvidia currently has on the market and what may come ahead. There's a serious catch when using three monitors, though: you can only use two DVI outputs plus the DisplayPort output, which needs either an expensive DP monitor or an active adapter that may cost more than $100.

Let's delve deeper now, into other problems that spoil an otherwise good GPU:

Power consumption and cooling

There's a problem with FurMark/OCCT and AMD cards, mentioned by Ryan Smith at AnandTech, which also affects the RV770:
That problem reared its head a lot for the RV770 in particular, with the rise in popularity of stress testing programs like FurMark and OCCT. Although stress testers on the CPU side are nothing new, FurMark and OCCT heralded a new generation of GPU stress testers that were extremely effective in generating a maximum load. Unfortunately for RV770, the maximum possible load and the TDP are pretty far apart, which becomes a problem since the VRMs used in a card only need to be spec’d to meet the TDP of a card plus some safety room. They don’t need to be able to meet whatever the true maximum load of a card can be, as it should never happen.
"We never thought those 160 5-way shaders would actually be put to good use, but we're calling it 800 shaders all the same so we can have us some marketing upper hand!"
Why is this? AMD believes that the instruction streams generated by OCCT and FurMark are entirely unrealistic.
"Madness, we say!"
They try to hit everything at once, and this is something that they don’t believe a game or even a GPGPU application would ever do.
Marketing, ya know. Like "Prescott", remember?
For this reason these programs are held in low regard by AMD, and in our discussions with them they referred to them as “power viruses”, a term that’s normally associated with malware. We don’t agree with the terminology, but in our testing we can’t disagree with AMD about the realism of their load – we can’t find anything that generates the same kind of loads as OCCT and FurMark.
Regardless of what AMD wants to call these stress testers, there was a real problem when they were run on RV770. The overcurrent situation they created was too much for the VRMs on many cards, and as a failsafe these cards would shut down to protect the VRMs. At a user level shutting down like this isn’t a very helpful failsafe mode. At a hardware level shutting down like this isn’t enough to protect the VRMs in all situations. Ultimately these programs were capable of permanently damaging RV770 cards, and AMD needed to do something about it.
"We gave in to the amount of shaders the marketing team wanted and now we're having to replace dead cards because we ignored the issue. We forgot those nasty overclockers. We just tricked them and now we're pissed at them because they've put our claim to the test. VRMs are expensive, you know?"

AMD's solution to this issue of "too efficient code" was to have the driver detect the programs and cap the card's processing power, thereby reducing power draw and current. On the new architecture AMD has decided to do this in hardware: the card detects the high current draw and starts throttling itself, regardless of the temperature of the chip, which was the old metric for throttling.
Nevertheless, AMD is right: you probably won't be able to code something so efficient for practical uses (see bandwidth below), so they're making the right call. The marketing team is just being "markety".
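
To make the difference between the old and the new throttling metric concrete, here is a toy sketch of a current-based throttle. Everything in it is hypothetical: the function names, the 91 A limit, the 25 MHz step and the "amps per MHz" load model are invented for illustration and do not describe AMD's actual hardware.

```python
def next_clock(clock_mhz, current_a, limit_a, max_clock_mhz,
               step_mhz=25.0, min_clock_mhz=150.0):
    """One decision of a current-based throttle (temperature is never consulted).

    Back off one step when the measured current is over the limit,
    otherwise recover one step towards the maximum clock.
    """
    if current_a > limit_a:
        return max(min_clock_mhz, clock_mhz - step_mhz)
    return min(max_clock_mhz, clock_mhz + step_mhz)


def simulate(amps_per_mhz, steps=20, max_clock_mhz=850.0, limit_a=91.0):
    """Run the throttle for a few steps under a constant workload.

    Toy assumption: current draw scales linearly with the clock, and
    'amps_per_mhz' stands in for how hard the workload hits the chip.
    """
    clock = max_clock_mhz
    for _ in range(steps):
        current = amps_per_mhz * clock
        clock = next_clock(clock, current, limit_a, max_clock_mhz)
    return clock


game_like = simulate(amps_per_mhz=0.08)    # never reaches the limit
stress_test = simulate(amps_per_mhz=0.12)  # FurMark-like load, gets throttled
print(f"game-like load: {game_like:.0f} MHz, stress test: {stress_test:.0f} MHz")
```

In this model the game-like load stays at full clock while the stress test settles just below the current limit, however cool the chip is.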

Also, it's not just overclockers who care about OCCT/Furmark, but also those who like the cards well built and properly cooled. Let's look at the picture:


Unlike power consumption, load temperatures are all over the place. All of the AMD cards approach 90C, while NVIDIA’s cards are between 92C for an old 8800GT, and a relatively chilly 75C for the GTX 260. As far as the 5870 is concerned, this is solid proof that the half-slot exhaust vent isn’t going to cause any issues with cooling.
No, seriously, the half-slot exhaust isn't going to cause any issues, because you will restrain yourself from using F@H or any stream computing application that properly uses close to those 2.72 TFLOPS because:
  • The card will throttle itself so the cheap VRMs won't burn.
  • The card will throttle itself down to lower clocks because of the inevitable gathering of dust on the heatsink that has caused the chip to jump a measly 20ºC up (gross underestimate for a few months) and it's now running at over 100ºC.
  • We just wanted to trick you with that pretty number and now we will throttle you, whether you like it or not. Stay away from efficient software!
Had I not seen more than one ATI card with bad cooling (GPU, not VRM) throttling itself down, I would also ignore the problem.
This is what I found when looking at a well-cooled card from a decent manufacturer, running "improper" software:
Temperatures: 46.5ºC idle, 55ºC UT2004, 74ºC running Furmark's stress test.
A lot lower, isn't it? I feel safer this way but maybe I'm just silly.

The "fix" for FurMark doesn't really hurt anyone, as the program still performs its function admirably: it helps tune the cooling system to handle every worst-case scenario, so the card doesn't burn when one does show up. You know you're still pushing the VRMs and the GPU to the new (imposed) extreme, so it's all good.


GPGPU, misleading marketing and bandwidth:

2.72 TFLOPS is the number AMD is touting for this new GPU. It's a nice number and, with the improvements I talked about yesterday, it could very well swing AMD into the GPGPU market.
The problem is that, as you can see, someone from AMD has been very open about theoretical vs practical TFLOPS when they say that "AMD believes that the instruction streams generated by OCCT and FurMark are entirely unrealistic" and that such code may destroy your card. As such, you won't get near that mark.
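
For reference, this is roughly how a theoretical peak like that is derived. It is a minimal sketch: the 1600 shaders come from the text above, while the 850 MHz core clock and the convention of counting one multiply-add as two operations per shader per clock are assumptions not stated in this post.

```python
def peak_tflops(shaders, clock_mhz, flops_per_clock=2):
    """Theoretical single-precision peak in TFLOPS.

    Assumes every shader retires one multiply-add (2 floating point
    operations) on every clock, which real code rarely sustains.
    """
    return shaders * flops_per_clock * clock_mhz * 1e6 / 1e12


# 1600 shaders is from the post; 850 MHz is an assumed reference clock.
hd5870_peak = peak_tflops(shaders=1600, clock_mhz=850)
print(f"HD 5870 theoretical peak: {hd5870_peak:.2f} TFLOPS")
```

The figure only holds if every shader is busy on every cycle, which is exactly the kind of load the stress testers generate and games do not.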

From the benchmarks it's also easy to see that the card is bandwidth limited, and the rumored 384-bit bus is a requirement to reach those kinds of TFLOPS in practical applications. The Radeon HD 4890 had some bandwidth to spare, but the larger global data share isn't enough to make up for the doubling of processing power with just a moderate increase in bandwidth, from 124GB/s in the 4890 to 153GB/s in the 5870. Coincidentally, that 23% increase in bandwidth is reflected in the benchmarks more closely than the shader power is: gains sit at around 40%, sometimes less. That is a bigger increase in performance than in bandwidth, but it can be accounted for by the larger global data share and the bandwidth headroom that blessed the Radeon HD 4890.
Drivers are still fresh and should improve performance a bit; I just wouldn't expect too much.
The new Radeon must be compared to the 4890 and not the 4870, remember that. The new card has around the same clock, double the shaders and costs about twice as much. Does it deliver double the performance? No, it delivers 30-50% more, plus added features.
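
The imbalance is easy to put in numbers. A minimal sketch using the bandwidth figures quoted above; the 1.36 TFLOPS value for the 4890 is an assumption, taken as exactly half of the 5870's 2.72 TFLOPS in line with the "doubling of processing power".

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


def flops_per_byte(tflops, bandwidth_gbs):
    """Peak floating point operations available per byte of memory traffic."""
    return (tflops * 1e12) / (bandwidth_gbs * 1e9)


bw_4890, bw_5870 = 124.0, 153.0   # GB/s, from the post
tf_4890, tf_5870 = 1.36, 2.72     # TFLOPS; the 4890 figure is assumed (half of 2.72)

bw_gain = pct_increase(bw_4890, bw_5870)
compute_gain = pct_increase(tf_4890, tf_5870)
ratio_4890 = flops_per_byte(tf_4890, bw_4890)
ratio_5870 = flops_per_byte(tf_5870, bw_5870)

print(f"bandwidth: +{bw_gain:.0f}%, compute: +{compute_gain:.0f}%")
print(f"FLOPs per byte: {ratio_4890:.1f} (4890) vs {ratio_5870:.1f} (5870)")
```

The higher the FLOPs-per-byte figure, the more work each shader has to do per byte fetched before memory becomes the limit, so the 5870 needs noticeably more data reuse than the 4890 to stay fed.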

Bandwidth, then, brings us to the issue of chip area efficiency. The chips are too big and have too many resources wasted on a useless number of shaders. The die is now bigger, at 334mm² vs 260mm², and consequently the price is also up: $379 for the 5870 and $259 for the 5850, which features 1440 shaders and a lower clock of 725MHz. That's a considerable bump from the $299 and $199 prices at which the 4800 series launched, and it will not make such a mess of Nvidia's lineup - if it's launched soon enough, that is.

The previous Radeon HD 4000 cards hit the sweet spot just right in price, chip cost and performance. While the 3870 from the previous generation was an underpowered card, the 2.5x increase in shaders provided enough power to compete with Nvidia and the GT200-based cards. The RV740 was even better, building on that success in die size and compute-to-bandwidth ratio, which was more balanced in the HD 4850 than in the 4870.
Today, these new cards are at least as expensive to build as the old G92a, probably more if TSMC's 40nm process is still a problem, due to the clear shader power overshoot that happened with the "Evergreen" chips.


The rest of the family

The plans for the rest of the architecture, as laid out by AMD:


There's also the upcoming "Juniper" GPU, which Ryan mentions that AMD might release with 14 SIMD blocks, or 1120 shaders:
The “new” member of the Evergreen family is Juniper, a part born out of the fact that Cypress was too big. Juniper is the part that’s going to let AMD compete in the <$200 category that the 4850 was launched in. It’s going to be a cut-down version of Cypress, and we know from AMD’s simulation testing that it’s going to be a 14 SIMD part.
That's a very interesting number, and if we see it coupled with GDDR5 and a bus of reasonable width, the card may be interesting for both AMD and us, the customers. The card also isn't that far away; I would expect it around December.
"Hemlock" is expected around the same time and it's the dual 5870. Not much is known right now, other than that it might cost $649.

Conclusion

I repeat myself: if you want a great gaming rig regardless of cost, go for two Radeon HD 5870s in CrossFire (it's a lot more interesting than Triple SLI for lots of reasons). If you are more sensible or don't have the money, wait for the Radeon HD 5850 or go for the cheaper Radeon HD 4890 ($169 after rebate), giving up DX11 support for a while. The 5850 should perform comparably - also with a better price/performance ratio - and Nvidia might also be worth a look in the latter case, as the GTX 275 performs about on par and starts at around $209.
If any of the new features interest you - and DX11 should - you really should wait for the Radeon HD 5850, which is expected next week. $129 more for a card that won't deliver 50% more performance isn't a very good deal.
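
The value argument can be checked with a few lines of arithmetic. A minimal sketch using the prices quoted in this post ($379 for the 5870, $169 after rebate for the 4890) and the 30-50% performance gain estimated earlier; the helper name is just for illustration.

```python
def relative_value(perf_ratio, price_new, price_old):
    """Performance per dollar of the new card relative to the old one.

    perf_ratio is new performance divided by old performance;
    a result below 1.0 means the new card is the worse deal per dollar.
    """
    return perf_ratio * price_old / price_new


price_5870, price_4890 = 379.0, 169.0   # USD, from the post
value_low = relative_value(1.30, price_5870, price_4890)    # 30% faster
value_high = relative_value(1.50, price_5870, price_4890)   # 50% faster

print(f"price ratio: {price_5870 / price_4890:.2f}x")
print(f"performance per dollar vs the 4890: {value_low:.2f}x to {value_high:.2f}x")
```

Even at the optimistic end, the 5870 gives roughly two thirds of the 4890's performance per dollar, before counting DX11 and the other new features.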


ATI Radeon 5800 Detailed


AMD slides detailing the new "Evergreen" architecture.

The previous post about the new Radeon 5800 series already mentioned that AMD might have glued two RV770 SIMD blocks together. This seems true, although with some caveats detailed further below.


The most interesting tidbits here are CRC support and the ability to downclock the memory without screen flicker. The latter helps achieve the low idle power consumption of 27W that AMD promises for these cards.


Small L2 caches, 512KiB total (apparently AMD had eight L2 caches in the RV770, so it may double). Remember that these cores have a huge number of registers, which helps make up for the smaller caches: the RV770 had 2.5MiB of register space.


GPGPU should fare better on the RV870 core than it did on the RV770: these numbers are even more interesting than the GT200's.
Although AMD only provides 160GB/s to the framebuffer, the new chip has a relatively big 64KiB Global Data Share memory, which is unique to AMD's architecture; it was already present in the RV770 but is now four times bigger. This might be one of the reasons AMD managed to keep framebuffer bandwidth requirements low, as this memory keeps communication away from the framebuffer, something Nvidia can't do as it only has the Local Data Share memory. Speaking of which, AMD now has double the Local Data Share per multiprocessor that Nvidia delivers in the GT200 chip: 32KiB. This amounts to a total of 640KiB of Local Data Share, while Nvidia only has 384KiB right now. It seems AMD was holding back before, as it has now delivered a very competent GPGPU chip just as OpenCL compilers start to turn up.
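
The 640KiB total follows directly from the SIMD count. A minimal sketch of that arithmetic, under two assumptions not stated in this post: 1600 shaders for Cypress and 80 shaders per SIMD block (which matches the 14-SIMD, 1120-shader figure reported for Juniper).

```python
SHADERS_PER_SIMD = 1120 // 14   # assumed: 80, from the reported Juniper configuration
CYPRESS_SHADERS = 1600          # assumed: full Cypress shader count
LDS_PER_SIMD_KIB = 32           # from the post

cypress_simds = CYPRESS_SHADERS // SHADERS_PER_SIMD
total_lds_kib = cypress_simds * LDS_PER_SIMD_KIB

print(f"{cypress_simds} SIMDs x {LDS_PER_SIMD_KIB} KiB = {total_lds_kib} KiB of local data share")
```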
The new chip also seems to have gained IEEE754-2008 floating point standard compliance when calculating 32-bit values (before, only 64-bit was compliant), although I still won't guarantee you that. At least not until I get my hands on proper documentation of the architecture or a card to run some tests.


Market placement: "Cypress", or RV870, powers the Radeon 5800 series cards, which will range from $249 to $349, and "Juniper" will be delivered to the sub-$200 market.

Reviews for the new card are expected this week.


Nvidia GT 220 sighted


Gigabyte's OC version turns up, bringing DirectX 10.1 support.

The new graphics card is based on the new GT216 chip from Nvidia:
  • 128 bit GDDR3 memory @ 1.6GHz (25GB/s)
  • 720MHz core clock
  • 48 shaders @ 1567MHz
  • DirectX 10.1 support
  • CUDA capability 2.0
  • PureVideo 3
The stock clocks of the GeForce GT 220 are 615/1333/1580 MHz for the core, shaders and memory, so this card's core and shader clocks are around 17% higher. I'm guessing the card isn't very bandwidth hungry, or Gigabyte would've also overclocked the memory chips.
Performance should be comparable to the old G92-based 8800GS/9600GSO, albeit slightly lower due to the lower bandwidth (25 vs 33.6GB/s).
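
The numbers above can be reproduced from the listed specs. A minimal sketch using only figures from this post; the helper names are made up for illustration.

```python
def bandwidth_gbs(bus_width_bits, data_rate_ghz):
    """Memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_ghz


def overclock_pct(stock_mhz, oc_mhz):
    """Factory overclock as a percentage over the stock clock."""
    return (oc_mhz - stock_mhz) / stock_mhz * 100.0


gt220_bw = bandwidth_gbs(bus_width_bits=128, data_rate_ghz=1.6)
core_oc = overclock_pct(615, 720)
shader_oc = overclock_pct(1333, 1567)
memory_oc = overclock_pct(1580, 1600)

print(f"bandwidth: {gt220_bw:.1f} GB/s")
print(f"overclock: core +{core_oc:.1f}%, shaders +{shader_oc:.1f}%, memory +{memory_oc:.1f}%")
```

The core and shader clocks work out to roughly 17% over stock, while the memory is essentially untouched.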

Source: Videokonsolu