Graphics Cards

CUDA, AMD Stream SDK And Unexpected Transcoding Results


Both AMD and Nvidia have promised that GPUs would accelerate everyday computing tasks. More than anything, it's an effort to sway people away from relying on processors, a market where Intel is dominant and where AMD will profit either way.
Most of the consumer-grade software that has become available has failed to deliver either the performance or the quality of software that runs on the CPU. Video transcoding applications are one such case.

Xbitlabs recently compared AMD's most recent integrated chipset - the 785G - to Intel's best: the G45. This is what they found when testing transcoding:


The performance benefit from using ATI Stream is close to 40%, which is a very good result when considering that the AMD machine is already running on a Phenom II X2 550 that runs at 3.1GHz.
The IGP doesn't have more bandwidth than the CPU to begin with (it actually has less), so it has to rely on the raw processing power of its 40 embedded shader processors. Those 40 shaders are really only eight 5-way SIMD processors, similar to the cores on the Phenom II and with the same SIMD capabilities, but with a lower clock and without advanced branch prediction and other, more complex logic. If one were to classify the Phenom II as a shader processor, it would be roughly equivalent to 8 SPs (two cores, each capable of 128-bit (32*4) floating point operations per clock).
As for processing power, the PII X2 550 delivers around 24.8 GFLOPS of peak throughput (3.1*2*128/32) and the 785G is capable of 40 GFLOPS (500MHz*40*2) when you factor in the FMADD capability of the shader cores, which can perform a multiply and an add in the same clock. That equals about 61% more processing power than the Phenom II, which is roughly in line with the measured figures - 40% isn't a bad result at all.
The reviewer mentions that the CPU utilization is close to zero during transcoding with ATI Stream, so it's obvious that there's no cooperation between CPU and GPU. The GPU is not helping, it's doing all the work.
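The peak-throughput arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model using the figures quoted in the text; the function names are made up for illustration. Note that the percentage is sensitive to rounding: a CPU figure of 24 GFLOPS gives about 67%, while the unrounded 24.8 gives about 61%.

```python
def peak_gflops(clock_ghz, units, flops_per_unit_per_clock):
    """Theoretical peak in GFLOPS: clock * number of units * flops per unit per clock."""
    return clock_ghz * units * flops_per_unit_per_clock


def relative_advantage(a, b):
    """How much bigger a is than b, as a fraction (0.5 means 50% more)."""
    return a / b - 1.0


if __name__ == "__main__":
    # Phenom II X2 550: 3.1 GHz, 2 cores, 128-bit SIMD = 4 single-precision lanes
    cpu = peak_gflops(3.1, 2, 128 // 32)
    # 785G IGP: 500 MHz, 40 shaders, multiply + add counted as 2 flops per clock
    igp = peak_gflops(0.5, 40, 2)

    print(f"CPU peak: {cpu:.1f} GFLOPS")
    print(f"IGP peak: {igp:.1f} GFLOPS")
    print(f"IGP advantage: {relative_advantage(igp, cpu):.0%}")
    print(f"IGP advantage with CPU rounded to 24: {relative_advantage(igp, 24.0):.0%}")
```

These are theoretical peaks only; they say nothing about memory bandwidth or how well a given encoder maps onto the hardware.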

This kind of performance gain is great if one wants to keep power consumption low or only has a slow CPU, as in sub-laptop machines. AMD currently only ships 690G-based sub-laptops but has announced Stream-capable 780G variants to replace the likes of the HP DV2 and Gateway's Athlon Neo powered sibling.

The big problem with video transcoding on the GPU is the unexpected quality issues. Video transcoded on the CPU always comes out with a different quality, and it is always better. Artifacts or other problems pop up when using GPU transcoding software. Why is this?

One of the main problems when performing calculations on GPUs is that you must use single precision (32 bits) to achieve decent performance gains over the CPU. There's a problem with GPUs, though: they aren't compliant with the IEEE 754 standard for floating point arithmetic.
Usually the results don't differ enough to be significant, but sometimes they can be quite bad for a given application. The problem may come from the fact that some rounding modes aren't supported by either Nvidia's or ATI's GPUs, or from something happening internally with the number of bits used to store intermediate results. In my experience, the lowest and highest numbers that can be stored in 32-bit floating point format tend to be very close to what the CPU produces, but the difference may be significant in some cases. Neither AMD nor Nvidia fully documents what happens inside the hardware, so working out what goes on inside is mostly a guessing game.
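To get a feel for how much the rounding behaviour alone can matter, here is a small Python sketch that accumulates the same values three ways: in 64-bit doubles, in 32-bit floats with round-to-nearest, and in 32-bit floats with results chopped toward zero. It does not model any particular GPU; the helper names and the chopping routine (valid for non-negative values only) are assumptions made purely for illustration.

```python
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def chop_f32(x):
    """Round a non-negative value to 32-bit float by truncating toward zero."""
    r = to_f32(x)
    if r > x:
        # Nearest rounding went up: step down to the previous 32-bit float.
        bits = struct.unpack("<I", struct.pack("<f", r))[0]
        r = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return r


def accumulate(values, rnd):
    """Sum values, applying rnd to every input and to every partial sum."""
    acc = 0.0
    for v in values:
        acc = rnd(acc + rnd(v))
    return acc


if __name__ == "__main__":
    samples = [0.1] * 1_000_000
    print("64-bit:               ", accumulate(samples, float))
    print("32-bit, round nearest:", accumulate(samples, to_f32))
    print("32-bit, truncate:     ", accumulate(samples, chop_f32))
```

The three sums come out different even though the inputs are identical, and the two 32-bit variants drift in different directions. A video encoder performs far more than a simple sum, so small per-operation differences like these have plenty of room to pile up.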

On the other hand, starting with the Radeon HD 3870 (RV670) and Nvidia's GeForce GTX 200 series (GT200), both companies support IEEE 754 compliant 64-bit floating point arithmetic. The problem is that the 32-bit implementation hasn't been fixed yet, which is a hurdle when porting some code. Using 64-bit double precision floating point is possible in some algorithms, but the performance drop makes it impossible to use efficiently in these kinds of applications. While CPUs drop to half of their 32-bit performance, the GT200 chip drops to one tenth and ATI's chips to approximately one fifth. Nvidia uses one 64-bit ALU per shader processor, but AMD can use the 128-bit SIMD cores for the job, processing two doubles per clock - although the transcendental ALU can't do doubles and the SIMD core cannot do FP MADD.
Since ATI has fewer but wider shader processors, the RV670+ chips see a smaller drop in performance than Nvidia's cards. It's too big a drop either way, so there's not much hope for doubles, at least for video transcoding. Some scientific algorithms do use them, although in something like 1% or less of all operations performed, so performance isn't hurt considerably.

The software itself is also to blame, not just the cards: Nvidia's software can usually produce better results than equivalent AMD Stream capable transcoding software, although the two vendors' single precision hardware implementations aren't considered to be different - both comply with what the DirectX and OpenGL standards demand for shader processing.

In the end, until both companies fix the lack of IEEE 754 compliance in single precision, some applications will always produce bad or strange results. AMD seems to have glued two RV770 chips together for the Radeon HD 5800 series, so it will be hard to find the proper silicon there. Nvidia, being more committed to GPGPU initiatives - given its lack of an x86 license and the impending chipset and Larrabee "apocalypse" - will be pushing harder on that front, and we may see it fixed in the GT300 - I wouldn't bet on it, though.

Processors

AMD Debuts Athlon II X4 620/630 ''Propus''


Yesterday marked the release of the $99 quad-core, the Athlon II X4. AMD delivers the new "Propus" core, a cut-down Phenom II X4 without L3 cache.

The new CPU is very capable and in some situations, such as rendering applications, gives up very little to the Phenom II X4. What is perhaps more interesting is the CPU's ability to outperform the old "Agena" core without possessing any L3 cache:


Although this isn't the norm, the architectural improvements AMD made when transitioning to the 45nm process have helped the new Athlon II X4 to perform admirably.

At $99, the Athlon II X4 620 is a better buy than the Core 2 Quad Q8200 is right now. In some situations the newcomer is even more interesting than the Phenom II X3 720, which is priced above it, but gaming does suffer from the lack of a big L3 cache - the biggest downside, besides the locked multiplier.
Overclocking does turn the tables considerably in Intel's favor.

AMD is expected to eventually ship only L3-less "Propus" based Athlon II X4 processors, but if you act fast you may grab yourself an unlockable one, built on the "Deneb" core that powers the Phenom II. Earlier steppings will have a better chance at that.

Overall, AMD delivered an excellent processor with the Athlon II X4 620. The Athlon II X4 630, priced at $122, isn't as interesting, given the only slight increase in clock speed from 2.6 to 2.8GHz - the small bump just doesn't justify the premium for a well-informed buyer.

Processors

Core i7 860 Benchmarked


PCPerspective benchmarks the new midrange champion.

Now that someone has actually benchmarked the thing, it is more than safe to go out and buy one. My verdict is still the same, as all estimates about performance turned out true:
  • Great CPU for gaming
  • Cheaper than going with the LGA 1366 platform
  • Not as good as the LGA 1366 Core i7s for SLI/CrossFire or bandwidth intensive applications.
The Xeon 3440 is still the cheapest decent alternative to the Core i7 860 that doesn't give up HyperThreading, and it's already available for sale.

Motherboards

Fixing A ''Hardware'' Incompatibility


Ever plugged some shiny new piece of hardware into your still very capable two- or three-year-old PC, only to find out they don't work together? I have, too many times - so many that I eventually figured out how to fix most of these problems.

Meet the candidate: an AMD Athlon X2 3800+, 2GB of DDR2 800 and the Asus M2NPV-MX:


It is still a very capable machine, a dual core and enough RAM for most games. The graphics card was killing its style, as it was an aging GeForce 7600GS.

The new graphics card will be the excellent XFX Radeon HD 4830 that I bought for myself a few months ago. The card is great but its Linux drivers are terrible, almost unworkable for everyday work and leisure. I got an Nvidia card for myself, and since the Windows drivers are fine, the Radeon is going to be put to proper use in this PC.

I plugged in the new and shiny Radeon and, to my dismay, it made the PC unbootable. It powers up, gives a beep and then just sits there. No matter, let's look at the problem in detail:

The M2NPV-MX is based on the old nForce 6150+430, which supports PCIe 1.1, while the HD 4830 is a PCIe 2.0 card. But my personal PC uses an nForce 550 motherboard with an Nvidia PCIe 2.0 card that has never given me problems, and the standard is backwards compatible after all. This is the so-called hardware incompatibility. These do happen from time to time, and sometimes they aren't even fixable, but having built computers for fun for the last 10 years, I've learned a valuable thing: since PCIe 2.0 is backwards compatible, one must first assume that everything is fine with the hardware and move on to the software, in this case the BIOS.

I checked ASUS' website for a BIOS update that listed fixes for incompatibilities with ATI/AMD cards, or with PCI-Express 2.0 cards in general, and found none. The other thing I've learned over time is that hardware manufacturers will usually fix things with BIOS updates and not tell you. It's usually as cumbersome to fix the problem as it is to properly document it. Also, since most manufacturers, if not all, use a BIOS from AMI or AWARD, they often end up updating shared parts of the BIOS that came from that vendor, incorporating fixes for problems they don't even dream their hardware has. Someone probably had a problem with PCIe 2.0 cards, it got fixed for another manufacturer and upstream the code went.
The BIOS on the M2NPV-MX was from 2006. I updated it to one from late 2007 and crossed my fingers as I plugged in the card. Bingo, it worked. It just booted fine. "Hardware" incompatibility gone.
I've run a few tests and everything is working great. I had to get a 90° SATA cable to connect the HDD and gave up two SATA ports because the card sits too close to them, but other than that everything is working fine:



The X-Fi still leaves enough clearance for the card's fan to work perfectly, despite the overall lack of space of the components. Hot air goes out of the case directly, so it doesn't pose any thermal problem in this build.

Upgrades usually come with strings attached. Checking whether the BIOS is the issue is relatively painless, since you can now flash it from the BIOS itself or even from Windows. It may save you some time and money if you end up as lucky as I did this time.
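Before hunting for an update, it helps to know which BIOS you are actually running. The machine above runs Windows, but as a rough sketch, assuming a Linux system that exposes the usual DMI files under /sys/class/dmi/id, the following Python reads the BIOS vendor, version and date so they can be compared with what the manufacturer's download page lists. The function name and field selection are just for illustration.

```python
from pathlib import Path

# Assumption: a Linux system exposing DMI/SMBIOS fields as small text files here.
DMI_DIR = Path("/sys/class/dmi/id")
FIELDS = ("board_vendor", "board_name", "bios_vendor", "bios_version", "bios_date")


def read_firmware_info(dmi_dir=DMI_DIR):
    """Return a dict of DMI fields; unreadable or missing fields map to None."""
    info = {}
    for field in FIELDS:
        try:
            info[field] = (Path(dmi_dir) / field).read_text().strip()
        except OSError:
            info[field] = None
    return info


if __name__ == "__main__":
    for name, value in read_firmware_info().items():
        print(f"{name:13} {value or 'unavailable'}")
```

Running it before and after flashing is a quick way to confirm the update actually took.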

Motherboards

ASUS Releases P7F-E Workstation Motherboard


ASUS launches LGA 1156 based Xeon motherboards.

Earlier I posted about how Intel has released an interesting variety of "Lynnfield" based Xeon processors, namely the Xeon 3440, which only costs slightly more than the Core i5 750 but comes with HyperThreading support albeit at a slightly lower 2.53GHz clock. While availability of these new Xeons is still scarce, the launch of motherboards will soon change that fact.

Specifications are the following:

Model: P7F-E
Processor:
  • Next Generation 45nm Quad Core UP Xeon CPU
  • Next Generation 32nm Dual Core UP Xeon CPU
Memory: 6 x DDR3 DIMM Slots
Storage Interface: 6-port SATA2, with optional ASUS PIKE:
  • PIKE 1064E/1068E (RAID 0, 1, 1E)
  • PIKE 1078 (HW RAID 5, 6, 50, 60)
  • PIKE 6480 (RAID 0, 1, 10, 5)
RAID Support (6 SATA2 300MB/s ports):
  • Intel Matrix Storage (for Windows only): supports software RAID 0, 1, 10 & 5
  • LSI® MegaRAID (for Linux/Windows): supports software RAID 0, 1, 10
NIC: Intel® 82574L GbE LAN x 2 + Management LAN x 1
Onboard Graphics: ASPEED AST2050 8MB
Slot 7: MIO Slot for audio card**
Slot 6: PCI-E x16 Slot (Gen2 x16 Link; auto switch to x8 Link if slot 3 is occupied)
Slot 5: PCI Slot
Slot 4: PCI-E x1 Slot (x1 Link)
Slot 3: PCI-E x16 Slot (Gen2 x8 Link)
Slot 2: PCI Slot
Slot 1: PCI Slot
Form Factor: 12" x 9.6", ATX
** PCI-E x1 is not supported.


The two extra memory slots are good for workstation use but will probably hurt the memory clock speed when all six are populated - the memory controller may not take it very well and will probably throttle down to 800 or 1066 and/or relax latencies.
There's also a very slow 2D graphics chip that will be enough for server duties and an extra 10/100 Ethernet port that serves as the management interface.
Pricing for the motherboard wasn't disclosed by Asus.

Graphics Cards

Radeon HD 5850 & 5870 Pictures And Benchmarks


Technology details, six-monitor support et al.

AMD presented its graphics cards to the press aboard the USS Hornet aircraft carrier last Friday. Some details have surfaced, including those of a Radeon 5800 card that supports output to six monitors, aptly named Radeon HD 5870 Six:

Picture courtesy of PC Perspective

This new multi display technology is named EyeFinity and is implemented just right. The technology allows you to configure the six monitors to appear to Windows as just one with the aggregate resolution, allowing for easier support in games - or in most of them.
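As a quick illustration of the "one big display" idea, the aggregate resolution is simply the per-monitor resolution multiplied across the grid. The sketch below assumes six identical panels in a 3x2 arrangement with no bezel compensation; the 1920x1200 panel size is a made-up example, not a figure from AMD.

```python
def aggregate_resolution(cols, rows, width, height):
    """Resolution of a cols x rows wall of identical width x height monitors."""
    return cols * width, rows * height


if __name__ == "__main__":
    # Hypothetical setup: 3 columns by 2 rows of 1920x1200 panels.
    total_w, total_h = aggregate_resolution(3, 2, 1920, 1200)
    print(f"Windows would see one {total_w}x{total_h} display")
    print(f"That is {total_w * total_h / 1e6:.1f} megapixels per frame")
```

A game that can render at that single resolution needs no special multi-monitor support, which is the appeal of presenting the group as one display.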



Don't be fooled into thinking you can get away with doing that on cheap monitors, though. You need monitors with proper viewing angles - based on IPS, PVA or MVA panels - and not your standard $150 monitor. Otherwise, you will definitely not enjoy the experience.

What AMD didn't present, though, were specs and benchmarks, which have leaked by other means. The shader count for the new GPU was already rumored to be 1600, double what exists in the RV770 (Radeon 4800 series) chips, and that has now been confirmed by several websites. This slide is from ATI-Forum:


The card itself seems to lack bandwidth relative to its massive theoretical peak performance of 2.72 TFLOPS: it has only 153.6GB/s, less than what Nvidia has in the GTX 285 (159GB/s). That doesn't stop it from being fast, very fast, especially in some benchmarks:


The Radeon HD 5870 is faster by 55%, on average, although it does show some lower results here and there. Less bandwidth bound games perform remarkably well, driven by the enormous amount of processing power. The card is consistently faster or similar in performance to the dual GPU based HD 4870X2. The card seems to be seriously lacking in bandwidth, despite the good performance. Expect versions with overclocked memory to show up soon enough.

The 5870 also had its TMUs and ROPs doubled to 80 and 32 respectively compared with the Radeon 4870. The GPU is clocked at 850MHz, with the GDDR5 memory at 4.8GHz. Its price is expected to be $399, dropping progressively, similar to what happened with the HD 4870/4850. Hence the "< $400" shown in the slide above instead of a fixed price point. The Radeon HD 5850 will launch at less than $300 and will offer similar performance to the GeForce GTX 285. Its clock is lower at 725MHz and a shader cluster is disabled, yielding 1440 shaders and 72 TMUs, with the memory (also GDDR5) clocked at only 4GHz.
Equal in both GPUs is the idle power consumption, terrific at just 27W, as you've probably noticed in the slide from AMD.
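The headline numbers can be reproduced from the clocks and shader counts quoted above. The sketch below assumes two flops per shader per clock (a multiply-add) and a 256-bit memory bus; the bus width is not stated in the slide, but it is the value that reproduces the 153.6GB/s figure. The function names are made up for illustration.

```python
def peak_tflops(shaders, core_mhz, flops_per_shader_per_clock=2):
    """Theoretical peak in TFLOPS from shader count and core clock."""
    return shaders * flops_per_shader_per_clock * core_mhz / 1e6


def memory_bandwidth_gbs(effective_mhz, bus_width_bits=256):
    """Peak memory bandwidth in GB/s; bus_width_bits=256 is an assumption."""
    return effective_mhz * bus_width_bits / 8 / 1000


if __name__ == "__main__":
    cards = {
        "Radeon HD 5870": (1600, 850, 4800),
        "Radeon HD 5850": (1440, 725, 4000),
    }
    for name, (shaders, core_mhz, mem_mhz) in cards.items():
        tflops = peak_tflops(shaders, core_mhz)
        gbs = memory_bandwidth_gbs(mem_mhz)
        # Flops available per byte of memory traffic: higher means more bandwidth-starved.
        ratio = tflops * 1000 / gbs
        print(f"{name}: {tflops:.2f} TFLOPS, {gbs:.1f} GB/s, {ratio:.1f} flops per byte")
```

The flops-per-byte ratio is one rough way to put a number on the "lacking in bandwidth" impression: the more arithmetic the chip can do per byte it can fetch, the more a game has to be compute bound to use it fully.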

Chiphell, notable for its earlier leaks, has a diagram of the core, although it's still unconfirmed whether this is really how the core is designed:


The diagram shows two blocks of 800 SPs, which would mean AMD could have just stuck two RV770 SIMD blocks together and added some features. The numbers do add up, but this is still pending confirmation from AMD. The chip, as we can see from the pictures, measures around 350mm², so it's not very expensive to produce and sits in the territory of the G92 chip, which spawned many cards at $200-$250. AMD can introduce a cheaper card with fewer shaders after the 5850 and 5870 without losing money.

The cards are expected to hit retail on September 23rd. Nvidia has not shown any cards and seems to be quite late with the upcoming GT300. Right now, everything is looking up for AMD: the card looks great, it's the first to support DX 11 and even the Radeon HD 5850 will be a formidable opponent to the GTX 285 while being cheaper by $50 on average.