June 17, 2021

## System Buses Explained

I don’t know about you, but I, for one, always get confused when thinking about the array of buses in a modern computer, despite thinking I have it all under my grasp. The bandwidth of the CPU, memory and AGP, plus new technologies such as Hypertransport, always leaves me in a spin, especially when I’m talking to someone who manages to make me question my own knowledge.

I thought I would write it down for reference, and hopefully help those who want to understand this topic.

Throughout this article we will try to understand all the components that make up a viable computer system and, hopefully, see past the marketing that preys on our lack of understanding.

The Aim

Computers are not marketed these days from a purely technical point of view. All retailers or manufacturers will attempt to give their product an edge over very similar products in their class. Graphics cards and motherboards are an excellent example of this right now. Different names, same technology.

Marketing even goes so far as to deviate from the correct technical terminology of computers. Kilo, mega and giga do not mean the same thing when it comes to making numbers “easy” for Joe Public.

Technically correct:

1 bit is a single unit of information depicted in the form of a 1 or a 0.

There are 8 bits in a byte

There are 1024 bytes in a kilobyte

There are 1024 kilobytes in a Megabyte

There are 1024 Megabytes in a Gigabyte

There are 1024 Gigabytes in a Terabyte

Multiplying by 1024 again and again is awkward and produces numbers that are not nice for marketing.

Instead, marketers move to multiples of 1000: 1000 bytes in a kilobyte, 1000 kilobytes in a megabyte and so forth. This provides nice round numbers.

Take this for example (we will cover the calculations later on):

Technically:

PC2100 DDR Memory / DDR266 Memory

64 (bits) * 266,000,000 (Hz) = 17,024,000,000 bits/s

(17,024,000,000/8) / (1024*1024) = 2029.4MB/s

Marketing:

PC2100 DDR Memory / DDR266 Memory

64 (bits) * 266,000,000 (Hz) = 17,024,000,000 bits/s

(17,024,000,000/8) / (1000*1000) = 2128MB/s

Convenient, don’t you think? Not only does it provide a magical extra 100MB/s of bandwidth (98.6MB/s, to be exact), it’s also a nice round number with no decimal places.
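Since the two calculations differ only in the final divisor, they are easy to put side by side. The following is a minimal Python sketch; the function and constant names (`bandwidth_mb_per_s`, `BINARY_MB`, `DECIMAL_MB`) are our own, not from any standard tool:

```python
BINARY_MB = 1024 * 1024   # the "technically correct" megabyte used in this article
DECIMAL_MB = 1000 * 1000  # the "marketing" megabyte


def bandwidth_mb_per_s(width_bits: int, freq_hz: int, megabyte: int) -> float:
    """Peak bandwidth = data width (bits) * frequency (Hz), converted to MB/s."""
    bytes_per_s = width_bits * freq_hz / 8  # 8 bits in a byte
    return bytes_per_s / megabyte


# PC2100 / DDR266: 64-bit bus, 266 million transfers per second
technical = bandwidth_mb_per_s(64, 266_000_000, BINARY_MB)
marketing = bandwidth_mb_per_s(64, 266_000_000, DECIMAL_MB)

print(f"Technical: {technical:.1f}MB/s")              # Technical: 2029.4MB/s
print(f"Marketing: {marketing:.1f}MB/s")              # Marketing: 2128.0MB/s
print(f"Bonus: {marketing - technical:.1f}MB/s")      # Bonus: 98.6MB/s
```

The only thing that changes between the two results is the definition of a megabyte; the memory itself is identical.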

Latency

The problem with high multipliers in modern CPUs is the latency involved. The processor clock speed (we will use 1.73GHz as an example) is far ahead of the relatively paltry speeds of the memory bus, AGP bus and so on, so the CPU finds itself having to wait around for the rest of the system to catch up.

We shall use an example to illustrate:

A processor with a 133MHz bus speed running at 1.73GHz has a clock multiplier of 13 (13 * 133.3MHz = 1733MHz).

# The CPU sends a request to the system memory for information
# The CPU then waits one cycle (commonly known as the command rate, 1T)
# The memory undergoes what is known as a RAS-to-CAS delay
# The memory has a further delay in finding the data, known as the CAS latency

Thus, if the CPU has waited 1 CPU cycle and then 4 bus cycles (the RAS-to-CAS delay and CAS latency combined), it has had to wait 1 + (4 * multiplier) = 1 + (4 * 13) = 53 CPU cycles to get the data it was after, because for every memory bus cycle the CPU has gone through 13 cycles of its own. That is not much when you consider that this 1.73GHz CPU has 1.73 billion cycles per second, but how often does the CPU access main memory? Quite often, and so it all adds up.
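The cycle counting above can be written out as a short Python sketch. It follows the simplified model in the list (1 CPU cycle plus 4 memory bus cycles); the one million memory accesses per second is a purely hypothetical figure chosen to show how the waits add up:

```python
CPU_HZ = 1_733_000_000  # 1.73GHz example CPU
MULTIPLIER = 13         # CPU cycles per memory bus cycle (13 * 133.3MHz)


def stall_cpu_cycles(cpu_cycles: int, bus_cycles: int, multiplier: int) -> int:
    """CPU cycles spent waiting: waits counted in CPU cycles plus waits
    counted in memory bus cycles, each bus cycle costing `multiplier` CPU cycles."""
    return cpu_cycles + bus_cycles * multiplier


cycles = stall_cpu_cycles(cpu_cycles=1, bus_cycles=4, multiplier=MULTIPLIER)
wait_ns = cycles / CPU_HZ * 1e9
print(cycles)               # 53
print(f"{wait_ns:.1f} ns")  # 30.6 ns

# Hypothetical workload: one million trips to main memory every second
accesses_per_s = 1_000_000
print(f"{accesses_per_s * cycles / CPU_HZ:.1%} of every second spent waiting")  # 3.1% ...
```

A single wait of about 30 nanoseconds is negligible; it is the sheer number of them that makes memory latency matter.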

Memory

We will consider 3 different types of computer memory in this article.

# SDR-SDRAM (Single Data Rate – Synchronous Dynamic Random Access Memory) – SDR-SDRAM was the dominant memory of the late 90s. Later versions were available at speeds of 66, 100 and 133MHz as standard. This type of memory is/was used on both Intel and AMD platforms, even in the i845/845G chipset for the Pentium 4 processor. Later we will show what a mistake, or a distinct waste of CPU power, that was.

# DDR-SDRAM (Double Data Rate – Synchronous Dynamic Random Access Memory) – DDR-SDRAM has taken over where SDR memory left off. Particularly with AMD systems (Thunderbird / XP / Thoroughbred) DDR memory has come to the fore as the mainstream memory for the foreseeable future, with DDR-II on the horizon.

# RDRAM (RAMBUS Dynamic Random Access Memory) – Although only really made popular in the mainstream computer market via the Intel Pentium 4 processor, RDRAM technology predates DDR memory.

Bandwidth Calculations

To avoid confusion later on, here is a reference table for bits, bytes, kilo, mega, giga, etc.:

1 bit is a single unit of information depicted in the form of a 1 or a 0.

There are 8 bits in a byte

There are 1024 bytes in a kilobyte

There are 1024 kilobytes in a Megabyte

There are 1024 Megabytes in a Gigabyte

There are 1024 Gigabytes in a Terabyte

SDR-SDRAM

To calculate memory bandwidth we need to know two things: its data width and its operating frequency. The latter is easier to find out, as it is usually part of the marketing/retail name.

We usually see SDR memory at 100 or 133MHz. Taking 133MHz as the example, this means that the memory can perform an operation 133 million times every second.

Finding the data width? Well, that’s just something you have to look up. SDR memory has a data width of 64 bits, or 8 bytes (8 bits in a byte).

PC100 SDR Memory

The calculation is as follows : data width * operating frequency = bandwidth (in bits/s)

To convert to more manageable figures, divide the result by 8 to give bytes/s, then by 1024 to get kilobytes/s, and then by 1024 again to get megabytes/s.

Thus: 64 (bits) * 100,000,000 (Hz) = 6,400,000,000 bits/s

(6,400,000,000/8) / (1024*1024) = 762.9MB/s memory bandwidth.

PC133 SDR Memory

Using the same formula as we did for PC100 SDR memory, we can easily calculate the theoretical memory bandwidth of PC133 SDR memory.

64 (bits) * 133,000,000 (Hz) = 8,512,000,000 bits/s

(8,512,000,000/8) / (1024*1024) = 1014.7MB/s, or roughly 1GB/s of memory bandwidth.
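The same recipe (multiply width by frequency, divide by 8, then by 1024 twice) is easy to capture as a function. A minimal Python sketch, with `peak_bandwidth_mb` as our own illustrative name:

```python
def peak_bandwidth_mb(width_bits: int, freq_hz: int) -> float:
    """Theoretical peak bandwidth in MB/s, where 1MB = 1024 * 1024 bytes."""
    bits_per_s = width_bits * freq_hz  # data width * operating frequency
    bytes_per_s = bits_per_s / 8       # 8 bits in a byte
    return bytes_per_s / 1024 / 1024   # bytes -> kilobytes -> megabytes


SDR_WIDTH_BITS = 64  # SDR-SDRAM data width

for name, mhz in [("PC100", 100), ("PC133", 133)]:
    mb = peak_bandwidth_mb(SDR_WIDTH_BITS, mhz * 1_000_000)
    print(f"{name}: {mb:.1f}MB/s")
# PC100: 762.9MB/s
# PC133: 1014.7MB/s
```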

DDR-SDRAM

DDR memory is slightly more complicated to understand, for two reasons. Firstly, DDR memory can transfer data on both the rising and the falling edge of a clock cycle, meaning that DDR memory theoretically doubles the memory bandwidth of a system able to use it.

Secondly, as a marketing push to compete with RAMBUS, a rival technology at the time DDR was introduced, DDR was sold under a measure of its approximate peak theoretical bandwidth in (marketing) megabytes per second: 8 bytes * 200 million transfers/s = 1600MB/s, hence PC1600. Similar to AMD’s PR rating for the XP processors we have today, people buy numbers, and DDR was seen to be faster if it was sold as PC1600 and PC2100 instead of PC200 and PC266.

PC1600 DDR Memory / DDR200 Memory

DDR memory has the same data width as SDR memory: 64 bits.

We use the same calculation to measure bandwidth, but with the doubled effective frequency: transferring on both edges of a 100MHz clock gives 200 million transfers per second.

64 (bits) * 200,000,000 (Hz) = 12,800,000,000 bits/s

(12,800,000,000/8) / (1024*1024) = 1525.9MB/s.

Notice the bandwidth is twice that of PC100 SDR memory.

PC2100 DDR Memory / DDR266 Memory

64 (bits) * 266,000,000 (Hz) = 17,024,000,000 bits/s

(17,024,000,000/8) / (1024*1024) = 2029.4MB/s, or roughly 2GB/s of memory bandwidth.

With improved memory yields, modules able to run at higher clock speeds are being released to the market. PC2700 has finally come into its own with the introduction of the AMD Athlon XP 2700+/2800+ and the Intel i845PE chipset.

Here are some bandwidths for the latest memory available:

PC2700 DDR Memory / DDR333 Memory

64 (bits) * 333,000,000 (Hz) = 21,312,000,000 bits/s

(21,312,000,000/8) / (1024*1024) = 2540.6MB/s.

PC3200 DDR Memory / DDR400 Memory

64 (bits) * 400,000,000 (Hz) = 25,600,000,000 bits/s

(25,600,000,000/8) / (1024*1024) = 3051.8MB/s.

PC3500 DDR Memory / DDR434 Memory

64 (bits) * 434,000,000 (Hz) = 27,776,000,000 bits/s

(27,776,000,000/8) / (1024*1024) = 3311.2MB/s.
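The DDR figures above, and the PC names they are sold under, can be generated in one go. In this Python sketch the helper `ddr_bandwidth` is our own; the rule that a module’s PC name is its decimal (marketing) MB/s rounded to the nearest 100 is our assumption, although it does reproduce all five names:

```python
def ddr_bandwidth(effective_mhz: int) -> tuple[float, float]:
    """Return (binary MB/s, decimal MB/s) for a 64-bit DDR module.

    effective_mhz is the DDR rating (e.g. 266 for DDR266): the base clock
    doubled, because data moves on both the rising and falling clock edge.
    """
    bytes_per_s = 64 * effective_mhz * 1_000_000 / 8
    return bytes_per_s / (1024 * 1024), bytes_per_s / (1000 * 1000)


for ddr in (200, 266, 333, 400, 434):
    binary, decimal = ddr_bandwidth(ddr)
    # Assumption: the "PCxxxx" name is the decimal figure rounded to the nearest 100
    pc_name = f"PC{round(decimal / 100) * 100}"
    print(f"DDR{ddr}: {binary:.1f}MB/s technical, {decimal:.0f}MB/s marketing -> {pc_name}")
```

Printing both columns makes the gap between the technical and marketing figures visible for every module at once.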

RDRAM

RDRAM is slightly more complicated. Each RDRAM channel is only 16 bits wide but runs at a much higher frequency, and current chipsets use two channels side by side, giving bandwidth comparable to DDR’s 64-bit bus. What does this mean? Well, currently RDRAM modules have to be installed in pairs. DDR has the advantage (usually from a cost point of view) of working with a single DIMM.

The calculation is basically the same, however; we just need to take into account the extra channel and the higher memory speed.

PC800

16 (bits) * 800,000,000 (Hz) = 12,800,000,000 bits/s

(12,800,000,000/8) / (1024*1024) = 1525.9MB/s per channel. Multiplied by 2 for the dual-channel configuration: 3051.8MB/s

PC1066

16 (bits) * 1,066,000,000 (Hz) = 17,056,000,000 bits/s

(17,056,000,000/8) / (1024*1024) = 2033.2MB/s per channel. Multiplied by 2 for the dual-channel configuration: 4066.4MB/s
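The dual-channel RDRAM calculation can be sketched the same way in Python, with the number of channels as a parameter (`rdram_bandwidth_mb` is our own name). Note that the sketch doubles before rounding, so for PC1066 it prints 4066.5MB/s rather than the 4066.4MB/s obtained by doubling the already-rounded 2033.2MB/s:

```python
RDRAM_CHANNEL_BITS = 16  # each RDRAM channel is 16 bits wide


def rdram_bandwidth_mb(effective_mhz: int, channels: int = 2) -> float:
    """Peak RDRAM bandwidth in MB/s (1MB = 1024 * 1024 bytes)."""
    per_channel = RDRAM_CHANNEL_BITS * effective_mhz * 1_000_000 / 8 / (1024 * 1024)
    return per_channel * channels


for name, mhz in [("PC800", 800), ("PC1066", 1066)]:
    print(f"{name}: {rdram_bandwidth_mb(mhz, channels=1):.1f}MB/s per channel, "
          f"{rdram_bandwidth_mb(mhz):.1f}MB/s dual channel")
```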

nForce

nForce is special, as it heralded the future of memory interfaces, for DDR at least. Its Dual DDR technology provides two 64-bit channels instead of one, making an effective 128-bit memory bus. This allows twice the bandwidth through the bus.

Although Dual DDR technology never really made a huge impact on nForce memory bandwidth (so the benchmarks tell us, at least), it has great potential for a recent DDR convert.

Intel, a long-standing advocate of RAMBUS/RDRAM for the Pentium 4 processor, has pledged to move away from the serial memory technology and embrace DDR. Unfortunately, as the memory bandwidth calculations above showed, DDR in its current form has neither the bandwidth nor the potential to scale up to RDRAM bandwidths.

Dual DDR will make a big difference to Pentium 4 chipsets. P4s with their QDR (quad data rate) bus can achieve bandwidths of around 4GB/s, perfectly matched with dual-channel PC1066 RDRAM. The fastest DDR memory currently available, PC3500, on the other hand has a bandwidth of around 3.2GB/s (3311.2MB/s). The P4 is crippled with current single-channel DDR chipsets.

Doubling the memory bandwidth then is something Intel is looking forward to.
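To put the Pentium 4 argument into numbers, here is a Python sketch comparing the processor bus with the memory options discussed. It assumes a 64-bit P4 bus at 133MHz with four transfers per clock, usually quoted as 533MHz (our assumption for the “QDR” bus), and treats a hypothetical dual-channel PC2100 setup as a single 128-bit bus:

```python
def peak_mb(width_bits: int, transfers_per_s: int) -> float:
    """Peak bandwidth in MB/s (1MB = 1024 * 1024 bytes)."""
    return width_bits * transfers_per_s / 8 / (1024 * 1024)


p4_bus = peak_mb(64, 533_000_000)              # 133MHz * 4 (QDR), quoted as 533MHz
pc1066_rdram = 2 * peak_mb(16, 1_066_000_000)  # two 16-bit RDRAM channels
pc3500_ddr = peak_mb(64, 434_000_000)          # one 64-bit DDR channel
dual_pc2100 = peak_mb(128, 266_000_000)        # two 64-bit DDR channels = 128 bits

for name, mb in [("P4 bus", p4_bus), ("PC1066 RDRAM", pc1066_rdram),
                 ("PC3500 DDR", pc3500_ddr), ("Dual PC2100 DDR", dual_pc2100)]:
    print(f"{name}: {mb:.0f}MB/s")
# P4 bus: 4066MB/s
# PC1066 RDRAM: 4066MB/s
# PC3500 DDR: 3311MB/s
# Dual PC2100 DDR: 4059MB/s
```

Even modest PC2100 modules, run in pairs, come within a few MB/s of the P4 bus, which is why dual-channel DDR looks so attractive.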

PCI Bus

The PCI bus is one of the older buses in a modern system. It is the bus which connects all the expansion cards in a system to the main chipset, along with IDE and USB.

The PCI bus is a 32-bit wide bus running at 33MHz. Using our familiar calculation we can now easily calculate its maximum bandwidth.

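32 (bits) * 33,000,000 (Hz) = 1,056,000,000 bits/s

(1,056,000,000/8) / (1024*1024) = 125.9MB/s. The commonly quoted 133MB/s comes from the bus’s real clock of 33.33MHz (4 bytes * 33.33MHz = 133.3 million bytes/s), expressed in marketing (decimal) megabytes.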

It is easy to imagine that, with modern ATA133 hard drives, PCI network adapters, sound cards and the like, the PCI bus can easily become saturated. There are three ways around this problem, two of which have already been implemented:

# Expand the bandwidth of the bus – Server motherboards, especially with the prevalence of SCSI hard drives requiring more bandwidth than the PCI bus can transfer, have moved to a 66MHz bus using 64-bit slots. Doubling both the bus width and the clock quadruples the bandwidth.

64 (bits) * 66,000,000 (Hz) = 4,224,000,000 bits/s

(4,224,000,000/8) / (1024*1024) = 503.5MB/s, usually quoted as 533MB/s (8 bytes * 66.67MHz, in decimal megabytes)

# Move to a dedicated bus – The obvious example here is graphics cards. With the ever-increasing speeds of graphics cards needed to deal with ever more complex games, the PCI bus of old simply cannot deal with the sheer amount of information that needs to get to and from the northbridge. Thus the AGP bus was born: a direct link from the AGP card to the chipset. Running at 66MHz with a 32-bit bus, it gives a maximum bandwidth of:

32 (bits) * 66,000,000 (Hz) = 2,112,000,000 bits/s

(2,112,000,000/8) / (1024*1024) = 251.8MB/s, usually quoted as 266MB/s (4 bytes * 66.67MHz, in decimal megabytes)
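The commonly quoted PCI and AGP figures make more sense once you know that the “33MHz” and “66MHz” clocks are nominally 33.33MHz and 66.67MHz, and that the quoted numbers use decimal megabytes. A Python sketch (the `peak` helper is our own naming):

```python
def peak(width_bits: int, clock_hz: float) -> dict:
    """Peak bandwidth in both binary (1024*1024) and decimal (1000*1000) MB/s."""
    bytes_per_s = width_bits * clock_hz / 8
    return {"binary": bytes_per_s / (1024 * 1024), "decimal": bytes_per_s / (1000 * 1000)}


buses = {
    "PCI (32-bit, 33MHz)": (32, 100e6 / 3),     # nominal 33.33MHz clock
    "PCI (64-bit, 66MHz)": (64, 200e6 / 3),     # nominal 66.67MHz clock
    "AGP 1x (32-bit, 66MHz)": (32, 200e6 / 3),  # same 66.67MHz clock
}
for name, (width, clock) in buses.items():
    result = peak(width, clock)
    print(f"{name}: {result['binary']:.1f}MB/s technical, "
          f"{result['decimal']:.1f}MB/s quoted")
```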

IDE

IDE hard drives transmit data to the CPU, and vice versa, via the PCI bus. Of course, this means that any transfer is limited by the speed of the PCI bus, 133MB/s or thereabouts, meaning ATA133 is as high as IDE can get (even though in reality drives never get close anyway).

Recent innovations have tried to bypass the PCI bus for IDE transfers. VIA’s V-Link technology, for example, is a dedicated 266MB/s bus between the southbridge and northbridge.

Serial ATA

The successor to IDE. Why is this in the PCI section? Well, currently, despite all the hype, Serial ATA controllers all use the PCI bus to transfer information. SATA150, with a theoretical maximum transfer rate of 150MB/s, is limited to the paltry 133MB/s of the PCI bus. Future chipsets will relieve Serial ATA of the PCI bus burden and allow direct access to the chipset, probably over a dedicated bus. This is needed for the next generation of SATA devices, able to run at 300/600MB/s.

AGP Bus

As partly explained in the PCI section, the AGP bus was born to accommodate the ever-expanding bandwidth needs of graphics cards. The 133MB/s capacity of the PCI bus simply wasn’t able to handle cards faster than the Voodoo 3, one of the last PCI graphics cards.

The AGP bus was a 32-bit bus like the PCI bus, but it operated at 66MHz, giving it a maximum bandwidth of 266MB/s. This was, and is, known as AGP 1x.

Similar to the QDR (quad data rate) bus of the Intel Pentium 4 processor, the AGP bus was redesigned to allow data to be transferred 2, then 4 times every clock cycle. These modes are known as AGP 2x and AGP 4x. More recently, AGP 8x has been introduced.

Each iteration of AGP has doubled the bandwidth of the previous standard:

# AGP1x = 266MB/s

# AGP2x = 533MB/s

# AGP4x = 1066MB/s

# AGP8x = 2133MB/s
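Because each AGP generation only adds transfers per clock, the list above is one multiplication away from the AGP 1x figure. A Python sketch, assuming the nominal 66.67MHz AGP clock; truncating to whole MB/s (rather than rounding) is our choice, made because it reproduces the figures as commonly quoted:

```python
AGP_WIDTH_BYTES = 4       # 32-bit bus
AGP_CLOCK_HZ = 200e6 / 3  # nominal 66.67MHz clock


def agp_bandwidth_mb(transfers_per_clock: int) -> float:
    """Peak AGP bandwidth in decimal MB/s for the 1x, 2x, 4x or 8x modes."""
    return AGP_WIDTH_BYTES * AGP_CLOCK_HZ * transfers_per_clock / 1e6


for mode in (1, 2, 4, 8):
    print(f"AGP{mode}x = {int(agp_bandwidth_mb(mode))}MB/s")
# AGP1x = 266MB/s
# AGP2x = 533MB/s
# AGP4x = 1066MB/s
# AGP8x = 2133MB/s
```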

Hypertransport

In all walks of life, things move on. Standards defined 10 or more years ago can never hope to scale to today’s needs.

Just as the 8/16-bit ISA bus was superseded by the PCI bus, the outdated PCI bus now needs to be phased out and a new interconnect protocol defined. The leading contender for the throne at the moment is Hypertransport.

An AMD led consortium hopes to make Hypertransport the defining interconnect protocol of the foreseeable future.

What Is Hypertransport?

Hypertransport is a point-to-point interconnect primarily designed for speed, scalability and the unification of the various system buses we have today. The same link can be used to retrieve data from a network card and from a bank of DDR memory.

[Figure: a typical computer bus layout as we know it today]

Hypertransport would eliminate most of the bottlenecks found in today’s systems. The PCI bus, as explained earlier, is easily saturated by the high-bandwidth peripherals in use.

In terms of speed, Hypertransport is capable (at the moment) of delivering throughputs of up to 51.2Gbps.

Using a 2-bit link at a 500MHz clock rate as an example:

2 (bits) * 500,000,000 (Hz) = 1,000,000,000 bits/s

(1,000,000,000/8) / (1024*1024) = 119.2MB/s. With DDR signalling this is doubled to 238.4MB/s.

Or, to use Gbits (basically because it sounds bigger):

1,000,000,000 / (1024*1024*1024) = 0.93Gbps (rounded up to 1Gbps, and exactly 1Gbps in decimal units). With DDR signalling this is shunted up to 2Gbps.
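The Hypertransport arithmetic can be sketched in Python as width * clock, doubled with DDR signalling, reported in decimal Gbps. The last line is our assumption about where the 51.2Gbps headline figure comes from: a 32-bit link clocked at 800MHz with DDR signalling happens to give exactly that number. `ht_link_gbps` is a name of our own:

```python
def ht_link_gbps(width_bits: int, clock_hz: float, ddr: bool = True) -> float:
    """Raw link rate in decimal gigabits per second (1Gbps = 10**9 bits/s)."""
    transfers_per_s = clock_hz * (2 if ddr else 1)  # DDR: data on both clock edges
    return width_bits * transfers_per_s / 1e9


print(ht_link_gbps(2, 500e6, ddr=False))  # 1.0  (the 2-bit, 500MHz example)
print(ht_link_gbps(2, 500e6))             # 2.0  (same link with DDR signalling)
print(ht_link_gbps(32, 800e6))            # 51.2 (assumed 32-bit, 800MHz link)
```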

We see Hypertransport in today’s technology through one company’s innovation to break from the norm. NVIDIA’s nForce (and nForce2, of course) uses Hypertransport as the primary interconnect, offering throughputs of 800MB/s (nForce) and 1600MB/s (nForce2). Not top-speed Hypertransport, but more than enough for today’s components.

VIA have validated Hypertransport for use in their upcoming K8 AMD Hammer chipsets, so the future is certainly looking up for the fledgling protocol.

Roundup

Before we talk about what will come let us briefly cover what is going on at the moment.

It should have become apparent that there are many pitfalls when deciding on a new computer system, for home users and businesses alike. As always, technical details are buried under a big pile of marketing. Minor advancements that, in reality, do nothing are heralded as the “next big thing”. A quick look under the surface, however, shows this not to be the case.

It pains me to see users asking whether they should upgrade their VIA KT266A-based motherboard to a VIA KT333 chipset because “it must be faster”. Bigger numbers mean faster, right? Wrong. A balanced system means you can squeeze the most out of your setup, be it for gaming, CAD or other intensive operations. Nobody wants to spend money needlessly, so read this article again, get a feel for the numbers involved and come to your own conclusions.
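One way to see why balance matters is that the CPU can never use more memory bandwidth than its own bus carries. The Python sketch below assumes, for illustration only, a CPU bus 64 bits wide at 266 million transfers per second, and ignores other devices (such as an AGP card) that may also draw on memory bandwidth; `usable_mb` is our own name:

```python
def peak_mb(width_bits: int, transfers_per_s: int) -> float:
    """Peak bandwidth in MB/s (1MB = 1024 * 1024 bytes)."""
    return width_bits * transfers_per_s / 8 / (1024 * 1024)


def usable_mb(cpu_bus_rate: int, memory_rate: int, width_bits: int = 64) -> float:
    """Bandwidth the CPU can actually use: the smaller of its bus and the memory bus."""
    return min(peak_mb(width_bits, cpu_bus_rate), peak_mb(width_bits, memory_rate))


CPU_BUS_RATE = 266_000_000  # assumed CPU bus: 266 million transfers/s

for name, rate in [("DDR266", 266_000_000), ("DDR333", 333_000_000)]:
    print(f"{name}: memory {peak_mb(64, rate):.1f}MB/s, "
          f"usable by the CPU {usable_mb(CPU_BUS_RATE, rate):.1f}MB/s")
# DDR266: memory 2029.4MB/s, usable by the CPU 2029.4MB/s
# DDR333: memory 2540.6MB/s, usable by the CPU 2029.4MB/s
```

In this simplified model the faster memory buys the CPU nothing until its own bus speeds up as well.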

The Future

We briefly covered future I/O buses. Hypertransport and PCI-Express are on the horizon, or indeed are already here. We need the peripherals and components to make use of this additional bandwidth. At the moment, it seems that wherever you look, there is a bottleneck.

Hopefully manufacturers will settle on fewer buses in the future; it’s less confusing for the consumer, and it also means that computers will become less complex. Take, for example, USB 2.0 and FireWire (not covered in this article): two competing protocols that basically do the same thing, offering hot-pluggable, scalable, high-bandwidth connections. Why not settle on one and stick with it?

Anyway, end of the ranting. We hope you enjoyed this article. It will be constantly updated as new technologies emerge in this ever-changing industry.

At the end of the day, this is a reference for us all.