A $662.00 "Supercomputer"...?


The GPUMeister...one CPU, five GPUs, 12 fans...uses 'em all!

[CUBBY HOLE]
(Click on thumbnails to view larger images.)



NOTE: My most recent pages are back over at www.stanford.edu/~hydrobay again.




Click here for my boundary layer turbine page.
Click here for my poly-phase motor/generator page.
Click here for my single-phase motor/generator page.
Click here for my solar powered fluid mechanics lab page.
Click here for my 3-axis CNC machine page.

Contents:

  The Real Story
  Parts is Parts
  System Setup

The Real Story:


Most people are surprised to find, given what they see on my webpages, that I'm actually a physical oceanographer. But that means lots of data collection and lots of calculations. Those things lead to electronics and computers, and the rest, as they say, is history. Another way to put it is that virtually all the weird things I do have some connection to marine or environmental science, or both. In fact, while I am employed by Stanford University, performing large amounts of computer system administration, hardware development, data analysis, and database programming, I actually do most of it at Hopkins Marine Station in Pacific Grove, California, working for the TOPP (Tagging of Pacific Pelagics) program.

This project, the GPUMeister, is likewise connected to marine and environmental science. Yes, it is part of my ongoing, interrelated set of projects centered around fiddling with model Tesla Turbines and permanent magnet motor/generators. But those, too, have the same kinds of connections to marine science (well, fluid mechanics, anyway) and environmental science (well, at least a modest potential for some alternative energy applications). My excuse for building the GPUMeister is that I need (well, want) a machine well equipped for turbine flow modeling, without spending a small fortune to get one. (...OK, so it has some potential for becoming a decent games machine...)

As more and more people are finding out these days, the GPU (Graphics Processing Unit) found on modern video cards is well suited to many kinds of scientific computations, particularly those encountered in physics and fluid mechanics. With a little shopping around, relatively inexpensive but reasonably powerful GPU cards can be found. Plugging in several of them at once can make for a fairly powerful parallel processing system at minimal expense. And hence the GPUMeister project was born.

Parts is Parts:


Basic Stuff:

The biggest criterion for building a system such as the GPUMeister is finding a decent, affordable motherboard with enough slots to plug in a reasonable number of GPU cards. In keeping with the keeping-it-cheap theme, I elected to use standard PCI slot cards, rather than AGP or the newer PCI-e bus cards. At the time I started this project (mid 2006), standard PCI slot video cards were becoming a little harder to find than their AGP and PCI-e counterparts. Nonetheless, standard PCI cards tend to be less expensive than AGP and PCI-e cards with the same on-card GPU and memory. The same is true of motherboards with multiple standard PCI slots: a little harder to find, but available among the lower-cost, yet still capable, units.

One can argue that a faster parallel system could be produced with the GPUs running on an AGP or PCI-e bus. Sure! But, in fact, for many kinds of computations, including the types we'll be discussing here, interprocessor communication is not the biggest bottleneck. So, I'm going cheap with PCI! You can do what you want...;^)

There are a number of vendor options for low-cost, multiple PCI slot motherboards. But, of course, to pick a motherboard, you first need to pick a CPU vendor, which, for the time being, pretty much means either Intel or AMD. Now, a very significant issue for a parallel system such as the GPUMeister is power consumption. (Particularly significant if you happen to live in the same room as your up-and-coming personal supercomputer.) Since the system CPU is not going to do the bulk of the work, it doesn't have to be a fire-breathing monster. Also, you probably don't truly need more than one. Adhering to those considerations helps reduce power consumption, though reasonable CPU capability is still a plus. For this project, the best-of-all-worlds CPU, considering cost, capability, power consumption, and being able to plug into an adequately slotted motherboard, came down to a boxed (i.e., with fan preattached), socket 754, AMD Sempron-64 2600+.

With the CPU decision made, final motherboard selection came down to an ABIT model NF8-V, which has one 8x AGP video slot and five standard PCI slots. This board is inexpensive but capable, even receiving a few small nods from the overclocker crowd. Overclocking isn't an issue for this project, but overclocking capability is an indication of stability. Stability is good.

Besides a motherboard and CPU, the usual suspects needed to be rounded up: case, memory, hard drive, and CD/DVD burner. Of those, the case is perhaps the most critical. You need something with enough power available to run multiple graphics cards, and with cooling sufficient to keep everything from frying. I ended up with a Scorpio RAIDMAX™ gamer-style case. It looks cool. But that isn't why I chose it. I chose it because it is cool. Besides the two fans in its 480 W power supply, it has two fans in front, two in the rear, and one in the transparent side panel. Add the CPU fan, and that's eight to keep things heat happy. And it's still whisper quiet. (The PCI slot graphics cards have on-board fans, bringing the final system configuration fan total to twelve. The card fans are a tad more noisy than the case and CPU fans, but the system is still very quiet.) For system memory, the GPUMeister has one 1 GB stick of DDR/266 RAM. Its hard drive is an 80 GB, 7200 rpm Maxtor (6Y080LO). An Emprex 16x dual-layer DVD burner rounds out the basic package. In all, with or without additional GPUs, a relatively inexpensive, but still respectable, system.

[PARTS IS PARTS]


Tying it All Together:

For most intents, though not its actual purpose, the GPUMeister is just a multihead video system. As such, each GPU card, whether the system's AGP video card or one of the PCI slot cards, can be used for video display. In fact, programming them individually for video display before moving on to computation is a good way to test that everything works.

In addition to the usual system AGP video card, the GPUMeister has four PCI slot video cards. Clearly, installing five monitors for test purposes would pretty much defeat the keeping-it-cheap theme (not to mention take up far more than the available real estate). However, needing to crawl behind the desk to swap video cables for every test wasn't exactly appealing, either. To solve that problem I purchased two inexpensive HD-15 female-jack mechanical switch boxes and six six-foot, male-to-male HD-15 cables. One box has two positions (A/B), and the other has four (A/B/C/D).

The two-position switch box's common jack is cabled to the system monitor, with its A jack cabled to the AGP card video out and its B jack cabled to the common jack of the four-position switch box. Thus, the A setting of the two-position box sends video output from the AGP card to the monitor, while its B setting directs video output from the four-position box to the monitor. The A, B, C, and D jacks of the four-position switch box are connected to the four PCI slot video cards, from top to bottom in the case, respectively. So, with the two-position box at its B setting, the four-position box directs video from the selected PCI slot video card to the monitor.

I did look for an HD-15 switch box with more than 4 connectors, so that the setup would only require one box. But, mechanical types with more than four positions don't seem to be readily available, and the available electronic switches are far too expensive for this project.

Yes. I do switch video connections with the system powered up. Both switch boxes, (like most all mechanical data switches), are break-before-make types. So, they won't short anything out, regardless of being powered up or not. You can worry about blowing something up if you want to. I won't...;^)

A piece of double-sided foam picture-mounting tape on the feet of the switch boxes keeps the cables from pulling them off the top of the case. The item you can see cabled in below the PCI slot video cards in some of the photos is a small USB 2.0 hub.

[PARTS IS PARTS] [VIDEO STACK] [CABLING] [CABLING] [SET UP]


The Heart of the Matter:

The heart of the matter? Why, the GPU, of course! At this time the major GPU chip providers are ATI and nVIDIA. Both companies produce excellent processors. I went with nVIDIA for the GPUMeister because of their Linux support.

ATI does seem to be coming around, but, at this time, they are far behind nVIDIA in Linux support. Plug any model of nVIDIA graphics card into a Linux system, power up, install their "unidriver", and it works. Power down, swap in another model of nVIDIA graphics card, power up, and it still works. Sweet! (You didn't actually wonder if I run Linux, did you?...;^)...)

Initially I set up the GPUMeister with model MX4000 graphics cards. Later I switched to FX5200 version cards. Some of the reasoning for the particular models of GPU card I selected may be a bit hard to follow before we discuss how a GPU can be programmed for computation. But, briefly, the MX4000 unit, by date of production, is a fourth-generation system (near the latest there is at this writing), but, functionally, it is more of a third-generation product. Specifically, it has good shader unit access, but very limited fragment processor access (we'll talk about that in more detail later). For the type of computations we'll be looking at (the Lattice-Boltzmann technique), that really isn't a big problem, because the method maps well onto the shader unit.

Of course, not being a problem is not necessarily as much fun as it could be. With good access to the fragment processor, you can, using texturing techniques, make pretty pictures that are much more difficult to produce otherwise. And, after all, if you're fiddling with graphics cards, why wouldn't you want to make pretty pictures?

The FX5200 is probably the lowest-end unit that has reasonable fragment processor access. I might have gone with a bit higher-end GPU, but I was digging through my boxes of junk searching for something else (an air amplifier to use in a new nozzle for my boundary-layer turbine) and came across an FX5200-based AGP card. I guess it was just fate. (And, I did find the air amplifier, too. Check the turbine page, it may have shown up there by now.)

Cg is the GPU programming language we will be using. The FX5200-based boards are capable of running all the example programs that come with the nVIDIA Cg Toolkit without modification. That makes the FX5200 more than suitable for our purposes. The MX4000 series cards can be made to run most of the examples with some modifications related to accessing the fragment processor. Version 1.5 Beta 2 of the Cg Toolkit was installed on the GPUMeister. We'll discuss Cg programming in a later section.
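
In the meantime, just to give a flavor of what the host side of a GPU computation looks like, here is a minimal sketch in C of loading and binding a Cg fragment program through the Cg runtime. The kernel file name (turbine_kernel.cg) and its entry point are placeholders of mine, not anything from the actual GPUMeister code, and I'm assuming a GL context has already been opened with GLUT just to keep the sketch short.

/* Minimal sketch (not the actual GPUMeister code) of loading and binding a
 * Cg fragment program from the host side with the Cg 1.5 runtime.
 * "turbine_kernel.cg" and its "main" entry point are hypothetical names.
 * Build with something like: gcc cg_load.c -lCg -lCgGL -lglut -lGL        */
#include <stdio.h>
#include <GL/glut.h>
#include <Cg/cg.h>
#include <Cg/cgGL.h>

int main(int argc, char **argv)
{
    CGcontext context;
    CGprofile profile;
    CGprogram program;
    CGerror   err;

    /* the Cg GL routines need a current OpenGL context; GLUT is the
     * quickest way to get one without writing any real display code     */
    glutInit(&argc, argv);
    glutCreateWindow("cg load test");

    context = cgCreateContext();

    /* ask for the best fragment profile the card offers -- this is where
     * the FX5200 does better than the MX4000                             */
    profile = cgGLGetLatestProfile(CG_GL_FRAGMENT);
    cgGLSetOptimalOptions(profile);

    program = cgCreateProgramFromFile(context, CG_SOURCE,
                                      "turbine_kernel.cg", profile,
                                      "main", NULL);
    err = cgGetError();
    if (err != CG_NO_ERROR) {
        fprintf(stderr, "Cg compile failed: %s\n", cgGetErrorString(err));
        return 1;
    }

    cgGLLoadProgram(program);
    cgGLEnableProfile(profile);
    cgGLBindProgram(program);

    /* from here, drawing a screen-sized quad runs the kernel once per
     * output pixel -- which is how the GPU ends up doing our arithmetic  */
    printf("Cg fragment program loaded and bound.\n");

    cgDestroyContext(context);
    return 0;
}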

[FX5200] [EMPTY SLOTS] [FULL SLOTS]


The Bottom Line:

So how much was it?

GPUMeister Component Costs (Rounded to the Nearest Dollar)

  Gamer Style Tower Case      $69.00
  Boxed 64-Bit CPU            $65.00
  PCI Slot Motherboard        $47.00
  1 GB DDR Memory             $78.00
  80 GB UDMA Hard Drive       $70.00
  DL DVD Burner               $40.00
  AGP Video Card              $60.00
  PCI Video Card (x4)        $200.00
  2 Position Data Switch       $5.00
  4 Position Data Switch      $10.00
  HD-15 Cable (x6)            $18.00
  ----------------------------------
  TOTAL:                     $662.00


OK. The GPUMeister system isn't, and never will be, the most flashy "supercomputer" in town. But, it was just $662.00 to build. And, you must admit, it looks really spiffy in the dark!

[OOOHHHH]


System Setup:

Nothing to See Here:

Even if you have an already working, single video card system, if you plug a bunch more video cards into it, don't be surprised if you don't see anything on your monitor anymore when you reboot. This may happen even though your system seems to otherwise boot fine, and you even see the initial startup spew on your normal video output.

With one video card, your system can figure out where it is and how to use it. With more than one card, that ain't necessarily so. You may need to change a BIOS setting to tell it which card to use for the default video monitor output.

If you have a missing video problem, then reset the system and go to your BIOS setup. Somewhere in the advanced settings pages you'll find a section named something like "Init Display First" with selections for AGP, PCI, and/or whatever kind of available video slot connections your motherboard may have. To use an AGP card for the default monitor, select AGP and save your BIOS settings. To use a PCI card for the startup monitor, select PCI and save your BIOS settings.

Your BIOS may allow selecting a specific slot for the default monitor output. If not, then the system will start up with the first video card of the type specified in your BIOS that it finds. Most multi-PCI-slot motherboards have an AGP slot to one side of the PCI slots, and the search will usually start from the AGP slot and move toward the farthest removed PCI slot. Note this means that, with PCI selected for initial video output in your BIOS, a PCI video card does not have to be placed in the first slot next to the AGP slot; but, if one is there, it is the one that will be selected for video output.

If you have trouble finding the correct section for initial video card type in your BIOS settings, you can just keep moving the video cable from card to card and rebooting. One of them will work (or you have real hardware problems).

Interestingly enough, the BIOS default setting for many systems is PCI, not AGP. So, even though you may have been working fine with your monitor cabled to a single AGP card, if you plug in a PCI card, leaving the monitor connected to your AGP card, then, on reboot, your AGP card will no longer be selected after bootup, and you'll get no video display. Just something to keep in mind.

Somewhere to Run:



yammer-yammer-yammer...bringing up hardware with live Linux distribution CD...



With the hardware assembled and booting from, say, my personal favorite, a Knoppix live Linux distribution CD, it's time to start installing something on the hard drive. Pretty much have to start with an operating system. Linux, of course. And, when the time came on this project, there was a cover-disc DVD from some magazine or another lying handily nearby with Fedora Core version 4 ready to go. So, decision made: FC4 on the GPUMeister it is.




[xorg.conf]

[IMAGE TEXTURE] [IMAGE TEXTURE] [IMAGE TEXTURE] [IMAGE TEXTURE] [IMAGE TEXTURE]

Virtually Done:

There are a number of options for getting parallel GPUs to talk to each other. Employing one or another of the available message passing systems is most common. That could mean using sockets with, for example, multithreaded UNIX interprocess communication (IPC), or, becoming more popular, TCP (the communications protocol closely linked with the internet, and commonly seen in the acronym TCP/IP).

Both these methods work well, but require specialized drivers for the individual GPU cards. That is one reason why networked cluster systems are popular. In a cluster, each processor is a completely separate entity, and, hence, all can run the same basic control software, rather than requiring a complex driver set as when coprocessors share the same hardware bus.

But the current wave of the future is virtual processors. That is, running multiple operating systems simultaneously on the same hardware. The different operating systems run on virtual hardware that is really just software emulating the actions of a hardware system. In fact, processors of completely different types than the real hardware processor the virtualized systems are running on can be set up this way. The separate operating systems don't know the difference, and the underlying hardware doesn't care.

In the case of virtual copies of one kind of system, though they are all running on the same underlying hardware, each virtual processor can use identical copies of drivers to access peripheral devices, such as GPU cards. The virtualizing software keeps everything separate. This is the approach we will be taking with the GPUMeister. As far as the GPUs are concerned, they will be running on separate hardware systems, and interprocessor communications will be as though they are in a networked cluster.
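
Just to make the cluster analogy concrete, here is a bare-bones sketch in C of one worker pushing a strip of lattice boundary data to its neighbor over a TCP socket. The neighbor address, port number, and the idea of shipping one lattice edge per exchange are placeholders of mine, not part of any final GPUMeister design; a matching listener on the other end would recv() the data and send its own edge back.

/* Bare-bones sketch of one worker shipping a strip of boundary data to a
 * neighboring node over TCP.  The address, port, and payload layout are
 * hypothetical; a matching listener on the neighbor would recv() this and
 * send its own edge back.                                                */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char *neighbor_ip = "192.168.0.2";   /* hypothetical cluster peer */
    const int   port        = 5000;            /* hypothetical service port */
    float boundary[64];                        /* one edge of a local grid  */
    struct sockaddr_in addr;
    int s;

    memset(boundary, 0, sizeof(boundary));     /* stand-in for real edge data */

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, neighbor_ip, &addr.sin_addr);

    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(s);
        return 1;
    }

    /* push our edge to the neighbor; in a full exchange we would also
     * recv() the neighbor's edge into a matching buffer here            */
    if (send(s, boundary, sizeof(boundary), 0) < 0)
        perror("send");

    close(s);
    return 0;
}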

Beowulf in a Box:

...seems sort of like a good way to describe it for now anyways...

Who's Got Gas?


yada-yada-yada...

Lattice-Boltzmann is itself an outgrowth of cellular automata techniques and, as some of you are probably aware, consists of a model of particles moving around on a fixed grid following some simple conservation rules regarding collisions at the grid junction points. What many people are not aware of is that, in the limit as the grid spacing and time steps approach zero, the Lattice-Boltzmann simulation collapses into the Navier-Stokes equations of fluid mechanics for an incompressible fluid. So, equations nearly impossible to solve analytically outside of very special cases (and pretty horrible in DNS, for that matter) become tractable as a game of nano-billiards. And that, I think, is not just useful, but way cool!
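
For anyone who hasn't bumped into it before, here is a stripped-down sketch in plain C of the basic D2Q9 collide-and-stream loop on a periodic grid (BGK single-relaxation-time collision, uniform fluid at rest, no obstacles). It is just the serial skeleton of the technique, not the GPU version we're after, and the grid size and relaxation time are arbitrary choices of mine for illustration.

/* Minimal D2Q9 Lattice-Boltzmann sketch (BGK collision, periodic grid).
 * Grid size and relaxation time are arbitrary choices for illustration.
 * A real run would add an initial perturbation and/or solid obstacles.
 * Build with something like: gcc -O2 lbm_sketch.c -o lbm_sketch        */
#include <stdio.h>

#define NX 64
#define NY 64
#define Q   9

/* the nine lattice velocities and their weights */
static const int    ex[Q] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
static const int    ey[Q] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
static const double w [Q] = { 4.0/9.0,
                              1.0/9.0, 1.0/9.0, 1.0/9.0, 1.0/9.0,
                              1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0 };

static double f [NX][NY][Q];   /* particle distributions   */
static double fn[NX][NY][Q];   /* post-streaming buffer    */

int main(void)
{
    const double tau = 0.6;    /* relaxation time (sets the viscosity) */
    int x, y, i, step;

    /* start from a uniform fluid at rest: rho = 1, u = 0 */
    for (x = 0; x < NX; x++)
        for (y = 0; y < NY; y++)
            for (i = 0; i < Q; i++)
                f[x][y][i] = w[i];

    for (step = 0; step < 1000; step++) {

        /* collision: relax each distribution toward local equilibrium */
        for (x = 0; x < NX; x++)
            for (y = 0; y < NY; y++) {
                double rho = 0.0, ux = 0.0, uy = 0.0;
                for (i = 0; i < Q; i++) {
                    rho += f[x][y][i];
                    ux  += f[x][y][i] * ex[i];
                    uy  += f[x][y][i] * ey[i];
                }
                ux /= rho;
                uy /= rho;
                for (i = 0; i < Q; i++) {
                    double eu  = ex[i] * ux + ey[i] * uy;
                    double feq = w[i] * rho * (1.0 + 3.0 * eu
                                 + 4.5 * eu * eu
                                 - 1.5 * (ux * ux + uy * uy));
                    f[x][y][i] -= (f[x][y][i] - feq) / tau;
                }
            }

        /* streaming: each distribution hops to its neighbor (periodic) */
        for (x = 0; x < NX; x++)
            for (y = 0; y < NY; y++)
                for (i = 0; i < Q; i++) {
                    int xn = (x + ex[i] + NX) % NX;
                    int yn = (y + ey[i] + NY) % NY;
                    fn[xn][yn][i] = f[x][y][i];
                }

        /* copy the streamed populations back (kept simple for the sketch) */
        for (x = 0; x < NX; x++)
            for (y = 0; y < NY; y++)
                for (i = 0; i < Q; i++)
                    f[x][y][i] = fn[x][y][i];
    }

    printf("ran %d collide-and-stream steps on a %d x %d periodic grid\n",
           step, NX, NY);
    return 0;
}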

yada-yada-yada...



Last updated 06Sept2006
Alan Swithenbank, alans@cuervo.stanford.edu