The $662.00 Supercomputer Page
A $662.00 "Supercomputer"...?
The GPUMeister...one CPU, five GPUs, 12 fans...uses 'em all!
NOTE: My most recent pages are back over at
www.stanford.edu/~hydrobay again.
The Real Story:
Most people are surprised to find, given what they see on my webpages, that I'm actually a
physical oceanographer. But, that means lots of data collection and lots of calculations.
Those things lead to electronics and computers, and, the rest, as they say, is history.
Another way to state it is virtually all the weird things I do always have some connection
to marine or environmental science, or both. In fact, while I am employed by
Stanford University,
performing large amounts of computer system administration, hardware development, and
data analysis and database programming, I actually do most of it at
Hopkins Marine Station,
in Pacific Grove, California; working for the
TOPP (Tagging of Pacific
Pelagics) program.
This project, the GPUMeister, is also well connected to marine and environmental science.
Yes, it is part of my ongoing, interrelated set of projects which center around fiddling
with model Tesla Turbines and permanent magnet motor/generators. But, those, too, have
the same kind of connections to marine science, (well, fluid mechanics, anyways), and
environmental science, (well, at least a modest potential for some alternative energy
applications). My excuse for building the GPUMeister is I need, (well, want), a machine
well equipped for turbine flow modeling, without spending a small fortune to get one.
(...OK, so it has some potential for becoming a decent games machine...)
As more and more people are finding out these days, the GPU (Graphics Processing Unit)
found on modern video cards is well suited to many kinds of scientific computations,
particularly those encountered in physics and fluid mechanics. With a little shopping
around, relatively inexpensive but reasonably powerful GPU cards can be found. Plugging
in several of them at once can make for a fairly powerful parallel processing system with
minimum expense. And, hence, the GPUMeister project was born.
Parts is Parts:
Basic Stuff:
The biggest criterion for building a system such as the GPUMeister is finding a decent,
affordable motherboard with enough slots to plug in a reasonable number of GPU cards.
In keeping with the keep-it-cheap theme, I elected to use standard PCI slot cards,
rather than AGP or the newer PCI-e bus cards. At the time I started this project,
(mid 2006), standard PCI slot video cards were becoming a little harder to find than
their AGP and PCI-e counterparts. Nonetheless, standard PCI cards tend to be less
expensive than AGP and PCI-e cards for the same on-card GPU and memory. That is also true
of motherboards with multiple standard PCI slots; they are a little harder to find, but tend
to be among the lower-cost, yet still capable, units.
One can argue that a faster parallel system could be produced with the GPUs running on
an AGP or PCI-e bus. Sure! But, in fact, for many kinds of computations, including the
types we'll be discussing here, interprocessor communications is not the biggest
bottleneck for computations. So, I'm going cheap with PCI! You can do what you want...;^)
There are a number of vendor options for low-cost, multiple PCI slot motherboards. But,
of course, to pick a motherboard, you first need to pick a CPU vendor, which, for the time
being, pretty much means either Intel or AMD. Now, a very significant issue for a parallel
system such as the GPUMeister is power consumption. (Particularly significant if you happen to
live in the same room as your up and coming personal supercomputer.) Since the system
CPU is not going to do the bulk of the work, it doesn't have to be a fire breathing
monster. Also, you probably truly don't need more than one. Adhering to those
considerations will help reduce power consumption. But, reasonable CPU capability will
still be a plus. For this project, selection of the best-of-all-worlds CPU, considering
cost, capability, power consumption, and being able to plug into an adequately slotted
motherboard came down to a boxed (i.e., with fan preattached), socket 754, AMD
Sempron-64 2600+.
With the CPU decision made, final motherboard selection came down to an ABIT model
NF8-V, which has one 8x AGP video slot, and 5 standard PCI slots. This board is
inexpensive, but capable, even receiving a few small nods from the overclocker crowd.
Overclocking isn't an issue for this project, but, overclocking capability is an
indication of stability. Stability is good.
Besides a motherboard and CPU, the usual suspects needed to be rounded up, including:
case, memory, hard drive, and CD/DVD burner. Of those, the case is perhaps the most
critical. You need something with enough power available to run multiple graphics
cards, and with cooling sufficient to keep everything from frying. I ended up with
a Scorpio RAIDMAX™ gamer style case. It looks cool. But, that isn't why I chose it. I
chose it because it is cool. Besides the two fans in its 480 W power supply, it has two
fans front, 2 rear, and one in the transparent side panel. Add to those the CPU fan, and
that's eight to keep things heat happy. And, it's still whisper quiet. (The PCI slot
graphics cards have on board fans, bringing the final system configuration fan total
to twelve. The card fans are a tad more noisy than the case and CPU fans, but the
system is still very quiet.) For system memory, the GPUMeister has one 1 GB stick of
DDR/266 RAM. Its hard drive is an 80 GB, 7200 rpm Maxtor (6Y080LO). An Emprex 16x dual-layer
DVD burner rounds out the basic package. In all, with or without additional GPUs, a relatively
inexpensive, but still respectable system.
Tying it All Together:
For most intents, though not actual purpose, the GPUMeister is just a multihead video
system. As such, each GPU card, whether the system video AGP card, or one of the PCI
slot cards, can be used for video display. In fact, programming them individually for
video display before moving on to computation is a good way to test everything works.
In addition to the usual system AGP video card, the GPUMeister has four PCI slot video
cards. Clearly, installing five monitors for test purposes would pretty much defeat the
keep-it-cheap theme (not to mention take up far more than the available real estate).
However, needing to crawl behind the desk to swap video cables for every test wasn't
exactly appealing, either. To solve that problem I purchased two inexpensive HD-15
female jack mechanical switch boxes, and 6 six-foot, male-to-male, HD-15 cables. One
box has two positions (A/B), and the other has 4 positions (A/B/C/D).
The two-position switch box common jack is cabled to the system monitor, its A
jack is cabled to the AGP card video out, and its B jack is cabled to the common jack
of the four-position switch box. Thus, the A setting of the 2-position box sends
video output from the AGP card to the monitor, while its B position directs video
output from the 4-position box to the monitor. The A,B,C,D jacks of the four-position
switch box are connected to the 4 PCI slot video cards, from top to bottom in the
case, respectively. Thus, with the two-position box at its B setting, the 4-position
box will direct video from the selected PCI slot video card to the monitor.
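For anyone who finds cabling easier to follow as a lookup, here is a minimal Python sketch of the cascaded switch boxes described above. The labels AGP and PCI1 through PCI4, and the function name selected_source, are made up for illustration; they are not part of any real tool.

```python
# Hypothetical model of the two cascaded HD-15 switch boxes.
# Card labels and function names are illustrative assumptions.

# Four-position box: jacks A-D go to the PCI cards, top to bottom in the case.
FOUR_WAY = {"A": "PCI1", "B": "PCI2", "C": "PCI3", "D": "PCI4"}


def selected_source(two_way: str, four_way: str) -> str:
    """Return which video card reaches the monitor for the given settings."""
    if two_way == "A":
        # The A jack of the two-position box is cabled straight to the AGP card,
        # so the four-position setting does not matter here.
        return "AGP"
    if two_way == "B":
        # The B jack is fed by the common jack of the four-position box.
        return FOUR_WAY[four_way]
    raise ValueError(f"unknown two-position setting: {two_way!r}")


if __name__ == "__main__":
    for two in "AB":
        for four in "ABCD":
            print(two, four, "->", selected_source(two, four))
```

The point of the cascade is that five sources are reachable with only two cheap mechanical boxes, at the cost of one extra cable between them.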
I did look for an HD-15 switch box with more than 4 connectors, so that the setup
would only require one box. But, mechanical types with more than four positions don't
seem to be readily available, and the available electronic switches are far too
expensive for this project.
Yes. I do switch video connections with the system powered up. Both switch boxes, (like
most all mechanical data switches), are break-before-make types. So, they won't short
anything out, regardless of being powered up or not. You can worry about blowing
something up if you want to. I won't...;^)
A piece of double-sided foam picture-mounting tape on the feet of the switch boxes
keeps the cables from pulling them off the top of the case. The item you can see cabled
in below the PCI slot video cards in some of the photos is a small USB 2.0 hub.
The Heart of the Matter:
The heart of the matter? Why, the GPU, of course! At this time the major GPU chip
providers are ATI, and
nVIDIA. Both companies
produce excellent processors. I went with nVIDIA for the GPUMeister because
of their Linux support.
ATI does seem to be coming around, but, at this time, they are far behind nVIDIA
in Linux support. Plug in any model of nVIDIA graphics card into a Linux system,
power up, install their "unidriver" and it works. Power down, swap in another model
nVIDIA graphics card, power up, and it still works. Sweet! (You didn't actually
wonder if I run Linux, did you?...;^)...)
Initially I set up the GPUMeister with model MX4000 graphics cards. Later I switched
to FX5200 version cards. Some of the reasoning for the particular models of GPU card
I selected may be a bit hard to follow before we discuss how a GPU can be programmed
for computation. But, briefly, the MX4000 unit, by date of production, is a fourth
generation system (near the latest there is at this writing), but, functionally, it is
more of a third generation product. Specifically, it has good shader unit access, but
very limited fragment processor access (we'll talk about that in more detail later). For
purposes of the type of computations we'll be looking at, (the Lattice-Boltzmann technique),
that really isn't a big problem, because the methods map well onto the shader unit.
Of course, not being a problem is not necessarily as much fun as it could be. With
good access to the fragment processor, you can, using texturing techniques, make
pretty pictures that are much more difficult to produce otherwise. And, after all, if
you're fiddling with graphics cards, why wouldn't you want to make pretty pictures?
The FX5200 is probably the lowest-end unit that has reasonable fragment processor
access. I might have gone with a bit higher-end GPU, but, I was digging through
my boxes of junk searching for something else, (an air-amplifier to use in a new
nozzle for my boundary-layer turbine), and came across an FX5200 based AGP card. I
guess it was just fate. (And, I did find the air-amplifier, too. Check the turbine
page, it may have shown up there by now.)
Cg is the GPU programming language we will be using. The FX5200 based boards are capable
of running all the example programs that come with the
nVIDIA Cg Toolkit without modification. That makes the FX5200 more than
suitable for our purposes. The MX4000 series cards can be made to run most of the
examples with some modifications related to accessing the fragment processor.
The 1.5.Beta 2 version of the Cg Toolkit was installed on the GPUMeister. We'll
discuss Cg programming in a later section.
The Bottom Line:
So how much was it?
GPUMeister Component Costs Rounded to the Nearest Dollar
Gamer Style Tower Case | $69.00 |
Boxed 64-Bit CPU | $65.00 |
PCI Slot Motherboard | $47.00 |
1 GB DDR Memory | $78.00 |
80 GB UDMA Hard Drive | $70.00 |
DL DVD Burner | $40.00 |
AGP Video Card | $60.00 |
PCI Video Card (x4) | $200.00 |
2 Position Data Switch | $5.00 |
4 Position Data Switch | $10.00 |
HD-15 Cable (x6) | $18.00 |
TOTAL: | $662.00 |
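As a quick sanity check on the arithmetic, the table can be summed in a few lines of Python. The dollar figures are copied from the table; the dictionary layout and the function names total_cost and gpu_share are just illustrative choices.

```python
# Component costs in whole dollars, copied from the table above.
COSTS = {
    "Gamer Style Tower Case": 69,
    "Boxed 64-Bit CPU": 65,
    "PCI Slot Motherboard": 47,
    "1 GB DDR Memory": 78,
    "80 GB UDMA Hard Drive": 70,
    "DL DVD Burner": 40,
    "AGP Video Card": 60,
    "PCI Video Card (x4)": 200,
    "2 Position Data Switch": 5,
    "4 Position Data Switch": 10,
    "HD-15 Cable (x6)": 18,
}

# The line items that are actually GPUs (one AGP card plus four PCI cards).
GPU_ITEMS = ("AGP Video Card", "PCI Video Card (x4)")


def total_cost(costs):
    """Sum every line item."""
    return sum(costs.values())


def gpu_share(costs):
    """Fraction of the total spent on the five video cards."""
    return sum(costs[name] for name in GPU_ITEMS) / total_cost(costs)


if __name__ == "__main__":
    print(f"Total: ${total_cost(COSTS)}.00")
    print(f"Spent on GPUs: {gpu_share(COSTS):.1%}")
```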
OK. The GPUMeister system isn't, and never will be, the most flashy "supercomputer" in town.
But, it was just $662.00 to build. And, you must admit, it looks really spiffy in the dark!
System Setup:
Nothing to See Here:
Even if you have an already working, single video card system, if you plug a bunch more video cards
into it, don't be surprised if you don't see anything on your monitor anymore when you reboot. This may
happen though your system seems to otherwise boot fine, and you even see the initial startup spew on
your normal video output.
With one video card, your system can figure out where it is and how to use it. With more than one card,
that ain't necessarily so. You may need to change a BIOS setting to tell it which card to use for the
default video monitor output.
If you have a missing video problem, then reset the system and go to your BIOS setup. Somewhere in the
advanced setting pages you'll find a section named something like "Init Display First" with selections
for AGP, PCI, and/or whatever kind of available video slot connections your motherboard may have. To use
an AGP card for the default monitor, select AGP and save your BIOS settings. To use a PCI card for the
startup monitor, select PCI and save your BIOS settings.
Your BIOS may allow selecting a specific slot for the default monitor output. If not, then the system
will start up with the first video card of the type specified in your BIOS that it finds. Most multi-PCI-slot
motherboards have an AGP slot to one side of the PCI slots, and the search will usually start from the
AGP slot and move towards the farthest removed PCI slot. Note this means with PCI selected for initial
video output in your BIOS, a PCI video card does not have to be placed in the first slot next to the
AGP slot, but, if one is there, it is the one that will be selected for video output.
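The search order described above can be sketched as a toy model in Python. This is only an illustration of the behavior as I described it, not how any particular BIOS is implemented; the slot list layout and the function name boot_display are assumptions.

```python
# Toy model of "Init Display First": walk the slots outward from the AGP slot
# and take the first populated slot of the type selected in the BIOS.
# The data layout and names are illustrative assumptions.

def boot_display(slots, init_display_first):
    """Return the index of the slot used for startup video, or None.

    slots: list of (slot_type, has_video_card) tuples, ordered from the AGP
           slot towards the farthest removed PCI slot.
    init_display_first: "AGP" or "PCI", as selected in the BIOS setup.
    """
    for index, (slot_type, has_video_card) in enumerate(slots):
        if slot_type == init_display_first and has_video_card:
            return index
    return None  # no card of the selected type was found


if __name__ == "__main__":
    # One AGP card, first PCI slot empty, video cards in the next two PCI slots.
    board = [("AGP", True), ("PCI", False), ("PCI", True), ("PCI", True)]
    print("AGP selected ->", boot_display(board, "AGP"))
    print("PCI selected ->", boot_display(board, "PCI"))
```

With PCI selected, the empty slot next to the AGP slot is skipped and the first populated PCI slot wins, which is why a monitor left on the AGP card goes dark.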
If you have trouble finding the correct section for initial video card type in your BIOS settings, you
can just keep moving the video cable from card to card and rebooting. One of them will work (or you
have real hardware problems).
Interestingly enough, the BIOS default setting for many systems is PCI, not AGP. So, even though you
may have been working fine with your monitor cabled to a single AGP card, if you plug in a PCI card,
leaving the monitor connected to your AGP card, then, on reboot, your AGP card will no longer be
selected after bootup, and you'll get no video display. Just something to keep in mind.
Somewhere to Run:
yammer-yammer-yammer...bringing up hardware with live Linux distribution CD...
With the hardware assembled and booting from, say, my personal favorite, a
Knoppix live Linux distribution CD, it's time
to start installing something on the hard drive. Pretty much have to start with an operating system.
Linux, of course. And, when the time came on this project there was this cover disc DVD from some
magazine or another lying handily nearby with
Fedora Core version 4 ready to go. So, decision made, FC4 on the GPUMeister it is.
This is an example reference tag [1]
Virtually Done:
There are a number of options for how to get parallel GPUs to talk to each other.
Employing one or another of the available message passing systems is most common.
That could be using sockets with, for example, multithreaded UNIX interprocess
communications (IPC), or, becoming more popular, TCP (the communications protocol
closely linked with the internet, and commonly seen expressed in the acronym TCP/IP).
Both these methods work well, but require specialized drivers for the individual GPU
cards. That is one reason why networked cluster systems are popular. In a cluster,
each processor is a completely separate entity, and, hence, all can have the same
basic control software, rather than requiring a complex driver set as when coprocessors
are on the same hardware bus.
But, the current wave of the future is virtual processors. That is, running multiple
operating systems, simultaneously, on the same hardware. The different operating
systems run on virtual hardware that is really separate software packages emulating
the actions of a hardware system. In fact, several completely different types of
processor, other than the real hardware processor the virtualized systems are
running on, can be set up in this way. The separate operating systems don't know the
difference, and the underlying hardware doesn't care.
In the case of virtual copies of one kind of system, though they are all running
on the same underlying hardware, each virtual processor can use identical copies
of drivers to access peripheral devices, such as GPU cards. The virtualizing software
keeps everything separate. This is the approach we will be taking with the GPUMeister.
As far as the GPUs are concerned, they will be running on separate hardware systems,
and interprocessor communications will be as though they are in a networked cluster.
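To make "as though they are in a networked cluster" a little more concrete, here is a minimal Python sketch of two nodes passing a message over TCP. It is not the GPUMeister software: a thread on the loopback address stands in for a second virtual machine, the wire format (a count followed by doubles) is invented for the example, and the names worker, recv_exact, and exchange are hypothetical.

```python
import socket
import struct
import threading


def recv_exact(conn, nbytes):
    """Read exactly nbytes from a socket (TCP may deliver data in pieces)."""
    data = bytearray()
    while len(data) < nbytes:
        chunk = conn.recv(nbytes - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data.extend(chunk)
    return bytes(data)


def worker(port):
    """Stand-in for one virtual node: receive some floats, send back their sum."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        (count,) = struct.unpack("!I", recv_exact(conn, 4))
        values = struct.unpack(f"!{count}d", recv_exact(conn, 8 * count))
        conn.sendall(struct.pack("!d", sum(values)))


def exchange(values):
    """Send values to a worker node over TCP and return the worker's reply."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        node = threading.Thread(target=worker, args=(port,))
        node.start()
        conn, _ = server.accept()
        with conn:
            conn.sendall(struct.pack("!I", len(values)))
            conn.sendall(struct.pack(f"!{len(values)}d", *values))
            (result,) = struct.unpack("!d", recv_exact(conn, 8))
        node.join()
    return result


if __name__ == "__main__":
    print(exchange([1.0, 2.0, 3.5]))
```

Nothing in the code cares whether the other end is a thread, a virtual machine, or a box across the room; only the address changes. That is the appeal of the cluster-style approach.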
Beowulf in a Box:
...seems sort of like a good way to describe it for now anyways...
Who's Got Gas?
yada-yeda-yada...
Lattice-Boltzmann is itself an outgrowth of cellular automata techniques, and, as some of
you are probably aware, consists of a model of particles moving around on a fixed grid
following some simple conservation rules regarding collisions at the grid junction points.
What many people are not aware of is that in the limit as grid and time steps approach zero,
the Lattice-Boltzmann simulation collapses into the Navier-Stokes equations of fluid
mechanics for an incompressible fluid. So, equations nearly impossible to solve analytically
outside of very special cases (and pretty horrible in DNS for that matter) become tractable
as a game of nano-billiards. And, that, I think, is not just useful, but way cool!
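For a feel of how little machinery the method needs, here is a minimal pure-Python sketch of one common Lattice-Boltzmann variant: the D2Q9 lattice with single-relaxation-time (BGK) collisions on a periodic grid. It runs on the CPU and is not the Cg code for the GPUMeister; the grid size, the relaxation time tau, and all function names are assumptions chosen for the example.

```python
# D2Q9 lattice: nine discrete velocities and their standard weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Equilibrium populations for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def step(f, nx, ny, tau):
    """One time step: collide at every grid point, then stream to neighbors."""
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            cell = f[x][y]
            # Local density and velocity are moments of the populations.
            rho = sum(cell)
            ux = sum(fi * c[0] for fi, c in zip(cell, VELOCITIES)) / rho
            uy = sum(fi * c[1] for fi, c in zip(cell, VELOCITIES)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (cx, cy) in enumerate(VELOCITIES):
                # BGK collision: relax towards equilibrium at rate 1/tau.
                post = cell[i] - (cell[i] - feq[i]) / tau
                # Streaming: move the population one grid step (periodic wrap).
                new[(x + cx) % nx][(y + cy) % ny][i] = post
    return new


def total_mass(f):
    """Sum of all populations; collisions and streaming should conserve it."""
    return sum(sum(sum(cell) for cell in column) for column in f)


def make_grid(nx, ny):
    """Fluid at rest with a small density bump in the middle of the grid."""
    f = [[equilibrium(1.0, 0.0, 0.0) for _ in range(ny)] for _ in range(nx)]
    f[nx // 2][ny // 2] = equilibrium(1.1, 0.0, 0.0)
    return f


if __name__ == "__main__":
    nx = ny = 16
    grid = make_grid(nx, ny)
    before = total_mass(grid)
    for _ in range(20):
        grid = step(grid, nx, ny, tau=0.8)
    print("mass before:", before, "after:", total_mass(grid))
```

Each grid point only needs its own nine numbers and those of its immediate neighbors, which is exactly the kind of local, identical-everywhere update a GPU is good at. In lattice units the kinematic viscosity of this model is (tau - 0.5) / 3.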
yada-yada-yada...
References:
(Clicking reference numbers here returns you to the text you came from.)
[1] This is the example reference tag reference
Last updated 06Sept2006
Alan Swithenbank, alans@cuervo.stanford.edu