Datacenter Power. It seems you can never have enough.
We have our colocation inside an Equinix IBX. It is an excellent facility. Unfortunately, about two years ago our cage got a new neighbor. They have added rack after rack of new servers to accommodate their ever-increasing traffic, which means they have effectively used up all the allocated power feeds for our section of the colo.
So as we started to fill our own cabinets, we found that we were quickly using up the 2 x 20A 110V feeds they had allocated to each of our cabinets. Our partner in colocation, sell.com, was also at this time upgrading their farm to the latest dual-Xeon models. These boxes were pulling a LOT more amps than the previous P3 generation.
Very quickly, we became experts on how much amperage we could squeeze out of our existing feeds and what systems required how much power.
Here are some anecdotal amperage readings we took with our fancy amp-reading tool. Please note these readings were all taken on 120V feeds.
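Since every reading below is amps at 120V, converting to watts and budgeting a feed is simple arithmetic. Here is a quick sketch; the 80% continuous-load derating is the usual breaker rule of thumb, and the particular amp figure is just illustrative:

```shell
# Convert a measured amperage to watts at the feed voltage (P = V * I),
# then estimate how many such servers fit on one 20A breaker, assuming
# the common 80% continuous-load derating (16A usable on a 20A circuit).
VOLTS=120
AMPS=0.78            # e.g. an idle 1U box from the readings below
USABLE=16            # 20A breaker * 0.80
watts=$(awk -v v=$VOLTS -v i=$AMPS 'BEGIN { printf "%.1f", v * i }')
fit=$(awk -v u=$USABLE -v i=$AMPS 'BEGIN { printf "%d", u / i }')
echo "$watts watts; $fit servers per 20A feed"
```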
SuperMicro SuperServer 1U Half-depth server with X7DCA-L motherboard.
- CPU: 2 x Quad Core 2.5GHz Xeon L5420
- RAM: 16GB PC2-5300 ECC Registered
- 1 x 120GB Intel 330 SSD
- Single 280W Power supply
- Off – 0.06A
- Cold boot spike: 1.16A
- Booted/idle – 0.78A
- Debian Squeeze installation – 1.04A
- CPU heavy ops – 1.13A
- 9 threads of “cat /dev/urandom | md5sum > /dev/null”
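The urandom/md5sum trick generalizes into a tiny load-generation helper. A sketch (worker count and duration are arbitrary) that spawns N busy pipelines and cleans them up after a few seconds, handy for forcing a max-amperage reading:

```shell
# Spawn N CPU-burning workers (cat /dev/urandom | md5sum), let them run for
# DURATION seconds, then kill and reap them all.
burn_cpus() {
  local n=$1 duration=$2 pids=""
  for i in $(seq 1 "$n"); do
    cat /dev/urandom | md5sum > /dev/null &
    pids="$pids $!"
  done
  sleep "$duration"
  kill $pids 2>/dev/null
  wait $pids 2>/dev/null
  echo "burned $n cores for ${duration}s"
}

burn_cpus 9 2   # 9 threads, as in the reading above
```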
Dell PowerEdge 1950 1U
- CPU: 2 x Dual Core 2.66GHz Xeon 5150
- RAM: 16GB PC2-5300 Fully Buffered ECC DIMM
- 1x 120GB Intel 330 SSD on PCIe card, 2x 1TB SATA Hard Drive (7200 RPM)
- Dual 670W Power supplies
- Off – 0.33A
- Cold boot spike: 3.33A
SuperMicro SC815TQ-R700UB 1U server with X8DTU-F motherboard.
- CPU: 2 x Hex Core 2.4GHz Xeon E5645
- RAM: 48GB PC3-10600 ECC Registered
- 1x 60GB Intel 330 SSD, 4 x 600GB WD VelociRaptor SATA Hard Drive (10K RPM)
- 1x OCZ RevoDrive3x2 PCIe 240GB SSD
- Dual 650W Power supplies
- Off – 0.23A
- Cold boot spike: 2.60A
- Memtest scan – 2.22A
- Heavy CPU – 2.58A
SuperMicro SC813 1U Half-depth server with X7DCA-L motherboard.
- CPU: 2 x Quad Core 2.5GHz Xeon L5420
- RAM: 24GB PC2-5300 ECC Registered
- 1x64GB Intel SSD, 4 x 1TB SATA Hard Drive (7200 RPM)
- Single 280/340 Power supply
- Off – 0.13A
- Cold boot spike: 1.91A
- Booted/idle – 1.13A
- Rebuild single drive from RAID 6 set built with the 4 drives. (3 reading, 1 being written to) – 1.31A
- Same op as above plus: – 1.54A
- 9 threads of “cat /dev/urandom > /dev/null”
- 1 thread of “cat /dev/urandom > /tmp/ramdisk” (16GB tmpfs ramdisk)
- Same set of ops as above except “dd bs=1M if=/dev/zero of=/tmp/ramdisk count=16000” – 1.61A
(FYI, this operation of writing zeros to a ramdisk resulted in a dd stat of 757MB/sec)
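The dd-to-tmpfs trick doubles as a quick throughput benchmark. A smaller sketch of the same measurement (16MB instead of 16GB, so it is safe to run anywhere):

```shell
# Write zeros to a file and let dd report its throughput; on a tmpfs mount
# this measures memory bandwidth rather than disk speed.
OUT=$(mktemp)
dd bs=1M if=/dev/zero of="$OUT" count=16 2>&1 | tail -1   # throughput line
size=$(wc -c < "$OUT")
echo "$size bytes written"
rm -f "$OUT"
```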
Dell PowerEdge 2950
- CPU: 2 x Quad Core 2.33GHz Xeon E5345
- RAM: 32GB PC2-5300 ECC Registered
- 1x40GB Intel SSD, 6 x 1TB SATA Hard Drive (7200 RPM)
- Dual Power supplies
- Off – 0.25A to 0.37A
- Cold boot – 2.83A (16GB RAM) – 3.12A (32GB RAM)
- Warm boot – 2.94A
Dell PowerEdge 2850
Specs: Dual Xeon 3.6GHz/2MB; 2GB RAM; 6 x 73 GB SCSI Hard Drive (10K RPM); Dual Power supplies
Specs: Dual Xeon 3.6GHz/2MB; 16GB RAM; 1 x 40GB Intel SSD, 6 x 300 GB SCSI Hard Drive (10K RPM); Dual Power supplies
Dell PowerEdge 1750
Specs: Dual Xeon 3.2GHz – 4GB RAM; 3 x 146 GB 10K rpm SCSI Hard Drives; Dual Power supplies
Software: Debian 5.0 – MySQL 5.0 – InnoDB heavy
- Off – 0.21A
- Cold Start – 3.00A Peak
- Nominal usage – 1.90A
Dell PowerEdge 1650
Specs: Dual PIII 1.4Ghz; 2GB RAM; 3 x 36GB SCSI 10K rpm; Dual 275W Power supplies
- PS A & B both active
- PS A – 0.7A
- PS A & B
- Nominal operation – 1.41A
- Warm Boot – 1.44A Peak
- Cold Boot (drives spinning up) – 1.56A
- PS A only
- Nominal operation – 1.37A
Apple Power Mac G4
Specs: G4/533 Dual – 1.5GB RAM – 2 x 18GB SCSI (15K rpm)
- Peak Startup – 1.27A
- Max load on SCSI drives – big copy operation – 1.18A
Apple Xserve G4
Specs: Dual 1.0 Ghz G4, 2GB RAM, 2 x 60GB & 2 x 180GB
- heavy cpu/disk load – 1.52A
- simultaneous diskutil zero on all disks (booted from CD)
- Max CPU – multiple threads of cat /dev/urandom > /dev/null & ssh/rsa keygen operations
- all 4 disks idle – 1.37A
- Insert 180GB ADM – peak 1.41A, settled back down to 1.32A
- Insert second 180GB ADM – peak 1.48A, settled down to 1.38A
- keygen and cat large data file generated by /dev/urandom, copied to Software RAID mirror 60GB – spikes to 1.56A
Apple Xserve G5
Specs: Dual 2.0Ghz G5, 3GB RAM, 3 x 80GB SATA
- Nominal operation – 1.8A
- Max Cold Boot – 2.16A
Specs: Dual 2.3Ghz G5, 1GB RAM, 2 x 500GB SATA
- Nominal operation – 1.8A
- Max Cold Boot – 2.07A
Apple dual Quad Core Intel Xserve
Specs: Dual Intel 2.8Ghz Quad Xeon (8 cores), 16GB RAM, 3 x 1TB SATA in RAID 5
- Max Cold Boot – 3.23A
- Nominal operation – 2.80A
- Max cpu, disk activity – 3.68A
- Powered Off – 0.27A
Apple single Quad Core Intel Xserve (Xserve2,1 – Early 2008 model)
Specs: Single Intel 2.8Ghz Quad Xeon, 4GB RAM, 2 x 250GB SATA
- Nominal operation – 2.00A
- Powered Off – 0.28A
- Max cpu, disk activity – 2.08A
(calculated by adding all “watts” readings in Server Monitor and dividing by 115V)
Apple Intel Mac Mini
Specs: Intel 1.66Ghz Core Duo, 2GB RAM, 60GB E-Rated Hitachi drive E7K100 model
- Nominal operation – 0.29A
- Max cpu, disk activity – 0.37A
Apple G4 Mac Mini
Specs: 1.33Ghz PowerPC G4, 1GB RAM, no wireless, 32GB Transcend Solid State Disk
- Idle – 0.13A
- Boot Peak – 0.26A
- Nominal operation – 0.20A
- Max cpu, disk activity – 0.26A
- With added 1TB SATA laptop drive – massive dd ops – 0.21A
Apple Xserve RAID (Xraid)
Specs: 7 x 250GB (Hitachi) and 7 x 750GB (Seagate 7200.10)
- Nominal operation – fluctuates around 2.00A
- Max disk activity (as much as I could generate using Xserve G4) – 2.19A
SuperMicro SuperServer (PDSBM-LN2)
Specs: Core 2 Duo 2.2Ghz – Single 200W Power supply – 2GB RAM – 80GB SATA 5400rpm 2.5″ drive
- Cold Boot (drives spinning up) – 0.9A
- heavy cpu/disk load – 0.97A
- Nominal operation – 0.75A max
Specs: Core 2 Quad 2.4Ghz (Q6600) – Single 200W Power supply – 4GB RAM – 80GB SATA 5400rpm 2.5″ drive
- Cold Boot (drives spinning up) – 1.0A
- heavy cpu/disk load – 1.25A
- Nominal operation – 0.95A max
Specs: Dual 833Mhz PIII – Single Power supply – 2 x 18GB SCSI (10K rpm)
- Cold Boot (drives spinning up) – 1.0A
- heavy cpu/disk load – multiple instances of cpuburn and cat’ing /dev/urandom to a file – 0.9A
- Nominal operation – 0.75A max
IBM eServer x330
Specs: Two Intel Pentium III (Coppermine) 864MHz processors, 1GB RAM, Single Power Supply, Single 36GB SCSI drive
- Connecting Power Peak: 0.29A
- Stdby Steady: 0.11A
- Power On Peak: 0.78A
- SCSI spinup: 0.98A
- Powered low load: 0.63A
- Loaded (6.0+ Load Average with disk): 0.80A
- Disk activity only: 0.72 peakA
- Reasonable Load + Disk Activity: 0.79A
- heavy cpu/disk load – multiple instances of cpuburn and cat’ing /dev/urandom to a file – 0.82A
IBM eServer x336
Specs: Dual 3.0Ghz Xeon, 4GB RAM, Dual 575W Power Supplies, Dual 146GB SCSI drives
- Connecting Power Peak: 1.06A
- Stdby Steady: 0.79A
- Power On Peak: 2.5A
- Powered low load: 2.12A
- Loaded (7.0+ with disk): 3.25A
- Disk activity only: 2.40A
- Reasonable Load + Disk Activity: 2.85A peak
- heavy cpu/disk load – multiple instances of cpuburn and cat’ing /dev/urandom to a file – 3.2A
Some pieces of network equipment/drives I’ve tested:
- Cisco CSS 11151 Load Balancing switch – 0.89A
- Cisco CSS 11501 Load Balancer – peak startup: 0.69A – idle: 0.63A
- Cisco 2621 Router – peak startup: 0.14A – idle: 0.13A
- Cisco WS-3548-XL 48 port 10/100 switch – peak startup: 0.87A – idle (no ports connected): 0.61A
- Cisco WS-C2924-XL-EN 24 port 10/100 switch – peak startup: 0.39A – idle (1 port connected): 0.36A
- Cisco 1538M (8 port 10/100 hub) – 0.16A
- Cisco 1601 T1 Router – 0.08A (nothing connected)
- Cisco 2948G Catalyst – Boot peak: 1.0A – idle (no ports connected): 0.78A
- Dell PowerConnect 5224 24 port GigE switch – peak startup: 0.43A – idle (no ports connected): 0.36A
- Dell PowerConnect 3248 48 port 10/100 switch – peak startup: 0.40A – idle (no ports connected): 0.35A
- Dell PowerConnect 3324 24 port 10/100 switch – peak startup: 0.22A – 8 ports connected: 0.2A
- HP ProCurve 2848 (J4904A) – 48 port GigE switch – peak startup 0.69A – idle (no ports connected): ~0.51A
- NetGear FS524 24 port 10/100 switch – peak startup: 0.24A – idle (no ports connected): 0.21A
- Juniper NetScreen 5GT – 0.06A
- NetScreen 10 – 0.06A
- Netopia 3386-ENT – 0.05A
- Adtran CSU/DSU – 0.01A
- BayTech DS2-RPC – 0.05A
- 500GB Seagate SATA drive – Spinup: 0.31A – Duplicate large files – 0.11A
- CoolMax NAS – Single SATA drive – Spinup: 0.26A – Duplicate large files – 0.16A
- Seagate FreeAgent 1TB USB drive – Spinup: 0.24A – Duplicate large files – 0.11A
- 4-Bay v2 Drobo with 2 x 5900 rpm 2TB Seagate drives – Startup – 0.35A
Dave from NetApp has some interesting things to say about power in the datacenter.
Posted by Brian Blood as Colocation, Hardware, Routers and Firewalls, Servers at 6:51 PM UTC
No Comments »
In the course of rebuilding a customer’s Panther Xserve G5 on a 2-drive software RAID to Tiger on a 3-drive hardware RAID, we needed to migrate the data quickly and efficiently. We didn’t need to upgrade the OS in place, but simply do a fresh install.
What I wanted to do was install the RAID card, hook two of the drives up to the card, and leave one of the main drives connected to the system bus.
I intended to try to create a degraded RAID 5 set with the two drives, then copy the data over from the main drive. Then I would shut down, hook up the third drive, and have the RAID card rebuild the array on the fly.
This would give me the fastest way of copying over the data from the system.
Alas, it was not to be. The megaraid CLI program complained that I didn’t have enough members to create the RAID 5 set:
# megaraid -create R5 -drive 1 2
MEGARAID CLI version 1.0.12
Insufficient Drives 2 for RAID5
INSUFFICIENT/WRONG argument found to complete command
I ended up copying the data both to the other server over FireWire Target Disk mode (FWTD) and to a connected FireWire disk.
In the end, the RAID 5 device was created with all 3 drives and is running smoothly.
Posted by Brian Blood as OS X Server, Servers at 1:40 PM UTC
No Comments »
A colo customer of ours wanted us to completely rebuild an Xserve G5 of theirs. It was running Panther Server and had started acting really squirrely. It was set up with an Apple software RAID mirror of the drives in Bay 1 and Bay 3. There was an additional drive in Bay 2, but it wasn’t tied to anything.
The plan for rebuilding this box was to backup everything on the system, install the PCI Hardware RAID card, attach the three drives and then do a fresh Tiger Server install.
In the course of determining the best way to back up this box, we had the idea of putting the server into Firewire Target disk mode (FWTD) and attaching it to another server of ours with big fast disks. This turned out to be a pretty good solution, but I was pleasantly surprised by a feature.
We have all G4 Xserves and this G5 Xserve is the lone non-G4 box. So, based on my previous experience of using FWTD mode on G4 Xserves, I expected only the drive in the first of the three Bays to show up on the running server. Interestingly, when we connected the firewire cable, all of the disks including the Install CD in the CD drive of the G5 Xserve showed up on this other box.
Posted by Brian Blood as OS X Server, Servers at 3:52 PM UTC
1 Comment »
We support the Hypersites development team in handling all their colocation and load balancing systems and occasionally doing web application consulting for them to help make their site better, faster, stronger and more agile. The Hypersites Application Builder is truly a marvelous piece of software. You should give it a spin for your next web project.
One of the underlying parts of their architecture that we advised them on long ago was to utilize the compression-based encoding that most browsers support, to reduce the actual amount of traffic sent over the internet to deliver a page. Another was to build out versions of the pages their system created and store those in a caching system of some sort. We had considered using memcache, which is a great way of storing the transient data that most web apps end up creating/using, but they decided on a much simpler (KISS) database table.
In that table are stored 3 versions of a page’s HTML: plain HTML, gzip, and compress.
The team recently made a change in their code so that instead of grabbing all three columns of data from the cache table, then choosing which version of the data to use, they chose which column to select before making the query.
The result: in about 70% of the calls to the cache table, the query result dropped to 10% of its original size.
By making a simple change to the logic in their code, they accelerated their software (at least that portion of the code) by TEN FOLD, something which no amount of reasonably-priced hardware upgrades would have accomplished.
Very cool and a good lesson.
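A minimal sketch of the select-before-query idea; the table and column names here are made up for illustration, not Hypersites’ actual schema. Map the client’s Accept-Encoding header to a single column, then ask the database for only that column:

```shell
# Pick ONE cache column based on the client's Accept-Encoding header,
# instead of fetching all three versions and discarding two.
# Table/column names (page_cache, html_*) are hypothetical.
choose_column() {
  case "$1" in
    *gzip*)     echo "html_gzip" ;;
    *compress*) echo "html_compress" ;;
    *)          echo "html_plain" ;;
  esac
}

col=$(choose_column "gzip, deflate")
echo "SELECT $col FROM page_cache WHERE page_id = 42"
```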
Posted by Brian Blood as Database, Web App Development at 11:55 AM UTC
No Comments »
We run cacti to graph a lot of our resources. Obviously traffic on switches, routers, and servers, but we also do application-level tracking: things like the number of online users, transactions submitted to a batch system, hits on an Apache web server, and so on. This amount of monitoring can cause cacti to get pretty busy, and for a long time it wasn’t an issue. Then last July we really started doing a lot more monitoring for a new application architecture we are hosting for a client, which increased the load on our monitoring server quite a bit.
Back in December, you can see a break in the graph. This is when I upgraded this box from Panther Server to Tiger Server, and as you can see, the load really jumped. So I dug into the server to see what was causing the issue. I kept seeing this larger-than-acceptable load average, but just sitting there watching top, I couldn’t see any one process using enough CPU to account for it. In top, you can generally add up the cpu% of the top 4 or 5 processes and get close to the total CPU utilization it reports. However, top was showing about 45% CPU used while the top 4 processes only added up to about 20%. Where was this extra CPU time going?
I then noticed that most of the CPU usage was in the “system” %, and I started watching the process list more closely. It turns out there was a continuous stream of snmpget processes being launched and completed one right after another. Aha! So cacti forks off snmpget processes to go and retrieve data from the devices we monitor. No single process was doing much heavy processing, but the sheer number of forks was causing all this load.
So, what to do with all these forked processes? My first thought was that we are monitoring a lot of SNMP v1 devices, and in SNMP v1 you can only request a single value per call, whereas in SNMP v2 you can make a request for a range of values. So I went on an upgrading spree, either swapping out older v1-only devices for v2-capable ones or upgrading the IOS on some Cisco switches we run to a version that supports SNMP v2. Interestingly, there is a place on the Cisco site where you can directly download the latest IOS for some older Cisco switches, like the venerable 2924XL devices.
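The difference in request patterns is easy to see. In this sketch the hostname (sw1.example.net) and community string (public) are placeholders, so the commands are echoed rather than executed:

```shell
# With SNMP v1, cacti must fork one snmpget process per value it polls.
v1_cmds=""
for oid in ifInOctets.1 ifOutOctets.1 ifInOctets.2 ifOutOctets.2; do
  v1_cmds="$v1_cmds snmpget -v 1 -c public sw1.example.net $oid;"
done
echo "v1: four processes ->$v1_cmds"

# With SNMP v2c, a single GETBULK-based walk fetches a whole range at once.
v2_cmd="snmpbulkwalk -v 2c -c public sw1.example.net ifTable"
echo "v2c: one process -> $v2_cmd"
```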
So, getting rid of some of the SNMP v1 devices did have some impact in reducing the number of forked processes generated by cacti; you can see that reflected in the drop on the Load Average graph around the end of December. However, this did not solve the issue to the extent I’d hoped, as there was still considerable load on the box. The only thing left to update was the server’s PHP, to a version with the php-snmp functions built in so that no forking would be necessary. This meant: PHP 5.
I updated the server’s MySQL to MySQL 5.0.x, then updated the cacti install to the latest version, then downloaded the PHP 5.2 installer from Mr. Liyanage’s site, made the upgrade, and made the appropriate changes to the php.ini file.
I had done this same setup for a client before (PHP 5.2, cacti, etc.) and had had some issues with the built-in PHP snmp functions. There is a function in cacti’s lib/snmp.php file called snmp_get_method($version). Its purpose is to find the best method of calling SNMP, based on the requested version of SNMP and the availability of certain functions or callable executables. The issue I had had was that when I used the cacti interface to poll a device for interfaces to graph traffic on, php-snmp would fail and cacti would give a not-very-helpful snmp error. At that time, I merely added a line that forced cacti to use the snmp binaries. OK, now it was time to really track this down.
The first error I encountered in the php error log came when php called snmp2_get(), as I made cacti repoll a switch:
Could not open snmp connection: Unknown host
This was obviously not correct. The second error message I saw, in the base apache error log, was:
No support for requested transport domain “udp”
So, I did some googling and found (primarily on the PHP bug-tracking site) that php-snmp tended to work fine in the CLI version of PHP (which is what is used when cacti does its normal periodic polling), but gave these errors when called from within the Apache module.
I had also turned on debugging in SNMP, by adding a debug directive to the snmp client conf file at /usr/share/snmpd/snmp.conf, and watched all the relevant logs. SNMP outputs a LOT of info, and I had already gotten an idea of what the basic problem was, so I turned debugging back off.
OK, the basic issue here was that php-snmp works in PHP CLI, but not in Apache. I guess you could call it a hack, but I merely added this line to the top of the snmp_get_method() function:
if (!empty($_SERVER['HTTP_HOST'])) return SNMP_METHOD_BINARY;
which basically forces cacti to call the CLI version of the snmp functions whenever there is an HTTP Host header, which is only going to be the case when this function is called from within Apache, which is only when you are doing configs of your devices and data sources (a lot of whiches, there). All other times, the function continues on and chooses the php-snmp built-ins.
So after all this debugging and tweaking, what’s the result? The polling process that cacti goes through to poll all our devices by forking snmpget calls used to take up to 3-4 minutes to complete; it now takes (with only 2 concurrent poller processes) just under 14 seconds. As a result, the load is now down considerably on that box:
And the other things we have running on that server now run much more quickly.
Posted by Brian Blood as OS X Server, Servers at 11:25 AM UTC
2 Comments »