Diff for "TestingServerHardware"

TestingServerHardware

Differences between revisions 9 and 12 (spanning 3 versions)

Launchpad Entry: https://launchpad.net/distros/ubuntu/+spec/testing-server-hardware
Created: Date(2005-10-31T22:27:51Z) by MarkRamm
Contributors: IvanKrstic, MarkRamm, AdamConrad, MalcolmYates, BenCollins
Packages affected: debian-installer, server-testsuite, stress, iperf

Summary

We need to do a better job of testing Ubuntu on server hardware. To do this we need to:

get up-to-date hardware from major server hardware vendors to certify against Ubuntu Server 6.04
set up a central, official certification facility that performs burn-in and installation testing
create a comprehensive server test suite for hardware recognition and stress testing
create an easy way to support and encourage community server testing for extra bug reports.

We will cover official Dapper server certification in this spec. Community server testing has moved to CommunityServerHardwareTesting, as per MattZimmerman's request.

Rationale

We're putting a lot of effort into making Dapper rock on servers. Because it is an enterprise-ready release, we'll be supporting it for five years on servers, but none of this is much good if we can't guarantee it will run properly on modern server hardware.

Use cases

Company alpha runs all their servers on Ubuntu. They're buying a batch of new servers, and want to make sure they're certified to work with Dapper.
Company beta is considering switching their data center to Ubuntu. They want to know how much of their hardware is certified to work with Dapper, to gauge the complexity and affordability of the switch.

Community testing use cases are addressed in the community testing spec.

Scope

We would like to certify a minimum of 25 servers in the Dapper timeframe.

Implementation

Certification facility

The Harvard Computer Society will run the central Ubuntu certification facility in Cambridge, MA. HCS will:

provide rackspace for the servers,
provide staff to process inventory,
run the testing suite (both installation and burn-in),
develop and host an Ubuntu-branded server hardware catalogue, both for certified and community-tested hardware, before Dapper is released
provide VPN access to servers under certification (and their lights-out systems such as iLO and LOM) to appointed Ubuntu developers

Following testing, the servers will be tasked to do non-critical functions for Ubuntu and HCS, such as providing an Ubuntu archive mirror, or web serving. These services can be easily shut down when Ubuntu developers need to make use of a server to troubleshoot problems.

IvanKrstic will run the certification facility.

Installation testing

Installation testing does not require developing any new software. Certification facility staff will plop in an Ubuntu-server CD, watch the installation through to completion, and make sure after rebooting that the machine was installed properly.

We eventually want to have a d-i rescue mode profile for server testing. Booting into it would ask people to answer a few questions (which hardware the system really has vs. what the system detected automatically), and deliver the result to us. At first, knowledgeable testing facility staff can perform this by hand with a few custom-written scripts.

Burn-in testing

We will have a minimal, easily developed burn-in test suite for Dapper.

It will contain:

stress(1), package 'stress': I/O, CPU, VM, disk
NLANR's iperf(1), package 'iperf': network stress
a UI to run the tests (described below)

A certification burn-in run will be structured as follows:

Day 1: I/O burn-in with stress(1)
Day 2: CPU burn-in with stress(1)
Day 3: VM burn-in with stress(1)
Day 4: disk burn-in with stress(1)
Day 5: network burn-in with iperf(1)
Days 6, 7: full stress on all subsystems with iperf(1) and stress(1)
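
The per-day phases above could map onto tool invocations roughly as follows. This is only a sketch: the phase names, the default worker count, and the mapping of "I/O" to `--io` and "disk" to `--hdd` are our assumptions, not something the spec fixes. The functions build argument lists and do not run anything.

```python
from typing import List

# Assumed mapping from a burn-in phase to the stress(1) worker flag it uses.
STRESS_FLAGS = {
    "io": "--io",     # workers spinning on sync()
    "cpu": "--cpu",   # workers spinning on sqrt()
    "vm": "--vm",     # workers spinning on malloc()/free()
    "disk": "--hdd",  # workers spinning on write()/unlink()
}


def stress_command(phase: str, seconds: int, workers: int = 4) -> List[str]:
    """Build (but do not run) a stress(1) command line for one phase."""
    if phase not in STRESS_FLAGS:
        raise ValueError(f"unknown stress phase: {phase!r}")
    return ["stress", "--verbose", STRESS_FLAGS[phase], str(workers),
            "--timeout", f"{seconds}s"]


def iperf_client_command(peer: str, seconds: int) -> List[str]:
    """Build an iperf(1) client command line; the peer must run `iperf -s`."""
    return ["iperf", "-c", peer, "-t", str(seconds)]
```

For the combined run on days 6 and 7, the suite would presumably start one stress(1) process with all four worker flags alongside an iperf(1) client.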

Burn-in and installation test runs are collected in the HCS-developed server hardware testing catalog. Use of the catalog for community testing is explained in CommunityServerHardwareTesting.

Load testing UI

The test suite is wrapped in a shiny ncurses UI that, when started, asks the user whether she wants to perform a full burn-in (7 days) or a micro-burn-in (7 hours). A 7-hour micro-burn-in is assumed to be acceptable for community testing, and runs on the same schedule as a certification burn-in, with days scaled to hours.
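
The "days scaled to hours" rule can be expressed with a single schedule table and a per-mode time unit. This is a minimal sketch; the phase names and the `plan` helper are illustrative assumptions.

```python
from typing import List, Tuple

# (phase, length in schedule units), following the certification run layout.
SCHEDULE: List[Tuple[str, int]] = [
    ("io", 1), ("cpu", 1), ("vm", 1), ("disk", 1), ("network", 1), ("all", 2),
]

# One schedule unit is a day for a full burn-in and an hour for a micro-burn-in.
UNIT_SECONDS = {"full": 24 * 3600, "micro": 3600}


def plan(mode: str) -> List[Tuple[str, int]]:
    """Return (phase, duration in seconds) pairs for the chosen mode."""
    unit = UNIT_SECONDS[mode]  # raises KeyError for an unknown mode
    return [(phase, units * unit) for phase, units in SCHEDULE]
```

With this layout `plan("full")` adds up to 7 days and `plan("micro")` to 7 hours, so both modes exercise the same phases in the same order.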

The UI would run iperf(1) and stress(1) in verbose logging mode, and after a completed burn-in run, would offer to upload the results to the community server hardware testing catalogue via HTTP. The official certification facility would cancel this upload, and upload the logs to the certified server hardware catalogue manually.
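
The offer-then-upload step might look like the sketch below. The catalogue URL and the plain-text POST body are placeholders we made up for illustration; the spec does not define the catalogue's HTTP interface. The function only builds the request, so declining the offer (as the certification facility would) sends nothing.

```python
import urllib.request
from typing import Optional

# Hypothetical endpoint; the real catalogue address is not specified here.
CATALOGUE_URL = "http://catalogue.example.org/upload"


def build_upload(log_text: str, user_accepts: bool,
                 url: str = CATALOGUE_URL) -> Optional[urllib.request.Request]:
    """Return a POST request carrying the logs, or None if the user declines."""
    if not user_accepts:
        return None
    return urllib.request.Request(
        url,
        data=log_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
```

A caller that gets a request back would pass it to `urllib.request.urlopen` to perform the upload.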

Because a failed burn-in test often freezes or reboots the machine, the application needs a way to keep test checkpoints. It should write out a checkpoint to disk every hour and at the completion of every test (which resets the timer). The checkpoint file will only be appended to, and so will contain a record of any restarted runs; this checkpoint file will also be uploaded to the catalog, which will parse it to see if any tests failed.
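
One way to keep an append-only checkpoint file is sketched below. The line format (timestamp, event, test name, tab-separated) and the event names are assumptions for illustration; the spec does not define them. Each record is flushed and synced so that it survives a freeze or reboot shortly afterwards.

```python
import os
import time
from typing import Optional

# Assumed event vocabulary: test started, hourly tick, test completed,
# and test stopped on purpose by the user.
EVENTS = {"START", "TICK", "DONE", "USER_ABORT"}


def append_checkpoint(path: str, event: str, test: str,
                      now: Optional[float] = None) -> str:
    """Append one checkpoint record to the file and force it to disk."""
    if event not in EVENTS:
        raise ValueError(f"unknown checkpoint event: {event!r}")
    stamp = int(time.time() if now is None else now)
    line = f"{stamp}\t{event}\t{test}\n"
    # Mode "a" never truncates, so earlier (possibly crashed) runs stay on record.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())
    return line
```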

The application needs to read the checkpoint file, if it exists, at every start: if it determines a test was interrupted, it should offer to start from the interrupted test instead of starting from scratch. A user-interrupted test will be specifically mentioned in the checkpoint file, so that it can be differentiated from unexpected test interruptions due to machine reboots or freezes.
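
The start-up check could be sketched as below, assuming a hypothetical tab-separated record format of timestamp, event, and test name, with events `START`, `TICK`, `DONE`, and `USER_ABORT`. None of these names come from the spec. A test that has a `START` but no matching `DONE` or `USER_ABORT` is treated as an unexpected interruption.

```python
from typing import Dict, Iterable, List, Optional, Union

Report = Dict[str, Union[Optional[str], List[str]]]


def analyse_checkpoints(lines: Iterable[str]) -> Report:
    """Find the test to resume from and the tests that died unexpectedly."""
    running: Optional[str] = None   # test with START but no DONE/USER_ABORT yet
    resume: Optional[str] = None
    reason: Optional[str] = None
    crashed: List[str] = []
    for raw in lines:
        parts = raw.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip a torn line left behind by a crash mid-write
        _stamp, event, test = parts
        if event == "START":
            if running is not None:
                crashed.append(running)  # previous test never finished
            running, resume, reason = test, None, None
        elif event == "DONE" and test == running:
            running = None
        elif event == "USER_ABORT" and test == running:
            running, resume, reason = None, test, "user"
    if running is not None:
        crashed.append(running)
        resume, reason = running, "crash"
    return {"resume": resume, "reason": reason, "crashed": crashed}
```

The UI would offer to restart from `resume` when it is not `None`, and the catalogue could use `crashed` to flag failed tests.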

Custom install CDs

We may want to produce install CDs tailored to specific certified hardware. Vendors would pay for the creation of these CDs, possibly as part of the certification process, which would then be available to customers for free. We would base the customized CDs on the hardware list and testing results obtained from our official server certification run.

Timeframe

It would be at least 6-8 weeks before hardware could start shipping from vendors to the HCS certification facility. This means the server test suite needs to be completed by the end of the year.

HCS can start receiving and processing hardware in approximately the same number of weeks. However, actual certification runs can't start before February 1st, 2006. The gap is a one-time setup cost, and will not exist for future releases. This leaves two months for server certification, and since a full certification run (including burn-in testing) takes one week, two months should be more than enough time.
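
As a rough check of that estimate, the sketch below works out how many servers would need to be under test at once. The 25 servers come from the Scope section and the one-week run from the paragraph above; treating "two months" as 8 weeks is our approximation, and installation or troubleshooting time is not counted.

```python
import math


def min_parallel_runs(servers: int, run_weeks: int, window_weeks: int) -> int:
    """Smallest number of servers that must be tested concurrently."""
    runs_per_slot = window_weeks // run_weeks  # back-to-back runs one slot can host
    if runs_per_slot < 1:
        raise ValueError("a single run does not fit in the window")
    return math.ceil(servers / runs_per_slot)
```

Under these assumptions `min_parallel_runs(25, 1, 8)` gives 4, so the estimate holds as long as the facility can burn in at least four machines side by side.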

Outstanding issues

Some hardware configurations (e.g. RAID) may require non-distributable software to support them. Malcolm will need to talk to vendors about being able to distribute those tools as packages in Ubuntu. In the cases where those tools are undistributable (which is the case with many of them), Malcolm will be petitioning vendors to sponsor our creation of custom Ubuntu CD images for their hardware.
At what point in the release cycle does it make sense to start certifying hardware? IvanKrstic is inclined to say right after FeatureFreeze. MattZimmerman: thoughts?
iperf(1) understandably wants a client and a server when running network testing. While this is no problem for the certification facility, we have to make sure we have simple instructions available on how to do this for community testing (luckily, doing it is trivial - it requires a connected machine, one apt-get, and one invocation of iperf). The suite UI needs to ask the user up front about the IP of the iperf peer for the network testing, or allow her to skip that part of the test.
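
The up-front question about the iperf peer could be handled along these lines. This is a sketch under our own assumptions: an empty answer or the word "skip" means the network phase is skipped, and anything else must be a valid IP address. The peer itself only needs `iperf -s` running.

```python
import ipaddress
from typing import Optional


def parse_peer_answer(answer: str) -> Optional[str]:
    """Return the peer IP as a string, or None if the user skips network tests."""
    text = answer.strip()
    if text == "" or text.lower() == "skip":
        return None
    # ip_address() raises ValueError for anything that is not IPv4 or IPv6.
    return str(ipaddress.ip_address(text))
```

The UI would ask again when a `ValueError` is raised, and leave out the network phase when the result is `None`.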

TestingServerHardware (last edited 2008-08-06 16:30:16 by localhost)

-  ⇤ ← Revision 9 as of 2005-11-05 18:36:39 → 
  Size: 7738
  Editor: 197_220_103_66-WIFI_HOTSPOTS
  Comment:
+   ← Revision 12 as of 2005-11-11 07:29:30 → ⇥
  Size: 7832
  Editor: 83-131-19-253
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 5:
- * '''Contributors''': MarkRamm, AdamConrad, IvanKrstic, MalcolmYates, BenCollins
+ * '''Contributors''': IvanKrstic, MarkRamm, AdamConrad, MalcolmYates, BenCollins
 Line 7:
-'''This is actively being worked on. Please consult with IvanKrstic before making changes.'''
-Line 13:
+Line 11:
- * get up-to date hardware from IBM / HP / Sun / Apple / Dell to certify against Ubuntu Server 6.04
 * set up a centralized burn-in and installation testing facility
+ * get up-to-date hardware from major server hardware vendors to certify against Ubuntu Server 6.04
 * set up a central, official certification facility that performs burn-in and installation testing
-Line 18:
+Line 16:
-Addressing concerns from MattZimmerman, we will address burn-in testing and installation testing separately.
+We will cover official Dapper server certification in this spec. Community server testing has moved to CommunityServerHardwareTesting, as per MattZimmerman's request.
-Line 22:
+Line 20:
-We need to guarantee that our server stuff works!
+We're putting a lot of effort into making Dapper rock on servers. Because it is an enterprise-ready release, we'll be supporting it for five years on servers, but none of this is much good if we can't guarantee it will run properly on modern server hardware.
-Line 26:
+Line 24:
- * Sophia wants to help test Dapper Drake on her server hardware.
+ * Company alpha runs all their servers on Ubuntu. They're buying a batch of new servers, and want to make sure they're certified to work with Dapper.
-Line 28:
+Line 26:
- * Jeff has some obscure RAID hardware on his servers, and he wants to look on the Web to see if it will work with Dapper Drake.
+ * Company beta is considering switching their data center to Ubuntu. They want to know how much of their hardware is certified to work with Dapper, to gauge the complexity and affordability of the switch.
-Line 30:
+Line 28:
- * Roberta wants to buy a bunch of high end servers that are certified to work with Dapper.
+Community testing use cases are addressed in the community testing spec.
-Line 34:
+Line 32:
-We would like to certify on 25 machines from the above companies in the Dapper 
timeframe, and we would like to have community testing of as many server 
configurations as possible.  We need a way to record and track user reports 
that is separate from the way we do official server certification. 

We would also like to have a large number of people testing their own server hardware and reporting the results back to Canonical.
+We would like to certify a minimum of 25 servers in the Dapper timeframe.
-Line 43:
+Line 36:
-We may want to produce install CDs tailored to specific "certified" hardware. Vendors would pay for the creation of these CDs, possibly as part of the certification process, which would be available to customers for free. We would base the customized CDs on the hardware list and testing results obtained from our official server test run.
+=== Certification facility ===
-Line 45:
+Line 38:
-User testing results will be posted to the internet so end users can see the results that others have found, with a clear indication that they are not official certification or testing results.
+The Harvard Computer Society will run the central Ubuntu certification facility in Cambridge, MA. HCS will: 
 * provide rackspace for the servers, 
 * provide staff to process inventory, 
 * run the testing suite (both installation and burn-in), 
 * develop and host an Ubuntu-branded server hardware catalogue, both for certified and community-tested hardware, before Dapper is released
 * provide VPN access to servers under certification (and their lights-out systems such as iLO and LOM) to appointed Ubuntu developers
-Line 47:
+Line 45:
-It is difficult to reconcile Canonical's globally distributed development 
environment with hardware vendors' desire to ship hardware to a single location 
for testing purposes. The Harvard Computer Society has offered to support 
a testing facility, including staff to process inventory, run the testing suite, and catalog results.  We would have external VPN or SSH access to the machines, plus external remote console for the systems (most/all of them?) that support lights-out systems like ILO and LOM. The Harvard Computer Society contact is IvanKrstic (krstic@hcs.harvard.edu, krstic on Launchpad.)
+Following testing, the servers will be tasked to do non-critical functions for Ubuntu and HCS, such as providing an Ubuntu archive mirror, or web serving. These services can be easily shut down when Ubuntu developers need to make use of a server to troubleshoot problems.
-Line 52:
+Line 47:
-OliverGrawert will be consulted about integrating community server test results into `hwdb`.
+IvanKrstic will run the certification facility.
-Line 56:
+Line 51:
-Installation testing does not require developing any new software. Testing facility staff will plop in an Ubuntu-server CD, watch the installation through completion, and make sure the machine was installed properly after rebooting.
+Installation testing does not require developing any new software. Certification facility staff will plop in an Ubuntu-server CD, watch the installation through to completion, and make sure after rebooting that the machine was installed properly.
-Line 58:
+Line 53:
-We eventually want to have a d-i rescue mode profile for server testing. Booting into it would ask people to answer a few questions (which hardware does the system really have, vs. what the system thinks it has), and e-mail or otherwise deliver the result to us. At first, knowledgeable testing facility staff can perform this by hand and a few custom-written scripts.
+We eventually want to have a d-i rescue mode profile for server testing. Booting into it would ask people to answer a few questions (which hardware the system really has vs. what the system detected automatically), and deliver the result to us. At first, knowledgeable testing facility staff can perform this by hand with a few custom-written scripts.
-Line 60:
+Line 55:
-=== Code (burn-in testing) ===
+=== Burn-in testing  ===
-Line 62:
+Line 57:
-The burn-in test suite should contain:
 * basic hardware recognition
 * userspace tools for hardware configuration (RAID controllers, etc)
 * hot-plug systems for blades (CPU, memory)
 * performance tools (memory bandwidth, disk, CPU, whatever)
 * database workload tools or equivalent for high-level measurements
 * burn-in, long term work load testing for stability
 * multi-system option for network testing (throughput, make sure that the network adapter doesn't go belly-up under load)
 * an install test, to ensure the system can actually install and run.
+We will have a minimal, easily developed burn-in test suite for Dapper.
-Line 72:
+Line 59:
-'''MattZimmerman: this is a sizeable project, and it's not clear from this spec what would be required to meaningfully test these components.  An implementation plan needs to have enough detail for a developer to implement the plan and measure the result against it.
+It will contain:
 * stress(1), package 'stress': I/O, CPU, VM, disk
 * NLANR's iperf(1), package 'iperf': network stress
 * a UI to run the tests (described below)
-Line 74:
+Line 64:
-My suggestion is to start with a test plan, one intended to be read and executed by a human, measure its effectiveness in practice through a plan for community testing, and only then to  invest in automating it.  FormalTestPlans provides an example of how to do this.  We need a list of the series and models of servers that should be tested, as was created for LaptopTesting.  We can then survey the community to find volunteers who have access to the hardware which interests us, and ask them to carry out the documented test plan.'''
+A certification burn-in run will be structured as follows:
-Line 76:
+Line 66:
-Possible tools:
 * stress(1), package 'stress': load-tests drive I/O, CPU, memory
 * iperf or netpipe-tcp for network stuff: evaluate the two
 * module PCI tables for checking hardware support (needs code)
 * stuff used for the test suite will need to produce udebs for use in d-i rescue mode.
+ * Day 1: I/O burn-in with stress(1)
 * Day 2: CPU burn-in with stress(1)
 * Day 3: VM burn-in with stress(1)
 * Day 4: disk burn-in with stress(1)
 * Day 5: network burn-in with iperf(1)
 * Days 6, 7: full stress on all subsystems with iperf(1) and stress(1)
-Line 82:
+Line 73:
-The test suite might take up to a week to run full burn-in tests, but will need to
have a quicker test mode if people are going to test their own hardware and provide us with results.
+Burn-in and installation test runs are collected in the HCS-developed server hardware testing catalog. Use of the catalog for community testing is explained in CommunityServerHardwareTesting.
-Line 85:
+Line 75:
-'''MattZimmerman: A burn-in test should be considered separately; our main objective is to ensure that Ubuntu installs and works on the hardware, because we currently know that it lacks support for several popular server platforms.'''
+=== Load testing UI ===
-Line 87:
+Line 77:
-Each test will run serially, with a final full stress test of all subsystems.
+The test suite is wrapped in a shiny ncurses UI that, when started, asks the user whether she wants to perform a full burn-in (7 days) or a micro-burn-in (7 hours). A 7-hour micro-burn-in is assumed to be acceptable for community testing, and runs on the same schedule as a certification burn-in, with days scaled to hours.
-Line 89:
+Line 79:
-The included test tools will need a very simple UI that will step through the
tests and output the results in a format that can be reviewed by the testing team.
Test suite should be automated, and allow for the test operator to perform some
out-of-band tests to get details on failures.  It should also provide an easy
method of shipping the results back to Canonical.
+The UI would run iperf(1) and stress(1) in verbose logging mode, and after a completed burn-in run, would offer to upload the results to the community server hardware testing catalogue via HTTP. The official certification facility would cancel this upload, and upload the logs to the certified server hardware catalogue manually.
-Line 95:
+Line 81:
-With the test suite in place, IvanKrstic and HCS can develop both the testing UI, and the results collection/display UI if necessary.
+Because a failed burn-in test often freezes or reboots the machine, the application needs a way to keep test checkpoints. It should write out a checkpoint to disk every hour and at the completion of every test (which resets the timer). The checkpoint file will only be appended to, and so will contain a record of any restarted runs; this checkpoint file will also be uploaded to the catalog, which will parse it to see if any tests failed.
-Line 97:
+Line 83:
-=== Timeframe ===
+The application needs to read the checkpoint file, if it exists, at every start: if it determines a test was interrupted, it should offer to start from the interrupted test instead of starting from scratch. A user-interrupted test will be specifically mentioned in the checkpoint file, so that it can be differentiated from unexpected test interruptions due to machine reboots or freezes.
-Line 99:
+Line 85:
-It would be at least 6-8 weeks before hardware could start shipping from vendors
to the hypothetical test center.  This means the server test suite needs to be
completed by the end of the year.
+=== Custom install CDs ===
-Line 103:
+Line 87:
-'''MattZimmerman: a formal certification process is helpful, but it isn't clear whether it could be implemented in time to provide the testing we need for Dapper.  A test lab requires local staff resources, etc., to carry out the tests.  Unless there is a specific commitment to do this on a Dapper-friendly timeline, we must implement a community test plan.'''
+We may want to produce install CDs tailored to specific certified hardware. Vendors would pay for the creation of these CDs, possibly as part of the certification process, which would then be available to customers for free. We would base the customized CDs on the hardware list and testing results obtained from our official server certification run.

== Timeframe ==

It would be at least 6-8 weeks before hardware could start shipping from vendors to the HCS certification facility.  This means the server test suite needs to be completed by the end of the year.

HCS can start receiving and processing hardware in approximately the same number of weeks. However, actual certification runs can't start before February 1st, 2006. The gap is a one-time setup cost, and will not exist for future releases. This leaves two months for server certification, and since a full certification run (including burn-in testing) takes one week, two months should be more than enough time.
-Line 107:
+Line 97:
-Some hardware configurations may require non-distributable software to support 
(e.g. RAID).  Malcolm will need to talk to vendors about being able to distribute 
those tools as packages in Ubuntu.  In the cases where those tools are 
undistributable (which is the case with many of them), Malcolm will be 
petitioning vendors to sponsor us creating custom Ubuntu CD images for their hardware.
+ * Some hardware configurations (e.g. RAID) may require non-distributable software to support them.  Malcolm will need to talk to vendors about being able to distribute those tools as packages in Ubuntu.  In the cases where those tools are undistributable (which is the case with many of them), Malcolm will be petitioning vendors to sponsor our creation of custom Ubuntu CD images for their hardware.
-Line 113:
+Line 99:
-== BoF agenda and discussion ==

A different BOF needs to be scheduled to discuss improving/changing `debian-installer`'s rescue mode to make server testing and rescue stuff less painful.
+ * At what point in the release cycle does it make sense to start certifying hardware? IvanKrstic is inclined to say right after FeatureFreeze. MattZimmerman: thoughts?
 
 * iperf(1) understandably wants a client and a server when running network testing. While this is no problem for the certification facility, we have to make sure we have simple instructions available on how to do this for community testing (luckily, doing it is trivial - it requires a connected machine, one apt-get, and one invocation of iperf). The suite UI needs to ask the user up front about the IP of the iperf peer for the network testing, or allow her to skip that part of the test.
