TestingServerHardware
Launchpad Entry: https://launchpad.net/distros/ubuntu/+spec/testing-server-hardware
Created: Date(2005-10-31T22:27:51Z) by MarkRamm
Contributors: MarkRamm, AdamConrad, IvanKrstic, MalcolmYates, BenCollins
Packages affected: debian-installer, server-testsuite, stress, bonnie++, iperf
Summary
We need to do a better job of testing Ubuntu on server hardware. To do this we need to:
- get up-to-date hardware from IBM / HP / Sun / Apple / Dell to certify against Ubuntu Server 6.04
- create a comprehensive server test suite for hardware recognition and stress testing
- create an easy way to support and encourage community server testing for extra bug reports.
Rationale
We need to be able to guarantee that Ubuntu works on the server hardware people actually run it on.
Use cases
- Sophia wants to help test Dapper Drake on her server hardware.
- Jeff has some obscure RAID hardware on his machine, and he wants to look on the Web to see if it will work with Dapper Drake.
- Roberta wants to buy a bunch of high-end servers that are certified to work with Dapper.
Scope
We would like to certify on 25 machines from the above companies in the Dapper timeframe, and we would like to have community testing of as many server configurations as possible. We need a way to record and track user reports that is separate from the way we do official server certification.
We would also like to have a large number of people testing their own server hardware and reporting the results back to Canonical.
Implementation
We may want to produce install CDs tailored to specific "certified" hardware. These CDs would be available for purchase by enterprise customers.
{{{XXX: mpt: MarkShuttleworth says: "Canonical has signed public undertakings with government offices to the extent that it will never introduce a 'commercial' version of Ubuntu. There will never be a difference between the 'commercial' product and the 'free' product, as there is with Red Hat (RHEL and Fedora). Ubuntu releases will always be free." This seems to conflict with the above.}}}
We will post the results of user testing on the Internet so end users can see the results that others have found.
We need to investigate the details of using the single Harvard site for testing and certification. We will need infrastructure to connect to these servers remotely, so we can use them for testing.
XXX: mpt: Nit: "The single Harvard site" should be explained here, not later.
OliverGrawert will be consulted about integrating community server test results into hwdb.
Code
Create a server test suite that runs in debian-installer rescue mode. This suite will ask people to answer a few questions (which hardware does the system really have, vs. what the system thinks it has), and email the result to us.
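The comparison between what the system detects and what the operator says is really installed could be sketched as follows. This is only an illustration: the function name, the component keys, and the dictionary layout are assumptions, not part of any existing tool.

```python
def find_mismatches(detected, reported):
    """Compare detected hardware against what the operator reports.

    Both arguments are hypothetical {component: description} dicts.
    Returns {component: (detected_value, reported_value)} for every
    component where the two disagree, including components that only
    one side knows about (the missing side is None).
    """
    mismatches = {}
    for component in sorted(set(detected) | set(reported)):
        seen = detected.get(component)
        actual = reported.get(component)
        if seen != actual:
            mismatches[component] = (seen, actual)
    return mismatches


if __name__ == "__main__":
    detected = {"nic0": "e1000", "cpus": "2"}
    reported = {"nic0": "e1000", "cpus": "4", "raid0": "vendor RAID card"}
    for component, (seen, actual) in find_mismatches(detected, reported).items():
        print(f"{component}: system sees {seen!r}, operator reports {actual!r}")
```

An empty result would mean the system recognised everything the operator listed; anything else is a candidate for a bug report.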
The test suite should contain:
- basic hardware recognition
- userspace tools for hardware configuration (RAID controllers, etc)
- hot plug systems for blades (CPU, memory)
- performance tools (memory bandwidth, disk, CPU, whatever)
- database workload tools or equivalent for high-level measurements
- burn-in, long-term workload testing for stability
- multi-system option for network testing (throughput, make sure that the network adapter doesn't go belly-up under load)
- an install test, to ensure the system can actually run.
Possible tools:
- stress(1), package 'stress': tests I/O, CPU, memory
- Bonnie for storage testing
- iperf or netpipe-tcp for network stuff: evaluate the two
- module PCI tables for checking hardware support (needs code)
- tools used by the test suite will need to be packaged as udebs for use in d-i rescue mode.
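The "module PCI tables" item above needs code. A minimal sketch of what it might look like is given below, assuming the whitespace-separated column layout of a depmod-generated `modules.pcimap` file (module, vendor, device, subvendor, subdevice, class, class_mask, driver_data) and the kernel convention that `0xffffffff` means "match any". The function names and the sample table are hypothetical.

```python
from collections import namedtuple

PCI_ANY_ID = 0xFFFFFFFF  # wildcard value used in PCI ID tables

PciEntry = namedtuple(
    "PciEntry", "module vendor device subvendor subdevice cls cls_mask"
)


def parse_pcimap(text):
    """Parse modules.pcimap-style text into a list of PciEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete table row
        numbers = [int(value, 16) for value in fields[1:7]]
        entries.append(PciEntry(fields[0], *numbers))
    return entries


def _id_matches(wanted, actual):
    return wanted == PCI_ANY_ID or wanted == actual


def modules_for_device(entries, vendor, device, dev_class,
                       subvendor=0, subdevice=0):
    """Return the modules whose table rows claim the given device."""
    return [
        e.module
        for e in entries
        if _id_matches(e.vendor, vendor)
        and _id_matches(e.device, device)
        and _id_matches(e.subvendor, subvendor)
        and _id_matches(e.subdevice, subdevice)
        and ((e.cls ^ dev_class) & e.cls_mask) == 0
    ]


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
# pci module  vendor     device     subvendor  subdevice  class      class_mask driver_data
e1000         0x00008086 0x00001000 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
ehci_hcd      0xffffffff 0xffffffff 0xffffffff 0xffffffff 0x000c0320 0xffffffff 0x0
"""

if __name__ == "__main__":
    table = parse_pcimap(SAMPLE)
    print(modules_for_device(table, 0x8086, 0x1000, dev_class=0x020000))
```

A device for which this returns an empty list has no module claiming it, which is the case the test suite would want to flag to the operator.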
The test suite might take up to a week to run full burn-in tests, but will need to have a quicker test mode if people are going to test their own hardware and provide us with results.
Each test will run serially, with a final full stress test of all systems.
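The run order described here (each test on its own, then everything at once) might be sketched like this. The test callables, the `quick` flag, and the `combined-stress` result key are assumptions made for the example; real tests would wrap tools such as stress or iperf.

```python
from concurrent.futures import ThreadPoolExecutor


def run_one(test, quick):
    """Run a single test callable; any exception counts as a failure."""
    try:
        return bool(test(quick))
    except Exception:
        return False


def run_suite(tests, quick=False):
    """Run every test serially, then all of them together as a final stress pass.

    `tests` is a hypothetical {name: callable(quick) -> bool} mapping.
    Returns {name: passed} plus a "combined-stress" entry that is True
    only if every test also passed while running concurrently.
    """
    results = {}
    for name, test in tests.items():
        results[name] = run_one(test, quick)

    # Final pass: start all tests at the same time to load every subsystem.
    with ThreadPoolExecutor(max_workers=max(1, len(tests))) as pool:
        futures = [pool.submit(run_one, test, quick) for test in tests.values()]
        results["combined-stress"] = all(f.result() for f in futures)
    return results


if __name__ == "__main__":
    demo = {"cpu": lambda quick: True, "disk": lambda quick: True}
    print(run_suite(demo, quick=True))
```

Passing `quick` through to each test is one way to offer both the week-long burn-in and a shorter community mode from the same suite.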
The included test tools will need a very simple UI that will step through the tests and output the results in a format that can be reviewed by the testing team. The test suite should be automated, and should allow the test operator to perform some out-of-band tests to get details on failures. It should also provide an easy method of shipping the results back to Canonical.
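One possible shape for a reviewable result format is a plain-text summary wrapped in an email message. The sketch below only builds the message and does not send it; the subject prefix, the report layout, and the addresses are placeholders, not an agreed format.

```python
from email.message import EmailMessage


def build_report_email(hostname, results, sender, recipient):
    """Build (but do not send) a plain-text report of test results.

    `results` is a hypothetical {test_name: passed} mapping.
    """
    failed = sorted(name for name, ok in results.items() if not ok)
    status = "FAIL" if failed else "PASS"

    lines = [f"Host: {hostname}", f"Overall: {status}", ""]
    for name in sorted(results):
        lines.append(f"{name}: {'pass' if results[name] else 'FAIL'}")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[server-test] {hostname}: {status}"
    msg.set_content("\n".join(lines))
    return msg


if __name__ == "__main__":
    # Placeholder addresses; the real destination is not decided in this spec.
    message = build_report_email(
        "box1",
        {"cpu": True, "disk": False},
        "operator@example.com",
        "server-testing@example.com",
    )
    print(message)
```

Keeping one test per line makes the report easy to skim by hand and easy to parse later if results end up in a database.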
Timeframe
It would be at least 6-8 weeks before hardware could start shipping from vendors to the hypothetical test centre. This means the server test suite needs to be completed by the end of the year.
Outstanding issues
It is difficult to reconcile Canonical's globally distributed development environment with hardware vendors' desire to ship hardware to a single location for testing purposes. But Harvard Computer Society has offered to support a testing facility, including staff to process inventory, run the testing suite, and catalog results. We would also have external IP access to the machines, plus external remote console for the systems (most/all of them?) that support ILO/LOM, etc.
Contact: Ivan Krstic <krstic@hcs.harvard.edu> ('krstic' on Launchpad)
Some hardware configurations may require non-distributable software to support (e.g. RAID). Malcolm will need to talk to vendors about being able to distribute those tools as packages in Ubuntu. In the cases where those tools are undistributable (which is the case with many of them), Malcolm will be petitioning vendors to sponsor us in creating custom CD images that they can distribute with their own hardware.
BoF agenda and discussion
A different BoF needs to be scheduled to discuss improving/changing debian-installer's rescue mode to make server testing and rescue stuff less painful.