BIOSandUbuntu
Summary
A buggy BIOS can cause many different and subtle problems for Linux. This page describes some of the BIOS issues that have been found to cause problems.
Broken DSDT
Sometimes a BIOS will contain a Differentiated System Description Table (DSDT) that is incorrect and will cause problems.
The DSDT can be disassembled using Intel's iasl disassembler tool, and doing so has proved very revealing in some cases. For example, one BIOS had a race condition in the initialisation of the Embedded Controller. The Linux ACPI driver was reading values from the Embedded Controller before the controller was fully ready, so incorrect values were being returned to the Linux kernel for various devices, such as the smart battery.
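Before disassembling, it is worth confirming that you captured a complete, uncorrupted table. The following is a minimal Python sketch, assuming the kernel exposes the table at /sys/firmware/acpi/tables/DSDT (readable by root on current kernels). It decodes the standard 36-byte ACPI table header and checks that all bytes of the table sum to zero modulo 256. The helper names parse_acpi_table and save_dsdt are hypothetical.

```python
import struct
from pathlib import Path

# Common 36-byte header shared by ACPI system description tables:
# Signature, Length, Revision, Checksum, OEMID, OEM Table ID,
# OEM Revision, Creator ID, Creator Revision (little-endian, no padding).
HEADER_FORMAT = "<4sIBB6s8sI4sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 36


def parse_acpi_table(blob: bytes) -> dict:
    """Decode the ACPI table header and verify the table checksum."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("too short to contain an ACPI table header")
    (signature, length, revision, _checksum, oem_id, oem_table_id,
     oem_revision, creator_id, creator_revision) = struct.unpack_from(HEADER_FORMAT, blob)
    if length < HEADER_SIZE or length > len(blob):
        raise ValueError(f"header length {length} does not fit a {len(blob)}-byte blob")
    return {
        "signature": signature.decode("ascii", "replace"),
        "length": length,
        "revision": revision,
        "oem_id": oem_id.decode("ascii", "replace").rstrip("\x00 "),
        "oem_table_id": oem_table_id.decode("ascii", "replace").rstrip("\x00 "),
        "oem_revision": oem_revision,
        "creator_id": creator_id.decode("ascii", "replace"),
        "creator_revision": creator_revision,
        # All bytes of a valid table, checksum byte included, sum to 0 mod 256.
        "checksum_ok": sum(blob[:length]) % 256 == 0,
    }


def save_dsdt(dest: str = "dsdt.dat",
              src: str = "/sys/firmware/acpi/tables/DSDT") -> dict:
    """Copy the live DSDT (needs root) to a file for iasl and return its header."""
    blob = Path(src).read_bytes()
    Path(dest).write_bytes(blob)
    return parse_acpi_table(blob)
```

Once the table is saved as dsdt.dat and its checksum is good, `iasl -d dsdt.dat` writes the disassembled ASL source to dsdt.dsl for inspection.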
Another bug found in the DSDT came from sloppy use of the ASL Acquire() operator, which is used to obtain a mutex. Acquire() takes a timeout argument, which can be 0xFFFF (wait forever to acquire the mutex) or a timeout in milliseconds, and it returns True if the timeout expired without the mutex being acquired. With a finite timeout, the code must therefore check whether Acquire() succeeded or timed out. I have seen code that does not check for the timeout, and the resulting race conditions corrupted settings in the Embedded Controller and caused misreadings by the Linux ACPI driver.
Rule: Always check return values from mutex Acquire() operators when using finite timeouts.
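The rule can be illustrated outside ASL. This is a minimal Python sketch, assuming a threading.Lock stands in for the ASL Mutex and a dict stands in for the Embedded Controller registers; asl_acquire and read_ec_register are hypothetical helpers. Note that asl_acquire mirrors the ASL convention (True means "timed out"), which is the opposite sense to Python's Lock.acquire().

```python
import threading

WAIT_FOREVER = 0xFFFF  # ASL Acquire() timeout value meaning "never time out"


def asl_acquire(mutex: threading.Lock, timeout_ms: int) -> bool:
    """Mimic ASL Acquire(): return True if the wait TIMED OUT (mutex not held)."""
    if timeout_ms >= WAIT_FOREVER:
        mutex.acquire()  # unbounded wait, so it cannot time out
        return False
    if timeout_ms == 0:
        return not mutex.acquire(blocking=False)  # single non-blocking attempt
    return not mutex.acquire(timeout=timeout_ms / 1000.0)


def read_ec_register(mutex: threading.Lock, registers: dict, offset: int,
                     timeout_ms: int = 100):
    """Read a simulated EC register, but only if the mutex was really obtained."""
    if asl_acquire(mutex, timeout_ms):
        # Timed out: someone else owns the EC, so leave its registers alone
        # and report failure instead of returning a possibly stale value.
        return None
    try:
        return registers[offset]
    finally:
        mutex.release()
```

The buggy pattern described above amounts to ignoring the return value of asl_acquire and touching the registers anyway, so a timed-out caller races with the mutex owner.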
BIOS checking tools
If you suspect your BIOS is not behaving correctly, then it is worth using the Linux Firmware Kit http://www.linuxfirmwarekit.org/ to automatically check for incorrect BIOS behaviour. This tool may report some false positives, but it is an excellent tool for picking up bugs at an early stage.
Some issues we have found with it are as follows:
1. Incorrectly set thermal zone trip point values (ACPI spec 11.1), visible in Linux under /proc/acpi/thermal_zone/THRM
2. Incorrectly set HPET vendor ID: a left-over default of 0xFFFF is not excusable
3. HPET clock period not set, left at a default of 0xFFFFFFFF
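The two HPET checks can be expressed directly against the HPET General Capabilities and ID register (offset 0 of the HPET register block, whose address the ACPI HPET table provides). This is a minimal Python sketch, assuming you have already read that 64-bit value; the field layout and the 100 ns (0x05F5E100 fs) upper bound on the tick period follow the IA-PC HPET specification, while decode_hpet_caps and hpet_problems are hypothetical helpers.

```python
MAX_PERIOD_FS = 0x05F5E100  # 100 ns, the HPET spec's upper bound on the tick period


def decode_hpet_caps(caps: int) -> dict:
    """Split the 64-bit HPET General Capabilities and ID register into fields."""
    return {
        "period_fs": (caps >> 32) & 0xFFFFFFFF,      # bits 63:32, femtoseconds per tick
        "vendor_id": (caps >> 16) & 0xFFFF,          # bits 31:16, PCI-style vendor ID
        "legacy_route_capable": bool(caps & (1 << 15)),
        "counter_64bit": bool(caps & (1 << 13)),
        "num_timers": ((caps >> 8) & 0x1F) + 1,      # field holds index of last timer
        "rev_id": caps & 0xFF,
    }


def hpet_problems(caps: int) -> list:
    """Return human-readable complaints about an HPET capabilities value."""
    f = decode_hpet_caps(caps)
    problems = []
    if f["vendor_id"] == 0xFFFF:
        problems.append(f"vendor ID 0x{f['vendor_id']:04X} looks like an unset default")
    if f["period_fs"] == 0 or f["period_fs"] > MAX_PERIOD_FS:
        problems.append(
            f"clock period 0x{f['period_fs']:08X} fs is outside 1..0x{MAX_PERIOD_FS:08X}")
    return problems
```

For example, a value with vendor ID 0x8086 and a period of 0x0429B17F fs (about 69.84 ns, a 14.318 MHz clock) passes, whereas the unset defaults listed above produce two complaints.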
DSDT Handling of Operating System Variants
Most BIOS code checks for OS variants keyed off the _OSI method and the _OS object. Most BIOSes that query _OSI("Linux") do nothing with the answer, but those that do act on it cause Linux to break in different ways (suspend/resume, video reposting, etc.). By default, Linux's ACPI driver answers False to _OSI("Linux"), in the hope that this discourages BIOS writers from using it.
Linux will continue to claim _OSI compatibility with Windows until the day when the majority of systems running Linux have passed a Linux compatibility test rather than a Windows compatibility test.
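To make the mechanism concrete, here is a minimal Python model of both sides of an _OSI query. The set of Windows strings is an illustrative subset only (the kernel's real list is longer and changes between releases), and osi, firmware_os_variant and the variant names are hypothetical. The acpi_osi=Linux boot option is the real switch for making Linux answer True to _OSI("Linux").

```python
# Illustrative subset of the _OSI strings a Linux kernel answers True to.
CLAIMED_OSI_STRINGS = {
    "Windows 2000",
    "Windows 2001",   # Windows XP
    "Windows 2006",   # Windows Vista
    "Windows 2009",   # Windows 7
}


def osi(query: str, linux_enabled: bool = False) -> bool:
    """Model how the kernel answers a firmware _OSI(query) call."""
    if query == "Linux":
        # Answered False by default; acpi_osi=Linux flips this.
        return linux_enabled
    return query in CLAIMED_OSI_STRINGS


def firmware_os_variant(osi_fn) -> str:
    """Typical BIOS-side logic: pick the newest OS the caller claims to be."""
    for name, variant in (("Windows 2009", "win7"),
                          ("Windows 2006", "vista"),
                          ("Windows 2001", "xp")):
        if osi_fn(name):
            return variant
    if osi_fn("Linux"):
        return "linux"  # a branch that is often poorly tested by BIOS vendors
    return "legacy"
```

Because Linux answers the Windows queries the same way Windows does, firmware written like this takes the code path it was actually validated on.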
ACPI _BIF method
The ACPI _BIF method provides the kernel with battery-specific information. Hence, if it is incorrect, it can fool applications such as gnome-power-manager into shutting a system down when they believe the battery has reached a critically low level.
A broken _BIF method has been known to cause some power management headaches. For example, it is important that the "Design Capacity of Warning" and "Design Capacity of Low" are non-zero, otherwise gnome-power-manager cannot easily determine a correct strategy for hibernating or shutting down a PC when the battery becomes critically low.
Generally, packages should have all their fields set rather than relying on any defaults of zero. I consider it sloppy BIOS practice to omit fields in packages: fields need to be set, and missing fields lead to bugs which can fool the kernel or applications that rely on specific ACPI information being provided correctly.
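A quick way to apply these rules is to validate the evaluated _BIF package field by field. This is a minimal Python sketch, assuming the package has already been evaluated into a Python list in the 13-element order defined by the ACPI specification; check_bif is a hypothetical helper, and the "low must not exceed warning" and "strings must be non-empty" checks reflect this page's stricter practice rather than a spec requirement.

```python
# Field order of the 13-element ACPI _BIF package.
BIF_FIELDS = (
    "power_unit", "design_capacity", "last_full_capacity", "technology",
    "design_voltage", "design_capacity_warning", "design_capacity_low",
    "granularity_1", "granularity_2", "model_number", "serial_number",
    "battery_type", "oem_info",
)
UNKNOWN = 0xFFFFFFFF  # ACPI's "unknown" value for numeric battery fields


def check_bif(package: list) -> list:
    """Return a list of problems found in an evaluated _BIF package."""
    if len(package) != len(BIF_FIELDS):
        return [f"expected {len(BIF_FIELDS)} elements, got {len(package)}"]
    bif = dict(zip(BIF_FIELDS, package))
    problems = []
    # Power managers derive their warning/critical actions from these two.
    for key in ("design_capacity_warning", "design_capacity_low"):
        if bif[key] in (0, UNKNOWN):
            problems.append(f"{key} is {bif[key]:#x}; a real threshold is needed")
    if not problems and bif["design_capacity_low"] > bif["design_capacity_warning"]:
        problems.append("low threshold is above the warning threshold")
    # Every field should be set, including the informational strings.
    for key in ("model_number", "serial_number", "battery_type", "oem_info"):
        if not bif[key]:
            problems.append(f"{key} is empty")
    return problems
```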
Reboot Methods
A PC can be rebooted by Linux using several different strategies, selectable by the kernel reboot= boot option. These strategies are:
1. Putting the processor back into real mode and jumping to the BIOS reset address
2. Keyboard controller reset, by writing 0xFE to port 0x64
3. Forcing the processor to triple fault
4. Forcing an Intel PCI reset, by writing 0x2 and then 0x04 to port 0xCF9
5. Writing a magic value to a port/register, as specified by the ACPI FACP values RESET_VALUE and RESET_REG
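To show how a reboot= value maps onto the numbered strategies, here is a small Python sketch of the x86 parsing rule as I understand it from the kernel's documentation and source: only the first letter of each comma-separated token matters, and the last method token wins. The mapping table and reboot_method are illustrative; other letters exist (for example e for EFI, or w/c for warm/cold reboot mode) but are not among the strategies listed here, so the sketch ignores them.

```python
# First letters of x86 reboot= method keywords, mapped to the strategies above.
METHOD_BY_LETTER = {
    "b": 1,  # reboot=bios   : real mode, jump to the BIOS reset address
    "k": 2,  # reboot=kbd    : keyboard controller, 0xFE to port 0x64
    "t": 3,  # reboot=triple : force a triple fault
    "p": 4,  # reboot=pci    : Intel PCI reset via port 0xCF9
    "a": 5,  # reboot=acpi   : FACP RESET_REG / RESET_VALUE
}


def reboot_method(option: str):
    """Return the strategy number selected by a reboot= value, or None."""
    method = None
    for token in option.split(","):
        # Tokens such as "warm" or "cold" set the reboot mode, not the method.
        if token and token[0] in METHOD_BY_LETTER:
            method = METHOD_BY_LETTER[token[0]]
    return method
```

So booting with reboot=pci selects method 4 directly, bypassing whatever the FACP reset register says.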
Method 5 ("ACPI reset") has proven problematic with one particular BIOS: its RESET_VALUE and RESET_REG were 0x06 and 0xCF9, an attempt at an Intel PCI-style reset (Method 4). This worked in only 90% of reboots, because the reset should be performed in two stages (a write of 0x2, a port delay, and then 0x04) rather than as a single write of (0x02 | 0x04).
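The difference between the two sequences can be sketched without touching real hardware. This minimal Python example assumes a hypothetical RecordingPort class standing in for outb() and simply records the writes. The bit names follow Intel's Reset Control Register at 0xCF9: SYS_RST (0x02) selects a hard reset, and a 0 to 1 transition of RST_CPU (0x04) triggers the reset. In the two-stage version, the second write sets the 0x04 bit while keeping 0x02, so the byte on the wire is 0x06; as far as I know the Linux kernel's own CF9 reboot path does the same (0x02, a short delay, then 0x06).

```python
import time

RESET_CONTROL_PORT = 0xCF9
SYS_RST = 0x02  # bit 1: choose a hard (full system) reset
RST_CPU = 0x04  # bit 2: a 0 -> 1 transition starts the reset


class RecordingPort:
    """Hypothetical stand-in for outb(): records writes instead of doing I/O."""

    def __init__(self):
        self.writes = []

    def outb(self, value: int, port: int) -> None:
        self.writes.append((port, value))


def cf9_reset_two_stage(io, delay_s: float = 50e-6) -> None:
    # Stage 1: select the reset type while RST_CPU is still 0.
    io.outb(SYS_RST, RESET_CONTROL_PORT)
    time.sleep(delay_s)  # let the chipset latch the first write
    # Stage 2: raise RST_CPU (keeping SYS_RST) so its 0 -> 1 edge fires the reset.
    io.outb(SYS_RST | RST_CPU, RESET_CONTROL_PORT)


def cf9_reset_single_write(io) -> None:
    # What the faulty FACP requested: RESET_VALUE 0x06 in a single write.
    io.outb(SYS_RST | RST_CPU, RESET_CONTROL_PORT)
```

One plausible reading of the 90% figure is that, with a single write, the reset type and the triggering edge arrive together, so whether the chipset honours SYS_RST depends on timing.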