IdentifyingIssues

Revision 2 as of 2012-01-16 17:09:05

Clear message

To minimise power consumpution applications need to be behaving correctly. Unfortunately some applications are far from perfect and cause unnecessary work for the system by causing frequent CPU wakeups or disk I/O activity.

Wakeups

Applications and daemons generally are not running 100% of the time and can be in an idle state waiting for I/O to complete. Many block (wait) for I/O to complete using poll() or select() and these allow the application to also timeout after a specified amount of time. The application may use the timeout to update some status or just re-do the poll() or select(). Badly behaving applications may wait for a very short time, causing multiple wakeups a second which impacts on the ability of the CPU to stay in a deep idle state. Some applications may even have a zero timeout, which causes a busy loop and never lets the CPU enter an idle state.

Identifying rogue processes is a two step process. Firstly, we need to identify tasks that are frequently waking up the system. Tools such as powertop help identify rogue tasks, for Ubuntu 12.04 Precise LTS we recommend using the older version of powertop, install it as follows:

sudo apt-get install powertop-1.13

and run for 60 seconds as follows:

sudo powertop-1.13 -d -t 60 > powertop.log

Looking at the powertop.log you will see a section titled "Top causes for wakeups:" and this will list running processes in order of wakeups per second. An excerpt from this is a follows:

   1.4% ( 10.0)   gwibber-service
   1.4% (  9.9)   ubuntuone-syncd
   0.8% (  5.5)   [ata_piix] <interrupt>
   0.7% (  5.0)   syndaemon

Lets examine ubuntuone-syncd. It appears to be waking up 9.9 times a second, so what is it doing? First we identify its process ID:

ps -e | grep ubuntuone-syncd
2331 ?        00:01:03 ubuntuone-syncd

..and now we then attach strace to that PID as follows:

sudo strace -p 2331
Process 2331 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
read(10, 0x7fffe58ab1e0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
read(22, 0x285a2c4, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=22, events=POLLIN}, {fd=64, events=POLLIN}, {fd=16, events=POLLIN}, {fd=3, events=POLLIN}, {fd=62, events=POLLIN}, {fd=63, events=POLLIN}, {fd=71, events=POLLIN}], 9, 101) = 0 (Timeout)
read(10, 0x7fffe58ab1e0, 16)            = -1 EAGAIN (Resource temporarily unavailable)
read(22, 0x285a2c4, 4096)               = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=22, events=POLLIN}, {fd=64, events=POLLIN}, {fd=16, events=POLLIN}, {fd=3, events=POLLIN}, {fd=62, events=POLLIN}, {fd=63, events=POLLIN}, {fd=71, events=POLLIN}], 9, 101) = 0 (Timeout)

..hit control-C to stop strace.

The final argument to poll() [see man 2 poll] is the timeout (in milliseconds, and a negative value is in infinite timeout). In this case poll waits for 101 milliseconds and then returns 0, which indicates a timeout occurred and none of the file descriptors were ready for I/O. If one observes this process with strace for a while one notices that the poll is frequently timingout on the 101 millisecond wait, so this is effectively a 9.9Hz polling loop which needs fixing (see bug LP#906924).

Applications may also using select() in the same way, with the final argument being being the address of a struct timeval and a return of zero from select() indicating a timeout. See man 2 select for more details.

Disk Wakeups

Badly behaving applications may also be frequently waking up the disk drive and forcing data to be sync'd to disk using fsync() or sync(). For example, debug logging may be still turned on and frequently writing debug messages. Applications may have unnecessarily opened files with O_SYNC which depedening on the filesystem causes file data and metadata to be written to disk before returning from a writes, this causes disk wakeups and writes where instead data could be held in the buffer cache.

The easiest problem to identify is frequent disk I/O from seemingly idle applications. Simply run powertop-1.13 for 60 seconds as follows:

sudo powertop-1.13 -d -t 60 > powertop.log

Looking at the powertop.log you will see a section titled "Disc accesses:" and this will list processes that were writing to disk and the name of the file and device it was writing to. An excerpt from this is a follows:

Disk accesses:
The application 'thunderbird-bin' is writing to file 'global-messages-db.sqlite-journ' on /dev/ecryptfs
The application 'thunderbird-bin' is writing to file 'global-messages-db.sqlite' on /dev/ecryptfs
The application 'thunderbird-bin' is writing to file 'INBOX.msf' on /dev/ecryptfs
The application 'thunderbird-bin' is writing to file 'ECRYPTFS_FNEK_ENCRYPTED.FWb7w4.' on /dev/sda5
The application 'thunderbird-bin' is writing to file 'ECRYPTFS_FNEK_ENCRYPTED.FWb7w4.' on /dev/sda5
The application 'thunderbird-bin' is writing to file 'panacea.dat' on /dev/ecryptfs
The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5
The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5
The application 'indicator-weath' is writing to file 'indicator-weather.log' on /dev/sda5
The application 'thunderbird-bin' is writing to file 'panacea.dat' on /dev/ecryptfs
The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5
The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5
The application 'dconf-service' is writing to file 'user.L2II8V' on /dev/sda5

Let us explore the 'indicator-weath' writes. The name is truncated because powertop fetches the name from the /proc/$pid/comm field. We need to identify the process ID and the full pathname to the file as follows:

ps -e | grep indicator-weath
 2024 ?        00:00:04 indicator-weath

lsof -p 2024 | grep indicator-weather.log
indicator 2024 ubuntu    3u   REG                8,5   154647 11796492 /home/ubuntu/.cache/indicator-weather.log

And since it is a log, we can see any recent activity:

tail -10 /home/king/.cache/indicator-weather.log
[Fetcher] 2012-01-16 16:57:52,805 - DEBUG - Weather: night, show 'weather-clear-night' icon
[Fetcher] 2012-01-16 16:57:52,808 - DEBUG - Indicator: fill in menu with params: city='Gatwick', temp='Temperature: 3 ˚C', humid='Humidity: 65%', wind='Wind: E at 8 mph', sunrise='Sunrise: 07:59:41', sunset='Sunset: 16:22:13', puretemp=3 ˚C
[Fetcher] 2012-01-16 16:57:52,808 - DEBUG - Indicator: menu_normal: filling in a menu for found places
[Fetcher] 2012-01-16 16:57:52,816 - DEBUG - Indicator: update_label: setting label to ' '
[Fetcher] 2012-01-16 16:57:52,821 - DEBUG - Indicator: setting refresh label to 'Refresh (just now)'
[Fetcher] 2012-01-16 16:57:52,823 - DEBUG - Indicator: scheduling refresh label update in 1 min
[Fetcher] 2012-01-16 16:57:52,833 - DEBUG - Weather: night, show 'weather-clear-night' icon
[Fetcher] 2012-01-16 16:57:52,835 - DEBUG - Settings: setting '23387567'='{'humidity': u'Humidity: 65%', 'sunrise': 'Sunrise: 07:59:41', 'sunset': 'Sunset: 16:22:13', 'wind': u'Wind: E at 8 mph', 'humidex': None, 'icon': 'weather-clear-night', 'temper': u'Temperature: 3 \u02daC', 'condition': u'Clear', 'label': '3 \xcb\x9aC'}'
[Fetcher] 2012-01-16 16:57:52,836 - DEBUG - Indicator: update_label: setting label to '3 ˚C'
[Fetcher] 2012-01-16 16:57:52,837 - DEBUG - Indicator: scheduling update in 15 mins

Looking at the log more deeply we can see that the applet is writing DEBUG messages to the log every minute which is keeping the disk drive from being in an idle state, so this needs fixing (e.g. LP#917253).

Filing Bugs

If you identify rogue applications or daemons that are frequently looping on poll()/select() or needlessly doing frequent writes to disk then file a bug against the application and also add "The Ubuntu Power Consumption Project" using "Also affects project".