Operational
This page is for Importer admins to document how to perform various tasks with the importer in production.
Overview
The importer consists of a few separate processes:
- mass-import: the controller daemon that dispatches jobs to workers.
- import_package.py: the worker that acts on a single package.
- add_import_jobs.py: run from cron to add jobs for new uploads.
- list_packages.py: run infrequently from cron to keep our list of all packages up to date for the idle tasks.
and a few helper scripts:
- requeue_package.py: retry failed packages, change priority of jobs, set spurious failure types.
- show_failure.py: a command-line way to view the current failure for a package, if any.
- categorise_failures.py: run from cron to update the status page every 5 minutes. Can be run by hand if desired.
Architecture
mass-import runs all the time to do the dispatching. It starts each job in a new process so that the process boundary catches all types of failure (it can even notice a Python segfault). Its main loop, whenever there is a free slot, either starts the highest-priority job, runs an idle task, or sleeps. import_package.py then works out what, if anything, needs to be done for a package, and does it.
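As a rough illustration, the loop described above can be sketched as follows. The names (Dispatcher, MAX_SLOTS, the return values) are illustrative assumptions, not the real mass_import.py API:

```python
import heapq
import time

MAX_SLOTS = 4  # assumed number of concurrent worker slots


class Dispatcher:
    def __init__(self):
        self.queue = []       # min-heap of (priority, package); lowest runs first
        self.running = set()  # packages currently being imported

    def add_job(self, priority, package):
        heapq.heappush(self.queue, (priority, package))

    def tick(self):
        """One iteration of the main loop."""
        if len(self.running) >= MAX_SLOTS:
            time.sleep(1)            # no free slot: sleep
            return "sleep"
        if self.queue:
            _, package = heapq.heappop(self.queue)
            self.running.add(package)
            # The real importer spawns import_package.py in a child
            # process here, so even a segfault is contained.
            return "start:" + package
        return "idle"                # no queued work: run an idle task
```

The process boundary is the key design choice: a crash in the worker, of any kind, leaves the dispatcher loop itself untouched.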
mass-import uses a lock to ensure only a single instance of it runs, and import_package.py uses per-package locks to ensure that at most one import of any given package is running at any moment.
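A minimal sketch of such per-package locking, using an fcntl file lock (which is released automatically if the holding process dies); the lock directory and helper name are illustrative, not the importer's actual implementation:

```python
import fcntl
import os


def try_lock_package(package, lock_dir="/tmp/pkg-locks"):
    """Return an open lock file on success, or None if another
    import of this package already holds the lock."""
    os.makedirs(lock_dir, exist_ok=True)
    f = open(os.path.join(lock_dir, package + ".lock"), "w")
    try:
        # Non-blocking exclusive lock: fail fast if already held.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None
```

Because the kernel drops the lock when the file descriptor is closed, a killed worker can never leave a package permanently locked.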
They are designed to recover from complete failure, so it is safe to kill them at any time. When restarted, mass-import will start jobs for any packages that were part-way through when it was killed. Every time import_package.py runs for a package it checks for a half-finished import on the local disk and either blows it away or continues it, depending on the stage that was reached; a three-phase commit procedure makes this possible.
The only problem arises when, for example, bzr has written half a .bzr directory on LP and is then killed. In such cases some manual intervention will be required when the package later fails with an error pointing to it.
There is, however, a polite way to stop the processes; see below.
Getting in
The importer runs on jubany, under the pkg_import user (sudo su - pkg_import).
It is rooted in /srv/package-import.canonical.com/new/, with the scripts imaginatively placed in the "scripts" directory.
Tasks
Starting and stopping
- To start, run ./mass_import.init start
- To stop gracefully, run ./stop_all.sh and wait until the mass_import.py process exits.
- To kill, run ./mass_import.init stop
- To restart (e.g. after an update to mass_import.py), run ./mass_import.init restart
Retrying failures
To retry failures, run ./requeue_package.py <package names>
- To retry with high priority, add --priority
- To retry, and to automatically retry any subsequent failures with the same traceback, add --auto (do not use this as a workaround for retrying everything with the same traceback when the error is not transient).
- - Note that --priority is not remembered when combined with --auto.
Bumping priority
To bump the priority of something waiting in the queue, run ./requeue_package.py --force --priority <package names>
Catching missed uploads
To force an import of a package that has not failed and is not in the queue, run ./requeue_package.py --force <package name>
Updating the code
- Run "bzr pull" in the scripts directory to update from the LP branch. Remember to restart mass-import if you modified its code. Changes to other scripts will be picked up without intervention.