Gentoo Distfile Mirroring System - Overview
1. Placing files on the Gentoo Mirror system
The mirror system automatically fetches every distfile referenced by an ebuild in the tree.
Developers don't have to do anything unless an error occurs. The mirror system is
designed so that files propagate to all nodes within 4 hours of reaching the master
private distfile mirror; the nodes pull from the master using cron jobs. Due to various
issues, it may take as long as 24 hours for your file to propagate. If you suspect
that your file is not being fetched, check the
failure report.
If your ebuild contains RESTRICT="mirror", the file will not be mirrored. The
only exception is mirror://gentoo/ URIs, which the mirror system handles
automatically; no manual intervention is required.
Files placed in dev.gentoo.org:/space/distfiles-whitelist/current will be retained
for six months unless manually deleted earlier. Place any file here that you want
retained on the mirror system even if no ebuild refers to it. Keep in mind that the
mirror system retains files for two weeks after they are last referred to in an
ebuild, so only use distfiles-whitelist if absolutely necessary.
All entries in
dev.gentoo.org:/space/distfiles-whitelist/current MUST come
with a comment in the same format as profiles/package.mask.
If you wish to whitelist a lot of files, create a separate file in
the same directory instead.
Note:
Placing files in the distfiles-whitelist takes them out of the control of the Mirror
System. If you remove a file from the whitelist, the system automatically takes back
control and cleans up the file as normal. Files are automatically removed from the
whitelist after six months.
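The comment requirement lends itself to a quick local check before uploading. The Python sketch below assumes that a whitelist file follows the profiles/package.mask convention of a block of # comment lines followed by one distfile name per line, with blocks separated by blank lines. The function name and the sample entries are hypothetical and not part of the mirror system.

```python
"""Check that every distfiles-whitelist entry carries a comment.

Assumed format (modeled on profiles/package.mask): a block of '#' comment
lines, then one distfile name per line; blocks are separated by blank lines.
"""
from typing import List


def uncommented_entries(text: str) -> List[str]:
    """Return whitelist entries that are not preceded by a comment block."""
    missing: List[str] = []
    has_comment = False  # has the current block seen a comment line yet?
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            has_comment = False  # a blank line ends the current block
        elif line.startswith("#"):
            has_comment = True
        elif not has_comment:
            missing.append(line)
    return missing


if __name__ == "__main__":
    sample = """\
# Jane Developer <jane@example.org> (2008-01-15)
# Needed for the upcoming foo-1.0 release.
foo-1.0.tar.bz2

bar-2.3.tar.gz
"""
    print(uncommented_entries(sample))  # prints ['bar-2.3.tar.gz']
```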
Automatic fetch failure
When the automatic fetch fails, it is the responsibility of the package maintainer
to manually retrieve the file from the original location and place it in
/space/distfiles-local on dev.gentoo.org. This directory is published
as an rsync module, to which the private master distfile mirror connects
to retrieve any files in the directory. These files are synchronized to
/home/distfiles/distfiles-local on the private master distfile mirror. From
there, the /home/distfiles/scripts/distsync.sh script runs every 30 minutes to synchronize
/home/distfiles/distfiles-local and /home/distfiles/distfiles on the private
master distfile mirror. Files placed in distfiles-local are automatically removed
after two weeks, at which point the Mirror System takes control of the file.
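The distfiles-local flow can be approximated as follows. This is a minimal Python sketch, not the real distsync.sh: the function names are hypothetical, directories are passed in rather than hard-coded, and the two-week age is assumed to be measured from each file's modification time.

```python
"""Hypothetical sketch of the distfiles-local flow (not the real distsync.sh)."""
import shutil
import time
from pathlib import Path
from typing import List, Optional

TWO_WEEKS = 14 * 24 * 60 * 60  # seconds


def sync_local(local_dir: Path, dist_dir: Path) -> List[str]:
    """Copy every file in distfiles-local into distfiles, overwriting any
    file of the same name already there. Returns the copied names."""
    copied = []
    for src in sorted(local_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, dist_dir / src.name)  # overwrites an existing file
            copied.append(src.name)
    return copied


def prune_local(local_dir: Path, now: Optional[float] = None) -> List[str]:
    """Delete files older than two weeks (by mtime, an assumption) from
    distfiles-local, handing them back to the Mirror System."""
    if now is None:
        now = time.time()
    removed = []
    for path in sorted(local_dir.iterdir()):
        if path.is_file() and now - path.stat().st_mtime > TWO_WEEKS:
            path.unlink()
            removed.append(path.name)
    return removed
```

On the real system, sync_local would correspond to the 30-minute run with local_dir set to /home/distfiles/distfiles-local and dist_dir set to /home/distfiles/distfiles. It overwrites same-named files unconditionally, which is the behavior the warning below is about.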
Warning:
Files placed in distfiles-local override any existing files of the same name,
taking them out of the control of the Mirror System for the full two weeks that
the file resides in distfiles-local. Before placing a file here, make sure that
it does not already exist; otherwise breakage could occur.
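One way to follow this advice is to check each candidate name against a mirror before uploading it. The sketch below is an illustration built on assumptions: it sends one HTTP HEAD request per file name to a base URL you supply and treats a 404 as "not present". The base URL, the function names, and the assumption that the mirror serves files directly under that URL are all hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List


def http_exists(base_url: str, timeout: float = 30.0) -> Callable[[str], bool]:
    """Build a checker that sends an HTTP HEAD request for base_url/name.
    A 404 means the file is absent; any other HTTP error is re-raised."""
    def check(name: str) -> bool:
        url = base_url.rstrip("/") + "/" + urllib.parse.quote(name)
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout):
                return True
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False
            raise
    return check


def would_override(names: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return the candidate names that already exist on the mirror."""
    return [name for name in names if exists(name)]
```

For example, would_override(["foo-1.0.tar.gz"], http_exists("http://mirror.example.org/distfiles")) returns the names that would override an existing file. Passing the checker in as a function keeps the comparison testable without network access.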
The mirror system only downloads the first instance of a file name. If subsequent
ebuilds reference the same file name, the checksums recorded for the two references
are compared; if they do not match, the second file is not fetched. The mirror system
will produce an error, and human intervention is required. Please check file names
carefully.
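The first-instance rule can be expressed as a single pass over the references collected from the tree. In this hypothetical Python sketch, each reference is an (ebuild, file name, checksum) record; the record type and function name are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DistfileRef:
    ebuild: str    # ebuild that references the file
    filename: str  # distfile name, the key the mirror stores files under
    checksum: str  # digest recorded for the file


def find_conflicts(refs: Iterable[DistfileRef]) -> List[Tuple[DistfileRef, DistfileRef]]:
    """The first reference to a name wins; a later reference with a different
    checksum is reported as a (kept, rejected) pair."""
    first_seen: Dict[str, DistfileRef] = {}
    conflicts = []
    for ref in refs:
        kept = first_seen.setdefault(ref.filename, ref)
        if kept.checksum != ref.checksum:
            conflicts.append((kept, ref))
    return conflicts
```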
Common fetch errors:
- URI port must be 80, 443, or 23
- URI is malformed (mirrors:// is a common mistake; mirror:// is correct)
- Mirror target isn't valid (doesn't specify a valid tier)
- Checksum conflict with another ebuild in the tree - check your file name
- Upstream host timeout while attempting to connect - the Mirror System will retry
on the next pass
- Upstream host isn't valid - check the host name in your URL.
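Several of these errors can be caught before any connection is made. The following Python sketch checks a single URI for the malformed-scheme, invalid-mirror, and port errors listed above. The set of mirror names is a hypothetical subset (the real list lives in profiles/thirdpartymirrors), and the allowed ports are copied from the list above. Timeouts and checksum conflicts need the network or the whole tree and are out of scope.

```python
from typing import List
from urllib.parse import urlsplit

# Ports copied from the error list above.
ALLOWED_PORTS = {80, 443, 23}
# Hypothetical subset; the real mirror names are defined in profiles/thirdpartymirrors.
KNOWN_MIRRORS = {"gentoo", "gnu", "sourceforge"}


def check_uri(uri: str) -> List[str]:
    """Return the fetch errors that can be detected without contacting the host."""
    parts = urlsplit(uri)
    if parts.scheme == "mirrors":
        return ["URI is malformed (mirrors:// is a common mistake; mirror:// is correct)"]
    if parts.scheme == "mirror":
        # For mirror:// URIs the "host" part names the mirror target.
        if parts.netloc not in KNOWN_MIRRORS:
            return [f"Mirror target isn't valid: {parts.netloc!r}"]
        return []
    errors = []
    if not parts.hostname:
        errors.append("Upstream host isn't valid: no host name")
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        errors.append("URI is malformed: bad port")
        return errors
    if port is not None and port not in ALLOWED_PORTS:
        errors.append(f"URI port must be 80, 443, or 23 (got {port})")
    return errors
```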
2. Technical details and requirements
master private distfile mirror
Source tarballs are automatically fetched and placed on, or removed from, the mirror
system, and an exception report is generated, by three related scripts: update_distfiles.sh,
mirror-dist.sh and gen-report-xml.py. These scripts run on
osprey.gentoo.org and are currently maintained by zmedico and ferringb.
The master script is /home/distfiles/scripts/update_distfiles.sh and runs once every
four hours via a cron job. The /home/distfiles/scripts/mirror-dist.sh script maintains
a database of the death watch and purgatory lists. The /home/distfiles/scripts/gen-report-xml.py
script generates an XML file (/home/distfiles/reports/failure.xml) based on
/home/distfiles/logs/failure.log.
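The log-to-report step can be sketched with Python's standard xml.etree.ElementTree module. The actual formats of failure.log and failure.xml are not described here, so this sketch assumes one "filename: reason" entry per line and an invented failures/failure element layout; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def failures_to_xml(log_text: str) -> str:
    """Convert 'filename: reason' lines (assumed log format) into
    <failures><failure filename="...">reason</failure>...</failures>."""
    root = ET.Element("failures")
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, reason = line.partition(":")
        item = ET.SubElement(root, "failure", filename=name.strip())
        item.text = reason.strip()
    return ET.tostring(root, encoding="unicode")
```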
The master private distfile mirror needs a distfiles user account. This
account should be configured to run /home/distfiles/scripts/update_distfiles.sh
every four hours. Files are placed in /mnt/distfiles/distfiles, which is configured
in /etc/rsync/rsyncd.conf to be available as an rsync module. From there,
gentoo.oregonstate.edu runs an hourly cron job that syncs this directory.
gentoo.oregonstate.edu provides a password-protected rsync module; the access
information is distributed only to official Gentoo distfile mirrors. Each
mirror should synchronize with this directory once every four hours.
Items Required:
- A distfiles user account on the private master distfile mirror
- The update_distfiles.sh, mirror-dist.sh and gen-report-xml.py scripts
- /mnt/distfiles/distfiles configured as an rsync module
- The necessary cron job set up to run the master script, update_distfiles.sh,
every four hours
Step by Step
- update_distfiles.sh calls mirror-dist.sh
- mirror-dist.sh calls ebuild, which scans the tree and collects file/digest
pairs.
- If the URI is a mirror URI, verify it. If it is invalid, fail and write
an error to the failure log.
- If an existing file is found on the mirror system, its checksum is verified.
If it matches, the file is used. If it does not, the file is deleted.
- Files that don't exist on the mirror system yet are downloaded from the source
URIs until the file is complete or all source URIs are exhausted.
- Once all files are complete, the death watch database is updated by walking
the tree and looking for any files that exist on the mirror system but do not
appear in any ebuild.
- Any file that doesn't appear in an ebuild is added to death watch.
- Any file that has been on death watch for more than two weeks is moved to purgatory.
- Files in purgatory are removed after two weeks.
- Exceptions to the death watch list can be added in /space/distfiles-whitelist
- Files removed from the whitelist are deleted from the mirror system as normal.
- Dump stats
- update_distfiles.sh calls gen-report-xml.py
- gen-report-xml.py creates a report from the stats.
- The report is copied to http://dev.gentoo.org/~zmedico/infra/distfiles/failure.xml
via a cron job.
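The death watch and purgatory rules in the list above form a small state machine. The Python sketch below models one pass under stated assumptions: the class and method names are hypothetical, the two lists are kept in memory rather than in mirror-dist.sh's database, and a file that is referenced again, whitelisted, or already gone from the mirror is assumed to leave both lists.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Set

TWO_WEEKS = timedelta(weeks=2)


@dataclass
class MirrorLifecycle:
    death_watch: Dict[str, datetime] = field(default_factory=dict)
    purgatory: Dict[str, datetime] = field(default_factory=dict)

    def run_pass(self, on_mirror: Set[str], referenced: Set[str],
                 whitelist: Set[str], now: datetime) -> Set[str]:
        """Apply one pass of the rules and return the names to delete."""
        protected = referenced | whitelist
        # Referenced or whitelisted files, and files no longer on the mirror,
        # leave both lists (an assumption about the real behavior).
        for table in (self.death_watch, self.purgatory):
            for name in list(table):
                if name in protected or name not in on_mirror:
                    del table[name]
        # Files on the mirror that no ebuild references enter death watch.
        for name in on_mirror - protected:
            if name not in self.death_watch and name not in self.purgatory:
                self.death_watch[name] = now
        # More than two weeks on death watch: move to purgatory.
        for name, since in list(self.death_watch.items()):
            if now - since > TWO_WEEKS:
                del self.death_watch[name]
                self.purgatory[name] = now
        # More than two weeks in purgatory: delete.
        doomed = {n for n, since in self.purgatory.items() if now - since > TWO_WEEKS}
        for name in doomed:
            del self.purgatory[name]
        return doomed
```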
dev.gentoo.org
- /space/distfiles-local configured as an rsync module on dev.gentoo.org
- An rsync command to synchronize dev.gentoo.org::distfiles-local
with /home/distfiles/distfiles-local on the private master distfile mirror.
- The distsync.sh script
- The necessary cron jobs set up to run the above scripts and commands at the
right times.
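For the rsync step, a Python wrapper might look like the sketch below. The module name and destination come from the list above; the choice of flags (--archive, --timeout, optional --dry-run) and the function names are assumptions, not the production command. Whether files removed on dev.gentoo.org should also be removed on the master (rsync's --delete) is not specified here, so it is left out.

```python
import subprocess
from typing import List

# Source module and destination as named in the list above.
SOURCE = "dev.gentoo.org::distfiles-local/"  # trailing slash: copy the module's contents
DEST = "/home/distfiles/distfiles-local/"


def rsync_command(source: str = SOURCE, dest: str = DEST,
                  dry_run: bool = False) -> List[str]:
    """Build the rsync argument list (flags are an assumed, plausible choice)."""
    cmd = ["rsync", "--archive", "--timeout=300"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    cmd += [source, dest]
    return cmd


def sync(dry_run: bool = False) -> None:
    """Run the synchronization, raising CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(dry_run=dry_run), check=True)
```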