Gentoo Linux rsync Mirrors Policy and Guide
1. Preliminaries
Terms, names and all that
This guide is intended for people who would like to set up an rsync mirror of
their own. It caters not only to those who want to run an official rsync mirror
but also those wanting to run private mirrors.
There are three kinds of Gentoo rsync mirrors: main rotation mirrors, community
mirrors and private mirrors. Main rotation mirrors are maintained by the Gentoo
infrastructure team. They handle the bulk of the Gentoo rsync traffic. The
community mirrors are run by volunteers from the Gentoo community. Private
mirrors are mirrors run by individuals which are closed off to the public and
meant to cut traffic costs and latency for an organization or individual.
At this time, we have enough community mirrors and are actively seeking
additional main rotation mirrors. Hardware specifications for main rotation
servers include:
- Minimum of a 2GHz Pentium 4 processor (64-bit with at least 2 cores preferred)
- Minimum of 2GB RAM (3GB - 4GB is ideal)
- 15GB of disk space (IDE/SATA is fine)
You would retain ownership and physical possession of the hardware, and keep it
online in your own colocation space. Average bandwidth consumption for each
main rotation mirror is currently ~10Mbit/sec (around 2.6 TiB per month). As
the number of main rotation mirrors increases, this number should decrease
accordingly.
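As a rough check of the bandwidth figures above, the following sketch converts a sustained rate in Mbit/sec into TiB per month. It assumes a 30-day month and decimal megabits; the function name is only illustrative. A steady 10 Mbit/sec works out to a little under 3 TiB, so the 2.6 TiB figure corresponds to an average just below 9 Mbit/sec, which is consistent with "~10Mbit/sec".

```shell
# Convert a sustained rate in Mbit/s to TiB transferred in a 30-day month.
# Assumptions: 1 Mbit = 1,000,000 bits, 1 TiB = 1024^4 bytes, 30 days.
mbit_to_tib_per_month() {
    awk -v rate="$1" 'BEGIN {
        bytes = rate * 1000000 / 8 * 86400 * 30
        printf "%.2f\n", bytes / (1024 ^ 4)
    }'
}

mbit_to_tib_per_month 10
```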
If you would like to donate your machine, please email the Mirror Admins with the pertinent
information.
2. Setting up your own local rsync mirror
Introduction
Many users run Gentoo on several machines and need to sync the Portage trees on
all of them. Syncing every machine against public mirrors wastes bandwidth at
both ends. Syncing only one machine against a public mirror and all others
against that machine saves resources on the Gentoo mirrors and saves the users'
bandwidth.
The same holds true for organizations that would like to control the rsync
mirror their servers and workstations sync against. Of course, they usually
also want to save on bandwidth and traffic costs.
All you need to do is select which machine is going to be your own local rsync
mirror and set it up. You should choose a computer that can handle the CPU and
disk load that an rsync operation requires. Your local mirror also needs to be
available whenever any of your other computers syncs its portage tree.
It should also have a static IP address or a name that always resolves to
your server. Configuring a DHCP and/or a DNS server is beyond the scope of this
guide.
Note that these instructions assume your private rsync mirror is a Gentoo
machine. If you intend to run it on a different distribution, the guide for
setting up a community mirror might be more helpful. Just don't sync the mirror
every half hour but once or twice a day.
Setting up the server
There is no extra package to install as the required software is already on
your computer. Setting up your own local rsync mirror is just a matter of
configuring the rsync daemon to make your /usr/portage
directory available for syncing. Create the following
/etc/rsyncd.conf configuration file:
Code Listing 2.1: Sample /etc/rsyncd.conf |
pid file = /var/run/rsyncd.pid
max connections = 5
use chroot = yes
uid = nobody
gid = nobody
hosts allow = 192.168.0.1 192.168.0.2 192.168.1.0/24
hosts deny = *
[gentoo-portage]
path=/usr/portage
comment=Gentoo Portage
exclude=distfiles/ packages/
|
You do not need to use the hosts allow and hosts deny options.
By default, all clients are allowed to connect. The order in which you
write the options is not relevant: the server always checks the hosts
allow option first and grants the connection if the connecting host matches
any of the listed patterns. The server then checks the hosts deny
option and refuses the connection if any match is found. Any host that does not
match anything is granted a connection. Please read the man page
(man rsyncd.conf) for more information.
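The decision order described above can be sketched as a small shell function. This is only a simplified model for illustration, not rsync's own code: the function name is made up, and patterns are matched as plain shell globs, so address/mask entries such as 192.168.1.0/24 are not handled.

```shell
# Model of the check order: hosts allow first, then hosts deny, else allow.
# $1: connecting host, $2: space-separated allow patterns, $3: deny patterns
decide_access() {
    host=$1 allow=$2 deny=$3
    set -f    # keep patterns such as * from expanding to file names
    for pat in $allow; do
        case $host in $pat) set +f; echo allow; return 0 ;; esac
    done
    for pat in $deny; do
        case $host in $pat) set +f; echo deny; return 0 ;; esac
    done
    set +f
    echo allow    # no pattern matched in either list
}

decide_access 192.168.0.1 "192.168.0.1 192.168.0.2" "*"
decide_access 10.0.0.5 "192.168.0.1 192.168.0.2" "*"
```

With the sample configuration, the first call is granted by hosts allow and the second falls through to hosts deny = *, which is why listing only trusted hosts and denying everything else closes the mirror to the outside.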
Now, as the root user, start your rsync daemon and add it to the default runlevel so that it starts at boot:
Code Listing 2.2: Starting the rsync daemon |
# /etc/init.d/rsyncd start
# rc-update add rsyncd default
|
Let's test your rsync mirror. You do not need to try from another machine but
it would be a good idea to do so. If your server is not known by name from all
your computers, you can use its IP address instead.
Code Listing 2.3: Testing your mirror |
# rsync 192.168.0.1::
gentoo-portage Gentoo Portage
# rsync your_server_name::gentoo-portage
|
Your rsync mirror is now set up. Keep running emerge --sync as you have
done so far to keep your server up-to-date. If you use cron or similar
facilities to sync regularly, remember to keep it down to a sensible frequency
like once or twice a day.
Note:
Most public mirror administrators consider syncing more than once or twice a
day an abuse. Some, if not most, of them will ban your IP address from
their server if you start abusing their machines.
|
Configuring your clients
Now, make your other computers use your own local rsync mirror instead of a
public one. Edit your /etc/portage/make.conf and make the SYNC
variable point to your server.
Code Listing 2.4: Define SYNC in /etc/portage/make.conf |
# Use one of the following lines, not both:
SYNC="rsync://192.168.0.1/gentoo-portage"
SYNC="rsync://your_server_name/gentoo-portage"
|
You can check that your computer has been properly set up by syncing against
your own local mirror for the first time:
Code Listing 2.5: Checking and syncing |
# emerge --info|grep SYNC
SYNC="rsync://your_server_name/gentoo-portage"
# emerge --sync
|
That's it! All your computers will now use your local rsync mirror whenever you
run emerge --sync.
3. Setting up a community rsync server
Introduction
Note:
You can find sample configuration and script files in the gentoo-rsync-mirror
package. Just do emerge gentoo-rsync-mirror.
|
Right now, mirroring our Portage tree requires around 600MB, so it isn't space
intensive; having at least 1GB free should leave room for growth. Setting up a
Portage tree mirror is simple -- first, ensure that your mirror has rsync
installed. Then, set up your rsyncd.conf file to look something
like this:
Code Listing 3.1: rsyncd.conf |
uid = nobody
gid = nobody
use chroot = yes
max connections = 15
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
log file = /var/log/rsync.log
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300
[gentoo-portage]
#modern versions of portage use this entry
path = /gentoo/rsync
comment = Gentoo Linux Portage tree mirror
exclude = distfiles
|
You can pick your own locations for most of the files, of course. What's
important is the section name ([gentoo-portage]). This is the module name
that rsync clients will try to sync from.
For security reasons, the use of a chrooted environment is required! This has
implications for the logged timestamps -- see the FAQ below.
Now, you need to mirror the Gentoo Linux Portage tree. You can use the script
below to do so. Again, you'll probably want to change some of the file
locations to suit your needs -- in particular, they should match those of your
rsyncd.conf.
Code Listing 3.2: rsync-gentoo-portage.sh |
#!/bin/bash
# Mirror the Gentoo Portage tree from an upstream rsync server into DST.
RSYNC="/usr/bin/rsync"
OPTS="--quiet --recursive --links --perms --times -D --delete --timeout=300"
#Uncomment the following line only if you have been granted access to masterportage.gentoo.org
#SRC="rsync://masterportage.gentoo.org/gentoo-portage"
#If you are waiting for access to our master mirror, select one of our mirrors to mirror from:
SRC="rsync://rsync.de.gentoo.org/gentoo-portage"
#DST should match the path of the [gentoo-portage] section in your rsyncd.conf
DST="/space/gentoo/rsync/"
#The log is written next to this script
LOG="$0.log"
echo "Started update at $(date)" >> "$LOG" 2>&1
logger -t rsync "re-rsyncing the gentoo-portage tree"
#OPTS is left unquoted on purpose so that it splits into separate options
${RSYNC} ${OPTS} "${SRC}" "${DST}" >> "$LOG" 2>&1
echo "End: $(date)" >> "$LOG" 2>&1
|
Your rsyncd.motd should contain your IP address and other relevant
information about your mirror, such as information about the host providing the
Portage mirror and an administrative contact. You can now test your server as
outlined in the "Setting up your own local rsync mirror" chapter above.
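As an illustration of what such a file might contain, here is a minimal sketch that prints a message of the day. The wording of the banner, the function name and all three values (address, host name, contact) are placeholders, not a required format.

```shell
# Print a minimal rsyncd.motd. All values passed in are placeholders.
# $1: server IP address, $2: host name, $3: administrative contact
make_motd() {
    ip=$1 host=$2 contact=$3
    cat <<EOF
Welcome to the Gentoo Portage rsync mirror at $host
Server address: $ip
Administrative contact: $contact
EOF
}

make_motd 192.0.2.10 mirror.example.org admin@example.org
```

Redirecting the output to the file named by the motd file option (/etc/rsync/rsyncd.motd in the sample rsyncd.conf) would put it in place.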
After you have been approved as an official rsync mirror, your host will be
aliased with a name of the form rsync[num].[country code].gentoo.org.
4. Short FAQ
Q: Who should I contact regarding rsync issues and maintenance?
A: Visit Gentoo Bugzilla and file
a bug under the product "Mirrors", component "Server Problem".
Q: How can I check the freshness of an official rsync server?
A: The Gentoo infrastructure team monitors all community rsync servers for
freshness. You can see the results on the corresponding web page.
Q: I run a private rsync mirror for my company. Can I still access
masterportage.gentoo.org?
A: Because our resources are limited, we need to ensure we allocate them in
such a way as to provide the maximum amount of benefit to our users. As such,
we limit connections to our master rsync and distfile mirrors to public mirrors
only. Users are welcome to use our regular mirror system to establish a private
rsync mirror, though they are asked to follow certain basic rsync etiquette
guidelines.
Q: Is it important that I sync my community rsync mirror twice an hour?
A: Yes, it is important. You do not need to perform the syncs at exactly :00 and
:30 but the syncs should take place in each of the following two windows:
- :00 to :10
- :30 to :40
Additionally, please make sure that your syncs are exactly 30 minutes apart.
So, if you schedule the first sync of each hour for :08, please schedule the
second sync of the hour for :38.
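The schedule described above maps onto a single crontab line. The following sketch prints such a line for a chosen first minute; the function name and the script path in the example call are hypothetical, and the minute is expected as a plain number without a leading zero.

```shell
# Print a crontab line that syncs twice an hour, exactly 30 minutes apart.
# $1: minute of the first sync, which must fall in the :00 to :10 window
# $2: command to run
sync_cron_line() {
    first=$1 cmd=$2
    case $first in
        ''|*[!0-9]*|0?*) echo "minute must be a plain number" >&2; return 1 ;;
    esac
    if [ "$first" -gt 10 ]; then
        echo "first sync must fall between :00 and :10" >&2
        return 1
    fi
    # The second sync lands in the :30 to :40 window automatically.
    echo "$first,$((first + 30)) * * * * $cmd"
}

sync_cron_line 8 /usr/local/bin/rsync-gentoo-portage.sh
```

Picking a minute other than 0 also spreads the load on the upstream server, since not every mirror then connects at the same moment.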
Q: Where should I sync my rsync mirror before I become an official Gentoo mirror?
- For European-based rsync mirrors: sync to rsync.de.gentoo.org
- For US-based rsync mirrors: sync to rsync.us.gentoo.org
- For all others: sync to rsync.us.gentoo.org
Q: How do I find the mirror nearest to me?
A: netselect was designed to do this for you. If you have not installed it yet,
run emerge netselect. Then run netselect rsync.gentoo.org. After a minute or
so, netselect will print an IP address. Pass this address to rsync as its only
parameter, with two colons appended, e.g. rsync 1.2.3.4::. The banner message
should tell you which mirror that is. Update your
/etc/portage/make.conf accordingly.
Q: Can I use compression when syncing against masterportage.gentoo.org?
A: No. Compression utilizes too many resources on the server, so we have
forcibly disabled it on masterportage.gentoo.org. Please do not
attempt to use compression when syncing against this server.
Q: I'm seeing a lot of old and probably dead rsync processes, how can I
get rid of them?
A: This command kills old rsync processes that sometimes linger after
connection problems. It is important to kill them because they count as valid
connections for the 'max connections' option. You may run this command from
crontab every hour; it finds and kills rsync processes that have been running
for an hour or more.
Code Listing 4.1: Kill old rsync processes |
/bin/kill -9 `/bin/ps --no-headers -Crsync -o etime,user,pid,command|/bin/grep nobody | \
/bin/grep "[0-9]\{2\}:[0-9]\{2\}:" |/bin/awk '{print $3}'`
|
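The grep pattern in the command above relies on the elapsed-time format printed by ps, [[dd-]hh:]mm:ss, in which an hours field only appears once a process has run for at least an hour. The following sketch makes that rule explicit by converting such a value to seconds; the function names are illustrative and are not part of the command above.

```shell
# Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        n = NF
        s = $n + 60 * $(n - 1)               # seconds and minutes
        if (n >= 3) s += 3600 * $(n - 2)     # hours, if present
        if (n >= 4) s += 86400 * $(n - 3)    # days, if present
        print s
    }'
}

# Succeed if the process has been running for an hour or more.
is_stale() {
    [ "$(etime_to_seconds "$1")" -ge 3600 ]
}

is_stale 01:02:03 && echo "01:02:03 is stale"
```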
Q: There are many users who connect to my rsync server very frequently,
sometimes even causing a DoS to my mirror, is there any way to prevent
this?
A: A few inconsiderate users abuse the rsync mirror system by syncing more
than 1-2 times per day. In the most extreme cases, users schedule cron jobs to
sync every 15 minutes or so. This often amounts to a denial of service, because
such users continually occupy an rsync slot that could otherwise have gone to
another user. To try to prevent this, you may use this Perl
script, which will scan your rsync log files, pick out IP addresses that
have already connected more than N times that day and dynamically create
an rsyncd.conf file, including the offending IP addresses in the
'hosts deny' directive. The following line controls what N equals (in
this case 4):
Code Listing 4.2: Define maximum number of connections per IP |
@badhosts=grep {$hash{$_}>4} keys %hash;
|
If you use this script, please remember to rotate your rsync log files daily
and modify the script to match the location of your rsyncd.conf
file. This script is tested on Gentoo Linux, but should work on other
systems that support both rsync and Perl.
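The counting step at the heart of that approach can also be sketched in shell. This is not the Perl script itself, only an illustration of the same threshold test: it assumes the client addresses have already been extracted from the day's log, one per line (with the sample log format, %a is the client address, but its position in each line depends on how your rsync writes the timestamp). The function name is made up.

```shell
# Read one client IP per line on stdin and print every address that
# appears more than $1 times.
bad_hosts() {
    limit=$1
    sort | uniq -c | awk -v n="$limit" '$1 > n { print $2 }'
}

# Example with made-up addresses and a limit of 2:
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1 | bad_hosts 2
```

The addresses printed could then be joined into a single hosts deny line when the rsyncd.conf file is regenerated.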
The contents of this document, unless otherwise expressly stated, are licensed under the CC-BY-SA-2.5 license. The Gentoo Name and Logo Usage Guidelines apply.