====== Linux VServer Project ======

Linux-VServer provides virtualization for GNU/Linux systems. This is accomplished by kernel-level isolation. It allows running multiple virtual units at once. Those units are sufficiently isolated to guarantee the required security, but utilize available resources efficiently, as they run on the same kernel.

===== Resources =====

  * [[http://linux-vserver.org/|Project homepage]]
  * [[http://linux-vserver.org/Overview|Vserver Overview]]
  * [[http://www.nongnu.org/util-vserver/doc/conf/configuration.html|Documentation of configuration items]]. You should change the stylesheet in your browser if you care for your eyes ;-)
  * [[http://www.solucorp.qc.ca/miscprj/s_context.hc?prjstate=1|Introduction to vserver]], an older document (for < 2.x), but still useful
  * [[http://www.gentoo.org/doc/en/vserver-howto.xml|Gentoo Linux-VServer Howto]], which you will need if you want to set up a Gentoo guest vserver

===== Installing Vserver host on PLD Linux =====

Ensure you have an appropriate [[packages:kernel]] installed. You can check this from the kernel config:

  # modprobe configs
  # zgrep CONFIG_VSERVER /proc/config.gz
  CONFIG_VSERVER=y

===== Installing guest PLD Linux Vserver =====

==== Preparing userspace tools ====

First, install the tools:

  # poldek -u util-vserver

If you need to review the poldek repo sources, the configs are in ''/etc/vservers/.distributions/pld-*/poldek/'', where ''*'' can be ''ac'' or ''th'' depending on which guest you wish to install.

At this point you should have booted into a vserver-enabled kernel. You must start ''vprocunhide'' or none of your Vservers can start.

To start ''vprocunhide'':

  # /sbin/service vprocunhide start

==== Guest creation ====

Build the guest system.

  # a guest name (not hostname)
  NAME=test
  # must be a number within 2-32767 range.
  CTX=2
  vserver $NAME build --context $CTX -m poldek -n $NAME

By default, the guest is installed with the same ARCH and VERSION as your host.
If you need to use another combination, there are two versions of PLD available for guest systems:

  * pld-ac - [[:ac|PLD 2.0 (Ac)]]
  * pld-th - [[:th|PLD 3.0 (Th)]]

You may choose one using the ''-d'' option:

  DIST=pld-th
  vserver $NAME build --context $CTX -m poldek -n $NAME -- -d $DIST

Using ''util-vserver >= 0.30.214-2'' from ac-updates or ''util-vserver >= 0.30.215-2'' from th, you can build for another arch or distro, or use your own mirror:

  MIRROR=http://ftp.pld-linux.org/dists/ac
  vserver $NAME build --context $CTX -m poldek -n $NAME -- -m $MIRROR

To build a 32-bit guest on a 64-bit host:

  vserver $NAME build --context $CTX -m poldek -n $NAME --personality linux_32bit --machine i686 -- -d $DIST

To build a vserver from a template (an archive containing the whole filesystem):

  # vserver $NAME build --context $CTX -m template -n $NAME -- -t image.tar.bz2

To see other ''build'' command options:

  # vserver test build --help

Install ''rc-scripts'' to the new system using ''vpoldek'':

  # vpoldek test -- -u rc-scripts

You should consider installing the ''vserver-packages'' rpm package to satisfy dependencies on packages that have no use inside a vserver.

Then start the guest system:

  # vserver test start

To enter that vserver, type:

  # vserver test enter

Note, however, that if you don't run the //plain// init style, you must have at least one daemon running inside your guest vserver or it will be shut down shortly.

===== Configuring the network =====

''/etc/vservers//interfaces/''

'iface' is an arbitrary name for the interface; the value itself is not important but may matter for the order of interface creation and usage with chbind. Both happen in alphabetical order, and numbers like '00' are good names for these directories.

  * ''bcast'' The broadcast address.
  * ''dev'' The network device.
  * ''disabled'' When this file exists, this interface will be ignored.
  * ''ip'' The IP address which will be assigned to this interface.
  * ''mask'' The network mask.
  * ''name'' When this file exists, the interface will be named with the text in this file. Without such an entry, the IP will not be shown by ''ifconfig'' but only by ''ip addr ls''. Such a labeled interface is also known as an "alias" (e.g. 'eth0:foo').
  * ''nodev'' When this file exists, the interface will be assumed to exist already. This can be used to assign primary interfaces which are created by the host or another vserver. Using this means that the IP address won't be removed at vserver stop.
  * ''prefix'' The network prefix length.
  * ''scope'' The scope of the network interface.

To add an interface with address 192.168.0.1/24, type:

  # mkdir /etc/vservers//interfaces/0
  # echo eth0 > /etc/vservers//interfaces/0/dev
  # echo 192.168.0.1/24 > /etc/vservers//interfaces/0/ip

===== Configuring resources =====

  * [[http://linux-vserver.org/Resource+Limits|http://linux-vserver.org/Resource+Limits]]

''/etc/vservers/vserver-name/rlimits''

A directory with resource limits. Possible resources are ''cpu'', ''fsize'', ''data'', ''stack'', ''core'', ''rss'', ''nproc'', ''nofile'', ''memlock'', ''as'' and ''locks''. This configuration will be honored for kernel 2.6 only.

  * ''resource'' A file which contains the hard and soft limit of the given resource in the first line. The special keyword 'inf' is recognized.
  * ''resource.hard'' A file which contains the hard limit of the given resource in the first line. The special keyword 'inf' is recognized.
  * ''resource.min'' A file which contains the guaranteed minimum of the given resource in the first line. The special keyword 'inf' is recognized.
  * ''resource.soft'' A file which contains the soft limit of the given resource in the first line. The special keyword 'inf' is recognized.
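As a minimal sketch of the file layout described above, the following script writes a few limit files for a guest called ''test''. The ''CONF_ROOT'' variable and the ''set_limit'' helper are assumptions made for this example (they are not part of util-vserver), and the numeric values are purely illustrative. ''CONF_ROOT'' defaults to a scratch directory so the script can be tried safely; on a real host it would be ''/etc/vservers''.

```shell
#!/bin/sh
# Sketch: write resource limit files for a guest named "test".
# CONF_ROOT is a hypothetical override used only by this example.
CONF_ROOT=${CONF_ROOT:-$(mktemp -d)}
RLIMITS="$CONF_ROOT/test/rlimits"
mkdir -p "$RLIMITS"

# set_limit RESOURCE SUFFIX VALUE
# Writes VALUE to rlimits/RESOURCE (empty SUFFIX) or rlimits/RESOURCE.SUFFIX.
set_limit() {
    if [ -n "$2" ]; then
        echo "$3" > "$RLIMITS/$1.$2"
    else
        echo "$3" > "$RLIMITS/$1"
    fi
}

set_limit nproc  ""   200     # hard and soft limit in one file
set_limit nofile soft 1024    # soft limit only
set_limit nofile hard 4096    # hard limit only
set_limit core   ""   inf     # the special keyword 'inf'

# Show which files were created.
ls "$RLIMITS"
```

Each file holds its value on the first line, as the list above requires; the guest has to be restarted for changed limits to be applied.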
===== Managing packages =====

You should decide on one of two package management policies:

Benefits of managing packages //externally//:

  * provides extra security
  * avoids duplicating the RPM database and installed libraries/packages

Benefits of managing packages //internally//:

  * the vserver is more standalone because it does not depend on the host (rpm version or libraries), so moving such a vserver to another host is easier.

Things you should be aware of with //internal// package management:

  * you cannot upgrade rpm packages when the vserver is down (obviously).
  * you must have the network configured for the guest OS to use poldek network functions (''/etc/resolv.conf'', ''interfaces/N/IP'', etc)

==== External package management ====

=== Using vpoldek ===

Syntax: ''vpoldek -- [REGULAR POLDEK OPTIONS]''

For example:

  # vpoldek test -- -u squid

=== Using vrpm ===

Syntax: ''vrpm -- [REGULAR RPM OPTIONS]''

For example:

  # vrpm test -- -qa 'apache-*'

==== Internal package management ====

To be able to use ''poldek'' and ''rpm'' from inside your vserver, you will have to switch from managed to standalone package management:

  # vpoldek test -- -u poldek
  # vserver test stop
  # vserver test pkgmgmt internalize

From now on, the packages are managed by the vserver itself and the host system's tools should no longer be used to install or remove any packages.

See this doc for further info:

  $ less /usr/share/doc/util-vserver-build-0.30.210/package-management.txt.gz

==== DB version mismatch with host/guest ====

If you installed an ''ac'' vserver under ''th'' and have internalized package management, you will likely suffer db version mismatch errors:

  rpmdb: Program version 4.5 doesn't match environment version 4.7
  rpmdb: /var/lib/rpm/Packages: unsupported hash version: 9

To solve this, dump the rpmdb and restore it.
On the ''th'' host, dump the db and install tools for the guest:

  # poldek -u db4.7-utils
  # cd /vservers/test/var/lib/rpm
  # rm -f __db.00*
  # vpoldek test -- -u db4.5-utils
  # db_dump Packages > Packages.dump

On the ''ac'' guest, load the db:

  # vserver test enter
  # cd /var/lib/rpm
  # rm -f __db.00* Packages Pubkeys
  # db_load Packages < Packages.dump
  # rpm --rebuilddb
  # rm -f Packages.dump

===== Using plain init style =====

You might want to run your vserver with the init style set to //plain//, which means it runs like a regular Linux host, where everything is controlled by ''/sbin/init''. Another reason for doing so is that your vserver may get shut down before you can enter it because it has no running processes.

To enable //plain// init style:

  # echo 'plain' > /etc/vservers/test/apps/init/style

===== Copying guest PLD Linux Vserver to another host =====

Stop the vserver first:

  # vserver test stop

Then just archive and copy the structure:

  # tar --exclude '/vservers/test/var/lib/mysql/*' -cSf /www/vs-test.tar \
      /{etc/vservers,vservers,vservers/.pkg}/test

===== Removing guest PLD Linux Vserver =====

Stop the vserver first:

  # vserver test stop

Remove the vserver config, the filesystem and, in case of external package management, the rpmdb dir:

  # rm -rf /{etc/vservers,vservers,vservers/.pkg}/test

Recent util-vserver includes a new command called ''delete'':

  # vserver test delete
  Are you sure you want to delete the vserver test (y/N) y
  Resource Manager: Entering runlevel number............................[ 6 ]
  Stopping OpenSSH service...........................................[ DONE ]
  Saving random seed.................................................[ DONE ]
  Please stand by while rebooting the vserver........................[ DONE ]

===== Common problems / Useful tricks =====

==== Starting vserver fails with Dynamic Context error ====

  # vserver test start
  Dynamic Context IDs are not supported, you must set Context ID in /etc/vservers/test/context file

Fix: set
Context ID number in /etc/vservers/test/context file

  # echo >/etc/vservers/test/context

The value must be a number within the 2-32767 range.

Rationale: Dynamic allocation of context IDs has been disabled in the latest utils, because it is deprecated and discouraged by the Linux Vserver authors.

==== Starting vservers issues warnings about vc_net_create() ====

  # vserver test start
  chbind: vc_net_create(): Invalid argument

This warning is issued when there are no network interfaces configured for the given vserver. You may want to configure one (see section //Configuring the network//). If you need no network interfaces, e.g. when you do not plan to run any daemons inside the vserver, you may ignore this warning.

==== Starting service emits ulimit error ====

  /etc/init.d/lighttpd: ulimit: exceeds allowable limit

Fix: remove //-u unlimited// from //DEFAULT_SERVICE_LIMITS// in ///etc/sysconfig/system// or the per-service config.

==== Provides: user(name) and group(name) do not work ====

If some group is provided by multiple packages and one is uninstalled, the users will be removed. This is because the rpm binary is not available to rpm scripts with external package management.

  Preparing... ########################################### [100%]
  1:test ########################################### [100%]
  + rpm -qa
  /var/tmp/rpm-tmp.17082[3]: rpm: not found
  error: %post(test-0.1-1.11.i686) scriptlet failed, exit status 127
  vpoldek failed on vserver 'test' with errorcode 1

Workaround: disable //RPM_USERDEL=yes// in ///etc/sysconfig/rpm//

==== Service ssh doesn't start inside guest server ====

  test sshd[17644]: error: Bind to port 22 on 192.168.0.1 failed: Cannot assign requested address.

Fix: set separate addresses after //Listen''''Address// in ///etc/ssh/sshd_config// both on host and guest system. Guest configuration is optional, as the guest is limited to its chbind addresses, and if these are not taken by the SSH daemon running on the host system everything will work just fine.
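An sshd without an explicit ''ListenAddress'' listens on all addresses by default, so it is worth checking whether the directive is actually set. The following is only a sketch: the ''has_listen_address'' helper and the sample address 10.0.0.1 are assumptions made for this example, and the demo works on temporary files rather than the real ''/etc/ssh/sshd_config''.

```shell
#!/bin/sh
# has_listen_address FILE: succeed if FILE contains an active
# (uncommented) ListenAddress directive. Hypothetical helper.
has_listen_address() {
    grep -Eiq '^[[:space:]]*ListenAddress[[:space:]]+[^[:space:]]' "$1"
}

# Demo on temporary files instead of the real /etc/ssh/sshd_config.
tmp=$(mktemp -d)
printf '#ListenAddress 0.0.0.0\nPort 22\n' > "$tmp/default.conf"
printf 'Port 22\nListenAddress 10.0.0.1\n' > "$tmp/pinned.conf"

for f in "$tmp/default.conf" "$tmp/pinned.conf"; do
    if has_listen_address "$f"; then
        echo "$(basename "$f"): ListenAddress is set"
    else
        echo "$(basename "$f"): listens on all addresses"
    fi
done
```

On a real host the helper would be pointed at ''/etc/ssh/sshd_config''; sshd has to be restarted after the directive is changed.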
==== bind won't install because of a mknod problem ====

bind requires some special device nodes inside its chroot jail located in ///var/lib/named//. Vserver security does not allow device node creation, so you will have to install the package specifying ''--excludepath=/var/lib/named/dev'' and then create the devices ''/dev/null'' and ''/dev/random'' from outside of the vserver context.

UPDATE: vpoldek doesn't allow the ''--excludepath'' option:

  poldek: unrecognized option `--excludepath=/var/lib/named/dev'

An alternative method is to write in poldek.conf:

  rpmdef = _netsharedpath /dev:/var/lib/named/dev

or in ''/vservers/test/etc/rpm/macros'':

  %_netsharedpath /dev:/var/lib/named/dev

To run bind you will have to change one more thing. The PLD version of bind uses chroot for extra security, and vserver security removes all special kernel capabilities. To allow chrooting inside your DNS vserver, use the following:

  # echo CAP_SYS_RESOURCE >> /etc/vservers/test/bcapabilities

[[http://www.solucorp.qc.ca/howto.hc?projet=vserver&id=72|http://www.solucorp.qc.ca/howto.hc?projet=vserver&id=72]]

You can use the //lcap// program to see available capabilities:

  # lcap
  Current capabilities: 0xFFFFFEFF
     0) *CAP_CHOWN                   1) *CAP_DAC_OVERRIDE
     2) *CAP_DAC_READ_SEARCH         3) *CAP_FOWNER
     4) *CAP_FSETID                  5) *CAP_KILL
     6) *CAP_SETGID                  7) *CAP_SETUID
     8) CAP_SETPCAP                  9) *CAP_LINUX_IMMUTABLE
    10) *CAP_NET_BIND_SERVICE       11) *CAP_NET_BROADCAST
    12) *CAP_NET_ADMIN              13) *CAP_NET_RAW
    14) *CAP_IPC_LOCK               15) *CAP_IPC_OWNER
    16) *CAP_SYS_MODULE             17) *CAP_SYS_RAWIO
    18) *CAP_SYS_CHROOT             19) *CAP_SYS_PTRACE
    20) *CAP_SYS_PACCT              21) *CAP_SYS_ADMIN
    22) *CAP_SYS_BOOT               23) *CAP_SYS_NICE
    24) *CAP_SYS_RESOURCE           25) *CAP_SYS_TIME
    26) *CAP_SYS_TTY_CONFIG
      * = Capabilities currently allowed

==== syslog-ng won't run ====

There is no access to klogd inside vservers, so all you have to do is change the following line in the config file:

  source src { pipe ("/proc/kmsg" log_prefix("kernel: ")); unix-stream("/dev/log"); internal(); };

Into:

  source src {
  unix-stream("/dev/log"); internal(); };

==== Running openvpn inside vserver ====

You need to:

  * create ///dev/net/tun//:

  # mkdir -p /vservers/test/dev/net
  # mknod -m 660 /vservers/test/dev/net/tun c 10 200

  * set the ''~hide_netif'' flag:

  # echo '~hide_netif' >> /etc/vservers/test/flags

  * grant CAP_NET_ADMIN:

  # echo CAP_NET_ADMIN >> /etc/vservers/test/bcapabilities

==== Can't use ssh xauth forwarding ====

Workaround: disable ''X11UseLocalhost'' in ''sshd_config''.

==== Mount failed for selinuxfs on /selinux: Operation not permitted ====

When starting a guest with the init style set to plain and a newer libselinux, you may see an error message like this. It happens because init executes a function from libselinux which tries to mount /selinux. Disable SELinux for the guest by doing:

  echo "SELINUX_INIT=no" > /etc/vservers//apps/init/environment

or in .defaults (to disable it for all guests).

==== Not enough space on /tmp ====

Just after installation, a 16MB RAM-based filesystem is mounted on /tmp in each vserver. If you want your /tmp filesystem to be bigger, to reside on a different device, or not to be mounted at all, see ''/etc/vservers/test/fstab''.

==== Disabling interface ====

It's very convenient to disable an interface so it won't be activated on vserver boot:

  # touch /etc/vservers/test/interfaces/0/disabled

==== Display mounts of each xid (vserver) ====

  for a in /proc/virtual/[0-9]*; do \
    xid=$(basename $a /); \
    echo "xid: $xid"; \
    vnamespace -e $xid -- cat /proc/mounts | sed -e "s,^, $xid: ,"; \
  done

And similarly, to unmount ''/opt/storage'' on all running vservers:

  for a in /proc/virtual/[0-9]*; do \
    xid=$(basename $a /); \
    echo "xid: $xid"; \
    vnamespace -e $xid -- umount /opt/storage; \
  done

The last sample is needed if you want to umount /opt/storage completely on the host: vservers inherit mounts at startup (even if they don't use them), so otherwise you can't umount /opt/storage.
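Both loops above start by enumerating the entries of ''/proc/virtual'' whose names begin with a digit. As a sketch, that step can be factored into a helper; ''list_xids'' is a name invented for this example, and its optional directory argument exists only so the helper can be tried on a scratch directory.

```shell
#!/bin/sh
# list_xids [DIR]: print the name of every directory in DIR whose name
# starts with a digit, one per line. DIR defaults to /proc/virtual.
list_xids() {
    dir=${1:-/proc/virtual}
    for a in "$dir"/[0-9]*; do
        [ -d "$a" ] || continue    # also skips the literal pattern when nothing matches
        basename "$a"
    done
}

# Try it on a scratch directory that mimics /proc/virtual.
scratch=$(mktemp -d)
mkdir "$scratch/101" "$scratch/2"
touch "$scratch/info" "$scratch/status"
list_xids "$scratch"
```

On a host, the first loop could then be written as ''for xid in $(list_xids); do vnamespace -e $xid -- cat /proc/mounts; done''.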
==== squid won't start: FATAL: setrlimit: RLIMIT_NOFILE: (1) Operation not permitted ====

  # echo CAP_SYS_RESOURCE >> /etc/vservers/test/bcapabilities

==== Making vserver automatically startup on host boot ====

Install the ''util-vserver-init'' package, then read and edit ''/etc/sysconfig/vservers''.

==== Vservers startup order ====

Sometimes you need to be sure that one of the vservers is started before the others, e.g. because it provides some service that the others depend on. Vserver provides an easy way to do this. Let's assume that the //test2// vserver depends on the //test// and //foo// vservers:

  # echo test >> /etc/vservers/test2/apps/init/depends
  # echo foo >> /etc/vservers/test2/apps/init/depends

At shutdown, the //test2// vserver will be stopped before its dependencies.

==== Logging vserver start/stop messages using syslog-ng ====

It is possible to log system startup/shutdown messages for guest systems on the host system. For each guest that you wish to log, do:

  mkfifo /vservers//dev/console

If you wish to log each guest to a separate log file, add the following entries to your ''/etc/syslog-ng/syslog-ng.conf'':

  # define new log source for each guest
  source vserver_name { pipe ("/vservers/name/dev/console"); };
  # define destination for each guest
  destination vserver_name { file("/var/log/vserver_name.log"); };
  # log each vserver guest
  log { source(vserver_name); destination(vserver_name); };

It is also possible to log all guests to a single log file and just prefix log entries with the guest name.
  # define log source for guests, prefix each one with guest name
  source vservers {
    pipe ("/vservers/test1/dev/console" log_prefix("test1: "));
    pipe ("/vservers/test2/dev/console" log_prefix("test2: "));
    pipe ("/vservers/test3/dev/console" log_prefix("test3: "));
  };
  # define destination for vservers log
  destination vservers { file("/var/log/vservers"); };
  # log vserver guest start/stop messages
  log { source(vservers); destination(vservers); };

==== Vserver guest on physical console ====

If you wish to have your guest vserver available on a physical console, let's say ''/dev/tty2'', do the following:

  * comment out tty2 in ''/etc/inittab'' on the host machine

  #2:2345:respawn:/sbin/mingetty tty2

  * copy /dev/tty2 from the host machine as /vservers/name/dev/tty2
  * comment out all ttys in /vservers/etc/inittab except tty2; it is a good idea to comment out the ttys anyway to suppress errors like

  INIT: Id "1" respawning too fast: disabled for 5 minutes

  * press ALT+F2 and log in to your guest vserver

==== Running 32 bit vserver on a 64 bit host ====

With a recent [[package>util-vserver]] package you can create 32-bit guest systems inside a 64-bit host.
To specify the arch during guest creation, use the ''-d'' option, and to change what ''uname'' returns, use the arguments ''%%--personality linux_32bit --machine i686%%'':

  # vserver test build --context -n test -m poldek -- -d pld-th-i686 --personality linux_32bit --machine i686

If you need to set ''uts'' parameters afterwards, you can just echo them:

  # echo linux_32bit >> /etc/vservers/test/personality
  # echo i686 > /etc/vservers/test/uts/machine

==== Package built for different operating system (linux) ====

When upgrading packages on vservers with a recent rpm, you might run into an error with the message:

  error: package.arch: package is for a different operating system (linux)

It can be resolved by copying the rpm platform information from the host system to the vserver's settings directory:

  # cp /usr/lib/util-vserver/distributions/defaults/rpm/platform \
     /etc/vservers//apps/pkgmgmt/base/rpm/etc/platform

or you can run this script to update all vservers:

  #!/bin/sh
  p=/usr/lib*/util-vserver/distributions/defaults/rpm/platform
  for a in /etc/vservers/*/apps/pkgmgmt/base/rpm/etc/macros; do
    [ -f "$a" ] || continue
    f=${a%/macros}/platform
    [ ! -f "$f" ] || continue
    cp $p $f
  done

This script doesn't affect newly created vservers. Also beware that if you have i686 guests on an x86_64 host, the platform file would contain illegal x86_64 entries.

==== Can't upgrade FHS package ====

You will most likely get an error like:

  error: unpacking of archive failed on file /proc: cpio: chown failed - Operation not permitted

The fix is to add ''/proc'' to the ''%_netsharedpath'' list in ''/etc/vservers/test/apps/pkgmgmt/base/rpm/etc/macros'':

  %_netsharedpath /dev:/proc

If you have internalized the rpmdb, the macro file is ''/vservers/test/etc/rpm/macros''.

==== loopback in kernel with vserver 2.3 series ====

How to enable and disable loopback addresses in the vserver 2.3 series so that 127.0.0.1 will work in the guest.
If your kernel has CONFIG_VSERVER_AUTO_LBACK=y, then loopback addresses will be assigned and made visible in your guests automatically. You can disable that on a per-guest basis by doing:

  echo "~lback_remap" >> /etc/vservers/xyz/nflags
  echo "~hide_lback" >> /etc/vservers/xyz/nflags

If your kernel has the CONFIG_VSERVER_AUTO_LBACK option disabled, you can still get automatic loopback addresses on a per-guest basis by doing:

  echo "lback_remap" >> /etc/vservers/xyz/nflags
  echo "hide_lback" >> /etc/vservers/xyz/nflags

(util-vserver 0.30.214 or newer needed)

==== binding to address 0.0.0.0 binds only to single IP ====

Newer Vserver from the 2.3 series allows the administrator to enable special handling of network contexts for guests with a single IP only. The default value for this option is compiled into the kernel as CONFIG_VSERVER_AUTO_SINGLE. When it is enabled, any service configured to bind to all available IP addresses will bind only to the single IP configured in ''/etc/vservers/guest/interfaces''. It will not even bind to the loopback interface.

To enable special handling of network contexts in guests with a single IP, do:

  echo "single_ip" >> /etc/vservers/xyz/nflags

Similarly, to disable this option if it is enabled in the kernel, do:

  echo "~single_ip" >> /etc/vservers/xyz/nflags

==== SMACK enabled kernels ====

Smack-enabled kernels (the PLD default kernel >= 2.6.25) use security.SMACK64 to store some data. Unfortunately vserver by default doesn't allow changing xattrs. This can lead to problems like this:

  # pwconv
  Cannot set attribute security.SMACK64 for `/etc/passwd.tmpbPZiEN': Operation not permitted
  Error while converting `root' to shadow account.

There are two solutions for this. The first is to enable the setfcap capability (NOTE: it enables much more in the guest than is needed by smack, so seriously consider the security implications!):

  echo SETFCAP >> /etc/vservers/xyz/bcapabilities

The second one is disabling SMACK entirely if it is not needed.
This can be done by choosing another security module to be used by default (capability, selinux) using a kernel boot command line option:

  security=capability (< 2.6.27)
  security=default (>= 2.6.27)

Note: this option is available in vanilla kernels >= 2.6.26 and backported to PLD >= 2.6.25.9.

==== kernel oopses at pick_next_task_fair ====

Almost all kernels (including 2.6.27.x and 2.6.30/31) with the vserver patch have a bug that causes oopses in pick_next_task_fair when using ''sched_hard'' in vserver/xyz/flags. A temporary solution is to avoid using sched_hard. The latest 2.6.31 patches contain a different way to get behaviour similar to sched_hard: it's called CFS hard limit and is explained in the kernel documentation (in the vserver patch).

==== When using nice and su (for example, in the updatedb cron job), I get: su: Permission denied. ====

A guest cannot lower its nice value, and that's what 'su' does through pam_limits, which sets a nice value of 0. Solution: set the SYS_NICE bcapability for the guest to allow it to lower its nice value.

==== Why there is no used memory reported on 2.6.33 inside of vserver guest ? ====

2.6.33 started to use cgroups for accounting, and since no cgroup is configured by default, the accounted used memory is 0. Drop the virt_mem flag or set a cgroup memory limit. See [[http://linux-vserver.org/util-vserver:Cgroups|http://linux-vserver.org/util-vserver:Cgroups]] for more information.

==== Running auditd inside guest ====

You need ''CAP_AUDIT_CONTROL'' in ''bcapabilities'' and must lower ''priority_boost'' to ''0'' in ''/etc/audit/auditd.conf''.

==== XFS filesystem - kernel upgrade causes xfs related oops (xfs_filestream_lookup_ag) ====

After upgrading from 2.6-3.4 kernels (possibly other versions) to 3.18 (tested, possibly other versions), the kernel oopses almost immediately after accessing some files on an xfs filesystem, with ''xfs_filestream_lookup_ag'' visible in the stack trace (or another filestream-related function).
That's because the vserver patch for kernels earlier than 2.6.23 patched the xfs filesystem to introduce a new flag:

  #define XFS_XFLAG_BARRIER 0x00004000 /* chroot() barrier */

and files/dirs with this flag got saved on your filesystem.

Starting with 2.6.23, the kernel introduced filestreams, which use the 0x00004000 bit, thus causing a conflict with vserver.

  #define XFS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */

Vserver stopped adding this xfs xflag in 3.13, BUT your existing filesystem can still have XFS_XFLAG_BARRIER (0x00004000) set, causing oopses in newer kernels.

How to find out if I'm affected? If (and only if) you don't use the filestream feature, modify http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blob_plain;f=src/bstat.c;hb=HEAD to show only files containing XFS_XFLAG_FILESTREAM:

  diff --git a/src/bstat.c b/src/bstat.c
  index 4e22ecd..887512f 100644
  --- a/src/bstat.c
  +++ b/src/bstat.c
  @@ -34,19 +34,21 @@ dotime(void *ti, char *s)
   void
   printbstat(xfs_bstat_t *sp)
   {
  -    printf("ino %lld mode %#o nlink %d uid %d gid %d rdev %#x\n",
  -        (long long)sp->bs_ino, sp->bs_mode, sp->bs_nlink,
  -        sp->bs_uid, sp->bs_gid, sp->bs_rdev);
  -    printf("\tblksize %d size %lld blocks %lld xflags %#x extsize %d\n",
  -        sp->bs_blksize, (long long)sp->bs_size, (long long)sp->bs_blocks,
  -        sp->bs_xflags, sp->bs_extsize);
  -    dotime(&sp->bs_atime, "atime");
  -    dotime(&sp->bs_mtime, "mtime");
  -    dotime(&sp->bs_ctime, "ctime");
  -    printf( "\textents %d %d gen %d\n",
  -        sp->bs_extents, sp->bs_aextents, sp->bs_gen);
  -    printf( "\tDMI: event mask 0x%08x state 0x%04x\n",
  -        sp->bs_dmevmask, sp->bs_dmstate);
  +    if (sp->bs_xflags & XFS_XFLAG_FILESTREAM) {
  +        printf("ino %lld mode %#o nlink %d uid %d gid %d rdev %#x\n",
  +            (long long)sp->bs_ino, sp->bs_mode, sp->bs_nlink,
  +            sp->bs_uid, sp->bs_gid, sp->bs_rdev);
  +        printf("\tblksize %d size %lld blocks %lld xflags %#x extsize %d\n",
  +            sp->bs_blksize, (long long)sp->bs_size, (long long)sp->bs_blocks,
  +            sp->bs_xflags, sp->bs_extsize);
  +        dotime(&sp->bs_atime, "atime");
  +        dotime(&sp->bs_mtime, "mtime");
  +        dotime(&sp->bs_ctime, "ctime");
  +        printf( "\textents %d %d gen %d\n",
  +            sp->bs_extents, sp->bs_aextents, sp->bs_gen);
  +        printf( "\tDMI: event mask 0x%08x state 0x%04x\n",
  +            sp->bs_dmevmask, sp->bs_dmstate);
  +    }
   }

and then run it on the mount point of each filesystem (''bstat /; bstat /home'' etc). It will print "ino ..." information for filestream files.

How to clean up? rsync the files to another partition, recreate the problematic partition and then copy the files back.

===== Debian or Ubuntu guest installation =====

Install the ''binutils'' package and optionally ''debootstrap'' (vserver will install it on its own if you don't install it yourself), then build the guest:

  # vserver test build -m debootstrap --context 1234 -- -d etch -m http://ftp.pl.debian.org/debian -- --arch i386
  Could not find local version of 'debootstrap'; downloading it from
  http://ftp.pl.debian.org/debian/pool/main/d/debootstrap/debootstrap_1.0.3_all.deb...
  11:01:58 URL:http://ftp.pl.debian.org/debian/pool/main/d/debootstrap/debootstrap_1.0.3_all.deb [49086/49086] -> "/var/tmp/debootstrap.Rseedf/debootstrap.deb" [1]
  I: Retrieving Release
  I: Retrieving Packages
  I: Validating Packages
  I: Resolving dependencies of required packages...
  I: Resolving dependencies of base packages...
  I: Checking component main on http://ftp.pl.debian.org/debian...
  I: Retrieving adduser
  I: Validating adduser
  I: Retrieving apt
  I: Validating apt
  [...]
  I: Extracting zlib1g...
  I: Installing core packages...
  I: Unpacking required packages...
  I: Unpacking base-files...
  I: Unpacking base-passwd...
  I: Unpacking bash...
  I: Unpacking bsdutils...
  [...]
  I: Unpacking zlib1g...
  I: Configuring required packages...
  I: Configuring sysv-rc...
  I: Configuring tzdata...
  I: Configuring gcc-4.1-base...
  [...]
  I: Configuring debconf-i18n...
  I: Configuring debconf...
  I: Unpacking the base system...
  I: Unpacking adduser...
  I: Unpacking apt...
  [...]
  I: Configuring sysklogd...
  I: Configuring tasksel...
  I: Base system installed successfully.
  # ls /vservers/test/
  bin boot dev etc home initrd lib media mnt opt proc root sbin srv sys tmp usr var

Set up the guest hostname:

  # echo test > /etc/vservers/test/uts/nodename

Done.

Note that the file ''/usr/lib{,64}/util-vserver/defaults/debootstrap.uri'' may need its URL updated to point to a new debootstrap version if the old one is no longer there.

Possible Debian ''-d'' (distributions): squeeze, etch, lenny, sarge, sid. Popular ''%%--arch%%'' values: i386, amd64, powerpc.

Possible Ubuntu distributions: breezy, dapper, edgy, feisty, gutsy, hoary.

Note that upstart in some Ubuntu distributions is [[https://bugs.launchpad.net/upstart/+bug/251113|broken]] and needs this workaround to get running:

  echo TERM=linux >> /etc/vservers/VSERVER_NAME/apps/init/environment

===== CentOS guest installation =====

Install the ''yum'' and ''yum-metadata-parser'' packages.

  # vserver test build -n test --context 105 -m yum -- -d centos5
  =============================================================================
   Package        Arch    Version                Repository  Size
  =============================================================================
  Installing:
   glibc          i686    2.5-12                 base        5.1 M
  Installing for dependencies:
   basesystem     noarch  8.0-5.1.1.el5.centos   base        2.8 k
   filesystem     i386    2.4.0-1.el5.centos     base        116 k
   glibc-common   i386    2.5-12                 base        16 M
   libgcc         i386    4.1.1-52.el5.2         updates     82 k
   setup          noarch  2.5.58-1.el5           base        126 k
   tzdata         noarch  2007h-1.el5            updates     746 k
  Transaction Summary
  =============================================================================
  Install 7 Package(s)
  Update 0 Package(s)
  Remove 0 Package(s)
  Total download size: 22 M
  warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID e8562897
  Importing GPG key 0xE8562897 "CentOS-5 Key (CentOS 5 Official Signing Key) " from http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-5
  Installed: glibc.i686 0:2.5-12
  Dependency Installed: basesystem.noarch 0:8.0-5.1.1.el5.centos filesystem.i386 0:2.4.0-1.el5.centos glibc-common.i386 0:2.5-12
libgcc.i386 0:4.1.1-52.el5.2 setup.noarch 0:2.5.58-1.el5 tzdata.noarch 0:2007h-1.el5

  =============================================================================
   Package                Arch    Version                     Repository  Size
  =============================================================================
  Installing for dependencies:
   MAKEDEV                i386    3.23-1.2                    base        135 k
   SysVinit               i386    2.86-14                     base        113 k
   audit-libs             i386    1.3.1-1.el5                 base         39 k
   bash                   i386    3.1-16.1                    base        1.8 M
   bzip2-libs             i386    1.0.3-3                     base         37 k
   centos-release         i386    10:5-0.0.el5.centos.2       base         19 k
   centos-release-notes   i386    5.0.0-2                     base        112 k
   chkconfig              i386    1.3.30.1-1                  base        158 k
   coreutils              i386    5.97-12.1.el5               base        3.6 M
   cracklib               i386    2.8.9-3.1                   base         58 k
   cracklib-dicts         i386    2.8.9-3.1                   base        3.3 M
   db4                    i386    4.3.29-9.fc6                base        917 k
   device-mapper          i386    1.02.13-1.el5               base        582 k
   e2fsprogs              i386    1.39-8.el5                  base        957 k
   e2fsprogs-libs         i386    1.39-8.el5                  base        112 k
   ethtool                i386    5-1.el5                     base         60 k
   findutils              i386    1:4.2.27-4.1                base        294 k
   gawk                   i386    3.1.5-14.el5                base        1.7 M
   gdbm                   i386    1.8.0-26.2.1                base         27 k
   glib2                  i386    2.12.3-2.fc6                base        677 k
   grep                   i386    2.5.1-54.2.el5              base        174 k
   info                   i386    4.8-14.el5                  base        172 k
   initscripts            i386    8.45.14.EL-1.el5.centos.1   base        1.4 M
   iproute                i386    2.6.18-4.el5                base        801 k
   iputils                i386    20020927-43.el5             base        124 k
   krb5-libs              i386    1.5-29                      updates     592 k
   libacl                 i386    2.2.39-1.1                  base         19 k
   libattr                i386    2.4.32-1.1                  base         12 k
   libcap                 i386    1.10-26                     base         22 k
   libselinux             i386    1.33.4-2.el5                base         93 k
   libsepol               i386    1.15.2-1.el5                base        129 k
   libstdc++              i386    4.1.1-52.el5.2              updates     350 k
   libtermcap             i386    2.0.8-46.1                  base         14 k
   mcstrans               i386    0.1.10-1.el5                base         15 k
   mingetty               i386    1.07-5.2.2                  base         19 k
   mktemp                 i386    3:1.5-23.2.2                base         14 k
   module-init-tools      i386    3.3-0.pre3.1.16.0.1.el5     updates     411 k
   ncurses                i386    5.5-24.20060715             base        1.1 M
   net-tools              i386    1.60-73                     base        359 k
   openssl                i686    0.9.8b-8.3.el5_0.2          updates     1.4 M
   pam                    i386    0.99.6.2-3.14.el5           base        923 k
   pcre                   i386    6.6-2.el5_0.1               updates     112 k
   popt                   i386    1.10.2-37.el5               base         67 k
   procps                 i386    3.2.7-8.1.el5               base        207 k
   psmisc                 i386    22.2-5                      base         61 k
   python                 i386    2.4.3-19.el5                base        5.9 M
   readline               i386    5.1-1.1                     base        223 k
   sed                    i386    4.1.5-5.fc6                 base        174 k
   shadow-utils           i386    2:4.0.17-12.el5             base        1.0 M
   sysklogd               i386    1.4.1-39.2                  base         73 k
   termcap                noarch  1:5.5-1.20060701.1          base        265 k
   udev                   i386    095-14.5.el5                base        877 k
   util-linux             i386    2.13-0.44.el5               base        1.8 M
   zlib                   i386    1.2.3-3                     base         50 k
  
  Transaction Summary
  =============================================================================
  Install     54 Package(s)
  Update       0 Package(s)
  Remove       0 Package(s)
  
  Total download size: 34 M
  
  Dependency Installed:
    MAKEDEV.i386 0:3.23-1.2  SysVinit.i386 0:2.86-14
    audit-libs.i386 0:1.3.1-1.el5  bash.i386 0:3.1-16.1
    bzip2-libs.i386 0:1.0.3-3  centos-release.i386 10:5-0.0.el5.centos.2
    centos-release-notes.i386 0:5.0.0-2  chkconfig.i386 0:1.3.30.1-1
    coreutils.i386 0:5.97-12.1.el5  cracklib.i386 0:2.8.9-3.1
    cracklib-dicts.i386 0:2.8.9-3.1  db4.i386 0:4.3.29-9.fc6
    device-mapper.i386 0:1.02.13-1.el5  e2fsprogs.i386 0:1.39-8.el5
    e2fsprogs-libs.i386 0:1.39-8.el5  ethtool.i386 0:5-1.el5
    findutils.i386 1:4.2.27-4.1  gawk.i386 0:3.1.5-14.el5
    gdbm.i386 0:1.8.0-26.2.1  glib2.i386 0:2.12.3-2.fc6
    grep.i386 0:2.5.1-54.2.el5  info.i386 0:4.8-14.el5
    initscripts.i386 0:8.45.14.EL-1.el5.centos.1  iproute.i386 0:2.6.18-4.el5
    iputils.i386 0:20020927-43.el5  krb5-libs.i386 0:1.5-29
    libacl.i386 0:2.2.39-1.1  libattr.i386 0:2.4.32-1.1
    libcap.i386 0:1.10-26  libselinux.i386 0:1.33.4-2.el5
    libsepol.i386 0:1.15.2-1.el5  libstdc++.i386 0:4.1.1-52.el5.2
    libtermcap.i386 0:2.0.8-46.1  mcstrans.i386 0:0.1.10-1.el5
    mingetty.i386 0:1.07-5.2.2  mktemp.i386 3:1.5-23.2.2
    module-init-tools.i386 0:3.3-0.pre3.1.16.0.1.el5  ncurses.i386 0:5.5-24.20060715
    net-tools.i386 0:1.60-73  openssl.i686 0:0.9.8b-8.3.el5_0.2
    pam.i386 0:0.99.6.2-3.14.el5  pcre.i386 0:6.6-2.el5_0.1
    popt.i386 0:1.10.2-37.el5  procps.i386 0:3.2.7-8.1.el5
    psmisc.i386 0:22.2-5  python.i386 0:2.4.3-19.el5
    readline.i386 0:5.1-1.1  sed.i386 0:4.1.5-5.fc6
    shadow-utils.i386 2:4.0.17-12.el5  sysklogd.i386 0:1.4.1-39.2
    termcap.noarch 1:5.5-1.20060701.1  udev.i386 0:095-14.5.el5
    util-linux.i386 0:2.13-0.44.el5  zlib.i386 0:1.2.3-3
  # ls /vservers/test/
  bin boot dev etc home lib media mnt opt proc root sbin selinux srv sys tmp usr var vservers

As you can see, there is a ''/vservers'' directory inside our new guest. This is probably due to a bug in either yum itself or ''yum-chroot.patch'' from the util-vserver package. The same bug also causes many errors like:

  could not open ts_done file: [Errno 2] No such file or directory: '/vservers/test//var/lib/yum/transaction-done.2007-11-14.13:40.11'

Those errors may be safely ignored (they were deleted from the example above), and the directory may be removed:

  # rm -rf /vservers/test/vservers/

Please keep in mind that there will be no messages on screen while yum is working in the background. It only displays results when it has finished. Be patient :-)

You may also install the older CentOS 4 by using ''-d centos4''.

Set up the guest hostname:

  # echo test > /etc/vservers/test/uts/nodename

You may also wish to run ''pwconv'' inside the guest system.

==== Internalized package management ====

If you wish to use yum or rpm inside the newly created guest, you must do a few more things.
Install yum:

  # vyum test -- install yum
  =============================================================================
   Package                Arch    Version                     Repository  Size
  =============================================================================
  Installing:
   yum                    noarch  3.0.5-1.el5.centos.2        base        481 k
  Installing for dependencies:
   beecrypt               i386    4.1.2-10.1.1                base        116 k
   elfutils-libelf        i386    0.125-3.el5                 base         52 k
   expat                  i386    1.95.8-8.2.1                base         77 k
   m2crypto               i386    0.16-6.el5.1                base        487 k
   python-elementtree     i386    1.2.6-5                     base         83 k
   python-sqlite          i386    1.1.7-1.2.1                 base         39 k
   python-urlgrabber      noarch  3.1.0-2                     base        127 k
   rpm                    i386    4.4.2-37.el5                base        638 k
   rpm-libs               i386    4.4.2-37.el5                base        966 k
   rpm-python             i386    4.4.2-37.el5                base         53 k
   sqlite                 i386    3.3.6-2                     base        213 k
  
  Transaction Summary
  =============================================================================
  Install     12 Package(s)
  Update       0 Package(s)
  Remove       0 Package(s)
  
  Total download size: 3.3 M
  Is this ok [y/N]: y
  
  Installed: yum.noarch 0:3.0.5-1.el5.centos.2
  Dependency Installed:
    beecrypt.i386 0:4.1.2-10.1.1  elfutils-libelf.i386 0:0.125-3.el5
    expat.i386 0:1.95.8-8.2.1  m2crypto.i386 0:0.16-6.el5.1
    python-elementtree.i386 0:1.2.6-5  python-sqlite.i386 0:1.1.7-1.2.1
    python-urlgrabber.noarch 0:3.1.0-2  rpm.i386 0:4.4.2-37.el5
    rpm-libs.i386 0:4.4.2-37.el5  rpm-python.i386 0:4.4.2-37.el5
    sqlite.i386 0:3.3.6-2

Run pkgmgmt internalize:

  # vserver test pkgmgmt internalize

Since CentOS uses a different version of ''db'', you will get the following errors when trying to use vyum/vrpm outside the guest or yum/rpm inside the guest:

  # vrpm test -- -qa
  rpmdb: Program version 4.3 doesn't match environment version
  error: db4 error(-30974) from dbenv->open: DB_VERSION_MISMATCH: Database environment version mismatch
  error: cannot open Packages index using db3 -  (-30974)
  error: cannot open Packages database in /var/lib/rpm

To fix this, execute the following commands:

  # vserver test start
  # vserver test enter
  bash-3.1# rm -f /var/lib/rpm/__db.*
  bash-3.1# rpm --rebuilddb

If that doesn't work, try the following:

  # cd /vservers/test/var/lib/rpm
  # rm -f __db.*
  # db_dump Packages > Packages.dump
  # vserver test start
  # vserver test enter
  bash-3.1# cd /var/lib/rpm
  bash-3.1# rm Packages
  bash-3.1# db_load Packages < Packages.dump
  bash-3.1# rpm --rebuilddb
  bash-3.1# rm Packages.dump

===== Using quota in vservers =====

To enable quota in a vserver you need to:

  * enable quota on the "real" device mounted in the vserver (in ''/etc/fstab''):
    /dev/space/vserver1_home    /vservers/test/home  xfs  defaults,usrquota 0 0
  * load the ''vroot'' module and add it to your ''/etc/modules''. You can optionally raise the maximum number of vroot devices by putting the limit in your ''/etc/modprobe.conf'':
    options vroot max_vroot=64
  * assign a free vroot node to the device in question:
    # vrsetup /dev/vroot3 /dev/space/vserver1_home
  * copy the vroot device to the guest:
    # cp -af /dev/vroot3 /vservers/test/dev/
  * add to ''/etc/vservers/test/apps/init/mtab'':
    /dev/vroot3    /home/   xfs   defaults,usrquota     0 0
  * add ''quota_ctl'' to ''/etc/vservers/test/ccapabilities''
  * restart your vserver and run ''edquota'' inside

===== Network namespace in vservers =====

Starting from util-vserver 0.30.216-1.pre3054 there is basic support for creating network namespaces with interfaces inside.

Enable netns and two capabilities: NET_ADMIN (allows interfaces in the guest to be managed) and NET_RAW (makes iptables work):
  mkdir /etc/vservers/test/spaces
  touch /etc/vservers/test/spaces/net
  echo NET_ADMIN >> /etc/vservers/test/bcapabilities
  echo NET_RAW >> /etc/vservers/test/bcapabilities
  echo 'plain' > /etc/vservers/test/apps/init/style

Avoid context isolation, since it makes little sense when using network namespaces:

  touch /etc/vservers/test/noncontext

Configure interfaces:

  * ''0'' - arbitrary directory name, used only for ordering
  * ''myiface0'' - interface name inside the guest (optional; the default is geth0, geth1 and so on)
  * ''veth-host'' - interface name on the host side

  mkdir -p /etc/vservers/test/netns/interfaces/0
  echo myiface0 > /etc/vservers/test/netns/interfaces/0/guest
  echo veth-host > /etc/vservers/test/netns/interfaces/0/host

!!! FINISH ME. FINISH ME. FINISH ME. !!!

===== Network namespace in vservers (OLD WAY) =====

Enable netns and two capabilities: NET_ADMIN (allows interfaces in the guest to be managed) and NET_RAW (makes iptables work). Plain init style is needed so that post-start runs as soon as possible (with plain init style, that is just after the init process starts).
  mkdir /etc/vservers/test/spaces
  touch /etc/vservers/test/spaces/net
  echo NET_ADMIN >> /etc/vservers/test/bcapabilities
  echo NET_RAW >> /etc/vservers/test/bcapabilities
  echo 'plain' > /etc/vservers/test/apps/init/style

Interface naming (XYZ is the guest's context number):

  * ''veth-cXYZ'' - host interface
  * ''eth-cXYZ'' - guest interface
  * ''ifcfg-veth-cXYZ'' on the host should have ''ONBOOT=no'' (the interface is brought up when the vserver starts)

Create the ''/etc/vservers/test/scripts/post-start'' script:

  #!/bin/sh
  VSERVER_SCRIPT="$1"
  VSERVER_NAME="$2"
  
  # interface names are derived from the guest's context number
  CONTEXT=$(cat /etc/vservers/${VSERVER_NAME}/context)
  VSERVER_IFACE_SUFFIX="c${CONTEXT}"
  VSERVER_HOST_IFACE="veth-${VSERVER_IFACE_SUFFIX}"
  VSERVER_GUEST_IFACE="eth-${VSERVER_IFACE_SUFFIX}"
  
  # create the veth pair on the host
  ip link add name "${VSERVER_HOST_IFACE}" type veth peer name "${VSERVER_GUEST_IFACE}"
  
  # start a helper process inside the guest and wait (up to 1.5 s) for its pid
  vserver ${VSERVER_NAME} exec sleep 60 &
  for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
      pid=$(vserver ${VSERVER_NAME} exec pidof -s sleep)
      [ -n "$pid" ] && break
      usleep 100000
  done
  if [ -z "$pid" ]; then
      echo "vserver guest $VSERVER_NAME: failed to find guest net namespace" >&2
  fi
  
  # move the guest end of the pair into the guest's network namespace
  ip link set "${VSERVER_GUEST_IFACE}" netns $pid
  sysctl -q -w net.ipv4.conf.${VSERVER_HOST_IFACE}.forwarding=1
  /sbin/ifup "${VSERVER_HOST_IFACE}"
  exit 0

Create the ''/etc/vservers/test/scripts/post-stop'' script:

  #!/bin/sh
  VSERVER_SCRIPT="$1"
  VSERVER_NAME="$2"
  
  CONTEXT=$(cat /etc/vservers/${VSERVER_NAME}/context)
  VSERVER_IFACE_SUFFIX="c${CONTEXT}"
  VSERVER_HOST_IFACE="veth-${VSERVER_IFACE_SUFFIX}"
  VSERVER_GUEST_IFACE="eth-${VSERVER_IFACE_SUFFIX}"
  
  # deleting the host end removes the whole veth pair
  ip link del "${VSERVER_HOST_IFACE}" 2> /dev/null
  exit 0

You end up with one interface on the host and one inside the guest, virtually connected. Configure the interfaces and routing as on a normal system.

Notes:

  * the vserver name can't be longer than 10 characters. A longer one produces interface names above the limit (15 characters; veth- + vserver name)
  * this method is racy. post-start runs in parallel with the init process inside the guest system. If the guest is faster and tries to configure networking before post-start has moved the new interface into the guest, you are doomed. Fortunately this is unlikely to happen, as post-start is short and should always finish before the guest scripts configure networking. The race could be avoided by implementing proper netns interface moving support in the util-vserver scripts.
  * enabling the pid namespace is likely to break the post-start script (the part that fetches the guest pid for iproute2 netns use). Using vps (aka context 1 spectator mode) to find the guest process pid in the host namespace is likely to solve this problem.

===== cgroups =====

Example cgroups usage:

  * create a "cgroup" directory in ''/etc/vservers/test/''
  * put files there, such as:
    * ''cpuset.cpus'' - numbers of the cores used by the Vserver: 0-n
    * ''cpuset.mems'' - NUMA node numbers
    * ''cpuset.memory_migrate'' - memory migration: 1 to migrate memory when shutting down cores, 0 not to
    * ''cpu.shares'' - the Vserver's CPU share, for example 256
    * ''memory.limit_in_bytes'' - the Vserver's RAM, for example 256M

Important:

  * the share you get is equal to the guest's share divided by the sum of the cpu shares of all the guests. The default share is 1024 (for guest, host... generally the default in a kernel cgroup) and is inherited from the parent cgroup.
  * there is no hierarchy when dealing with cpu.shares (besides inheriting the default value). All shares are summed and each cgroup gets its "cpu.shares/sum". For example, the host has the default 1024 and a guest is set to 2048. This means that the host will get 1/3 of the cpu power and the guest will get 2/3.
  * the ''virt_mem'' flag is needed for the guest to see only the cgroup-limited memory

===== cgroups with libcgroup =====

libcgroup can mount cgroup differently.
It can use a separate subdirectory for each cgroup subsystem, like:

  # cat /proc/mounts |grep cgroup
  cgroup /sys/fs/cgroup/blkio cgroup rw,relatime,blkio 0 0
  cgroup /sys/fs/cgroup/cpu cgroup rw,relatime,cpu 0 0
  cgroup /sys/fs/cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
  cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
  cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
  cgroup /sys/fs/cgroup/freezer cgroup rw,relatime,freezer 0 0
  cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
  cgroup /sys/fs/cgroup/net_cls cgroup rw,relatime,net_cls 0 0

For this to work you need at least util-vserver-0.30.216-1.pre2955.3 (the .3 is important) and you must turn on per-subsystem support:

  # mkdir /etc/vservers/.defaults/cgroup
  # touch /etc/vservers/.defaults/cgroup/per-ss

===== cgroups mountpoint =====

If you have cgroups mounted somewhere else, you can tell vserver about it (it searches in ''/sys/fs/cgroup'' by default). For example, given the following mount entry:

  none        /dev/cgroup     cgroup  cpuset,cpu,cpuacct,devices,freezer,net_cls  0 0

you need to tell vserver where it is mounted:

  # cat /etc/vservers/.defaults/cgroup/mnt
  /dev/cgroup
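
The per-guest cgroup settings from the cgroups section are plain files, so they can be written by a small helper. The following is only a minimal sketch and not part of util-vserver: the function names ''write_cgroup_setting'' and ''cpu_share_percent'' and the ''VSERVER_CONF_DIR'' variable are made up for illustration, and it assumes each setting is a one-line file in the guest's ''cgroup'' config directory.

```shell
#!/bin/sh
# Hypothetical helpers (not part of util-vserver) for per-guest cgroup settings.

# Base config directory; override it to try the sketch in a scratch directory.
VSERVER_CONF_DIR="${VSERVER_CONF_DIR:-/etc/vservers}"

# write_cgroup_setting GUEST FILE VALUE
# Assumption: each setting is a one-line file in <conf dir>/GUEST/cgroup/.
write_cgroup_setting() {
    if [ "$#" -ne 3 ]; then
        echo "usage: write_cgroup_setting GUEST FILE VALUE" >&2
        return 1
    fi
    wcs_dir="${VSERVER_CONF_DIR}/$1/cgroup"
    mkdir -p "$wcs_dir" || return 1
    printf '%s\n' "$3" > "${wcs_dir}/$2"
}

# cpu_share_percent SHARE TOTAL
# Prints SHARE as an integer percentage of TOTAL (the sum of all cpu.shares),
# rounded down.
cpu_share_percent() {
    echo $(( $1 * 100 / $2 ))
}
```

For example, ''write_cgroup_setting test cpu.shares 2048'' creates the ''cpu.shares'' file for the guest ''test'', and ''cpu_share_percent 2048 3072'' prints 66, which matches the 2/3 share from the host/guest example in the cgroups section (integer division rounds down).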