I recently had a problem on a multi-user computer. One user was demanding so many resources that the computer couldn't keep up. *nix skills to the rescue!
Long Story about Multi-User Computer:
TL;DR: FML. Now skip to 'The Fix'. Oh... you really care? OK, imagine the following scenario on a machine with 16 cores:
User 1 (myself): Running code on a single processor for long-term tests that take days to weeks of continuous computation
User 2: Running OMP tasks using 8 cores
User 3: Running OMP tasks using 13 cores
The computer (a Linux box used for grunt processing) only has 16 cores, so obviously something gives. Sadly the computer doesn't know that none of these processes are interactive, so it tries to carry out everything at once. This means that each process gets slices of 100ms in the ratio 1:8:13. Worse still, my code running on 1 core is now getting 72.7% of a core (16 cores shared between 22 threads). This wouldn't be the end of the world, but my code is quite RAM intensive, so instead of running for 100ms every 2.2s it actually spends most of that time loading data in and out of the cache. A run that I know used to take 40 hours went to over 8 days. This affects everyone equally, but if you are using OMP tasks then the cache is shared and there is no need to load data in or out of it. This means that you won't lose so much time when passing from one of your slices to another of your slices. My process only gets one slice, so I'm always handing off to another user.
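For anyone who wants to check the 72.7% figure, here is a minimal sketch of the arithmetic. It assumes one busy thread per requested core and a scheduler that splits time evenly between threads, which is a simplification of what the kernel really does.

```shell
# Numbers from the scenario above: 16 cores, users asking for 1, 8 and 13.
cores=16
threads=$((1 + 8 + 13))   # assumption: one busy thread per requested core

# Share of one core per thread if time is split evenly between all threads.
share=$(awk -v c="$cores" -v t="$threads" 'BEGIN { printf "%.1f", 100 * c / t }')

echo "threads=$threads share_per_thread=${share}%"
```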
User 2 is under a lot of pressure and I have chosen not to push it.
Politely asking User 3 to curtail their OMP task count to the number of cores available did not work. Apparently User 3 has a 'bug' in their code that they can't explain or show me. They claim not to understand it, but it magically requires 13 cores, and they won't explain it to me or try to fix it. They have also convinced our boss that this is reasonable behaviour.
User 3 routinely crashes the system when they decide to run code that leaks memory. This obviously makes me lose weeks of work. Yes, I should save state and be able to reload the code. It's on my list, but it will take weeks to shoehorn in.
But wait... Linux has an existing system for just my situation! CGroups? Great! They create soft limits on CPU time: if there is no contention then processes are not curtailed; if there is, then processor time is proportional to your CPUShares. I talk to the sysadmin, he's sympathetic! He asks me to find out about CGroups so that we share the load of putting them onto the system.
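To make "proportional to your CPUShares" concrete, here is a rough sketch of the split under full contention. The function name cpu_fraction and the share values are made up for illustration, and the model assumes every listed group is busy at the same time and that the groups are siblings competing for the same CPUs.

```shell
# cpu_fraction MINE OTHER...: percentage of contended CPU time that the first
# share value receives, assuming every listed group is busy at once.
cpu_fraction() {
    mine=$1
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    awk -v m="$mine" -v t="$total" 'BEGIN { printf "%.1f", 100 * m / t }'
}

# Two busy users with example share values of 1000 and 500.
echo "1000 shares: $(cpu_fraction 1000 500)%"
echo " 500 shares: $(cpu_fraction 500 1000)%"
```

When only one group is busy, the shares do not hold it back, which is what makes this a soft limit.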
The Fix
Looking online it seems simple. It turns out that it is, but most of the online documents are for V1 not V2! Unless you are working with a legacy system, the steps for giving users CPUShares and setting a memory limit are as follows.
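If you are not sure which cgroup version a machine is running, the filesystem type mounted at /sys/fs/cgroup is a common way to tell. This is a small sketch of that check. The function name cgroup_version is made up here, and the mapping assumes a typical systemd distribution, where V2 shows up as cgroup2fs and V1 (or a hybrid setup) as tmpfs.

```shell
# cgroup_version [PATH]: guess the cgroup version from the filesystem type
# mounted at PATH (default /sys/fs/cgroup).
cgroup_version() {
    fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null) || fstype=unknown
    case $fstype in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1 or hybrid" ;;
        *)         echo "unknown ($fstype)" ;;
    esac
}

cgroup_version
```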
Obligatory note: This is a fairly technical blog, it is your responsibility to know what you are doing. Take backups of all files that you are changing (e.g. sudo cp
Optional: install cgroup-tools. You need it if you want to run 'lscgroup' or set network limits (which I haven't tried).
# apt install cgroup-tools
1. Run sudo nano /etc/default/grub and add the line:
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
$ sudo update-grub
(Note: the sysadmin also found update-grub2 on his system.)
2. Close all programs and run (this will reboot your computer):
$ sudo reboot
3. Set DefaultCPUAccounting=yes and DefaultMemoryAccounting=yes in /etc/systemd/system.conf
4. Each user is referred to in control groups by a file called 'user-
$ cat /etc/passwd | grep
which outputs something like:
andrew:x:1000:1000:andrew,,,:/home/andrew:/bin/bash
My uid is 1000 (actually the first uid defaults to 1000 and counts upwards sequentially in most systems).
Alternatively, run lscgroup and look for the names that start with 'user' among all the other cgroups.
5. To give a user a number of CPUShares (500 in this example) and limit them to 2G of memory, run the commands:
# systemctl set-property user-1001.slice CPUShares=500
# systemctl set-property user-1001.slice MemoryLimit=2G
6. Check that they are set correctly with:
# systemctl show -p CPUShares user-1001.slice
# systemctl show -p MemoryLimit user-1001.slice
Repeat step 5 for each user. That's it!
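With several users to set up, it can help to generate the commands rather than type them. The sketch below is a dry run: it only prints the systemctl lines from the steps above for one account and changes nothing. The function name limit_cmds is made up, and root is used in the example only because that account exists on every system; substitute the real username and review the output before running any of it as root.

```shell
# limit_cmds USER SHARES MEMORY: print (do not run) the set-property and
# show commands for one account.
limit_cmds() {
    uid=$(id -u "$1") || return 1      # look up the numeric uid for USER
    slice="user-${uid}.slice"
    echo "systemctl set-property $slice CPUShares=$2"
    echo "systemctl set-property $slice MemoryLimit=$3"
    echo "systemctl show -p CPUShares $slice"
    echo "systemctl show -p MemoryLimit $slice"
}

# Example only: prints four commands targeting user-0.slice.
limit_cmds root 500 2G
```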
Now if you want to check what this does, try installing a program like stress. If you have two users with different CPUShares, they'll get core access in a ratio proportional to the number of shares.
$ stress --cpu 2 --timeout 60
and to check if the memory limit works, this command should get killed if the limit is less than 3GB:
$ dd bs=3G if=/dev/zero of=/dev/null
In the picture below, I am running the stress line above twice as two different users. One is getting two thirds of the two CPUs on this laptop, the other is getting one third; the shares are 1000 & 500 respectively. :-)