A lot of my bash scripting experience has been, in one sense, relatively simple. I have several scripts that span several hundred lines and do fairly complex things across multiple systems, so from that perspective they aren’t necessarily simple. However, it wasn’t until recently that I had to really start thinking about managing when scripts run, and particularly about keeping them from “stepping all over each other” when multiple instances of the same script must be run… enter the topic of “Job Control” or “Controlled Execution.”
A common scenario is that your bash script is written to access some shared resource. A few examples of such shared resources:
- An executable file that can only have one running instance at any given time
- A log file that must be written to in a certain order
- Sensitive system files (such as the interfaces file)
What happens if a bash script gets executed once, and then a second instance is fired off before the first instance finishes running? The short answer is that you typically get unexpected behavior, and things tend to break.
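To make this concrete, here is a contrived illustration (the file name and loop are purely for demonstration): two instances of this snippet running at the same time will interleave their lines in the shared log in an unpredictable order.

#!/bin/bash
# Two unsynchronized instances of this loop will interleave
# their lines in /tmp/shared.log unpredictably.
for i in 1 2 3 4 5; do
    echo "instance $$: line $i" >> /tmp/shared.log
    sleep 1
done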
So the solution is to introduce some job control logic into your scripts. To that end, I want to talk about two methods of controlling job execution that I have started to employ heavily for one of my projects: Simple Lock Files, and the more involved FLOCK utility that ships with most newer Linux distributions (as part of the util-linux package). For reference, most of this article is based on a system running Debian Jessie.
Using Simple File Locks for Job Control
Let’s start here… I call this method simple because we are going to be using logic that is built into bash. If you do bash scripting and you understand basic “IF… THEN…” logic you can get this up and running in a few minutes.
Here is example code for a simple file lock:
#!/bin/bash

#Create the locking dir if it doesn't exist
if [[ ! -d "/lockdir/" ]]; then
    mkdir -p /lockdir/
fi

#Check if there is currently a lock in place: if so exit, if not create a lock
if [[ -f "/lockdir/myscript.lock" ]]; then
    echo "myscript is currently already running"
    exit
else
    touch /lockdir/myscript.lock
fi

#DO SOME STUFF - THIS IS THE HEART OF YOUR SCRIPT
echo "hello world, I am going to sleep for 20 seconds, during which time I will continue holding the lock file."
sleep 20
echo "I am awake and still haven't released the lock."

#Release the lock
rm /lockdir/myscript.lock
echo "The lock is now released and I am exiting."
What is happening in the above should be fairly self-evident: while the script is running, a file called “myscript.lock” exists. The script checks for the presence of this file and exits if it finds it; if the file doesn’t exist, the script creates it and continues to run. The last thing done in the script is removing the lock file.
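For example, assuming the script above is saved as myscript.sh, starting a second copy while the first is still sleeping looks something like this:

$ ./myscript.sh &
hello world, I am going to sleep for 20 seconds, during which time I will continue holding the lock file.
$ ./myscript.sh
myscript is currently already running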
The major pro of using this method is that it is really quite straightforward to implement and works pretty darn well… as long as your script never fails before the lock file gets deleted on the last line. And that would be a big con: if the script dies early, the stale lock file blocks every future run until someone removes it by hand. I am going to discuss using a mechanism called “Exit Traps” in a follow-up article (you know… once I get them figured out…) which can help mitigate this problem.
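As a quick preview, the basic shape is a cleanup function registered with trap, which bash runs whenever the script exits:

#!/bin/bash
# Remove the lock file when the script exits, even if a command failed earlier.
# (Adding INT TERM after EXIT extends coverage to those signals as well.)
cleanup() {
    rm -f /lockdir/myscript.lock
}
trap cleanup EXIT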
Additionally, there is another issue with this method… subsequent script runs just exit. If that is your desired behavior, that’s great. Cron jobs, for example, which run at regular intervals, would benefit from this. However, what happens if instead of just killing subsequent job runs, you want to queue them? I.e., Instance 1 is currently running, and Instances 2 and 3 were just started. Rather than Instances 2 and 3 just exiting, we want them to “wait their turn”. So Instance 1 finishes, then Instance 2 starts, Instance 3 continues to wait, then Instance 2 finishes and Instance 3 then starts…
I started down the path of trying to code this using while loops and it got convoluted quickly. After a bit of digging around, though, I eventually came across FLOCK.
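For reference, the polling approach I was abandoning looked roughly like this; note that it still has the same check-then-create race as before, just inside a loop:

# Naive queuing: poll until the lock file disappears, then grab it.
# Two waiters can still pass the check at the same moment.
while [ -f /lockdir/myscript.lock ]; do
    sleep 1
done
touch /lockdir/myscript.lock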
Using FLOCK for Job Control
FLOCK’s default behavior is queuing: a second instance simply blocks until the first releases the lock… so it was ideal for my needs.
The majority of Flock usage examples around the net involve invoking a sub-shell within your script. This has one major advantage: if something fails in that sub-shell, the lock will still be released, because the kernel drops the lock when the file descriptor is closed. This resolves one of the cons listed above regarding the simple file lock method.
However, sub-shells introduce a performance penalty and you may have to rearrange some of your scripting due to variable scoping. In short: variables declared OUTSIDE of the sub-shell ARE available INSIDE of the sub-shell; HOWEVER, variables declared INSIDE of the sub-shell ARE NOT available OUTSIDE of the sub-shell. Just read that a few times and hopefully it will click; it took me actually seeing it happen via some trial and much error before I really got a handle on it.
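A tiny demonstration of that scoping behavior (the variable names are just for illustration):

#!/bin/bash
OUTER="set outside"
(
    echo "inside: OUTER=$OUTER"    # prints "inside: OUTER=set outside"
    INNER="set inside"
)
echo "outside: INNER=$INNER"       # prints "outside: INNER=" - INNER never escapes the sub-shell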
One of the scripts I was working on had some dynamically created variables and I needed to lock that section of the script. I used the default sub-shell method and was very confused because my script kept doing odd things. The dynamically created variables within the sub-shell weren’t being spit out to the rest of the script.
Okay, with that being understood, here is what the syntax looks like for the sub-shell method of using FLOCK:
#This area is outside of the sub-shell; you can run commands out here that are not going to be locked.
#Also any variables declared here will be available throughout the script, including within the sub-shell.
(
    flock 200
    # ...This area is in the sub-shell; commands here will be executed under lock...
    # variables declared here will NOT be available to the rest of the script
) 200>/var/lock/mylockfile
#This area is outside of the sub-shell; you can run commands out here that are not going to be locked
Yes it really is that simple.
In the above, 200 is a file descriptor applied to “/var/lock/mylockfile”. You don’t need to use the number “200”; you can use just about any number, just make sure your flock command references the same file descriptor number. It is recommended you use a relatively high number so you don’t collide with the standard descriptors (0, 1, and 2) or others your script may open. One thing to note: make sure you specify the shell to be used (i.e. the #!/bin/bash) at the start. Some distros default to a lighter-weight shell (ex. Dash) that doesn’t have the same functionality as bash; Dash, for instance, only supports single-digit file descriptors in redirections. I ran into this issue on one of my systems and it locked up my script execution.
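To see the queuing in action, here is a small runnable sketch (the sleep and echo messages are purely for demonstration, and you will need write access to /var/lock; any path works). Launch it twice and the second copy will wait until the first releases the lock:

#!/bin/bash
echo "$$ waiting for the lock..."
(
    flock 200
    echo "$$ acquired the lock"
    sleep 20
    echo "$$ finished, releasing the lock"
) 200>/var/lock/mylockfile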
Last, there is a method of using FLOCK where you don’t have to invoke a sub-shell. The advantages of doing this are:
1. Variables generated under lock are available throughout the script
2. No performance penalties
The disadvantage is that if the code you place under lock breaks in a way that skips the unlock while the script keeps running, the lock won’t get released until the script finally exits, which negates one of the advantages of using FLOCK vs. using Simple Lock Files. That being said, I am predominantly using this method because of the issue with variables. I am using a mix of FLOCK and Simple Lock Files throughout my code: FLOCK when I need the queuing behavior, and Simple Lock Files when I just want subsequent runs of the script to terminate instead of queuing.
At this point I will mention that there is a command flag you can use with Flock (-n, a.k.a. --nonblock) that will tell subsequent runs of the script to simply terminate rather than queue. However, my preference for that case is still Simple Lock Files, the primary reason being that they are more portable across OS’s/shells, whereas FLOCK introduces something of an outside dependency because it isn’t ubiquitous across Linux distributions.
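For completeness, here is what that flag looks like with the sub-shell method. One subtlety: exit inside the sub-shell only terminates the sub-shell, so anything you want skipped has to live inside it:

(
    # -n = fail immediately instead of queuing if the lock is already held
    if ! flock -n 200; then
        echo "myscript is currently already running"
        exit 1    # exits only the sub-shell
    fi
    # ... commands executed under lock ...
) 200>/var/lock/mylockfile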
Okay so, FLOCK sans the sub-shell…
# commands placed here are executed without a lock

exec 200>/var/lock/mylockfile || exit 1
flock 200 || exit 1

# commands executed under lock are placed here

flock -u 200

# commands placed here are executed without a lock
# Variables set inside of the lock or outside of the lock are available throughout the script (as long as they are declared before being used, which is typical bash script behavior)
Without going into a lot of gory detail, essentially you are creating a file with a descriptor on it before you invoke FLOCK; that is what the “exec 200>” portion of the script is doing. Then you invoke flock with the file descriptor, put in the commands you want to run under lock, and then manually release the lock using flock with the “-u” flag followed by the file descriptor.
In BASH speak, the double pipe “||” means “or”: run the command on the right only if the command on the left fails. The usage above translates to “create a file with this descriptor –or– if that fails then just exit the script” and “use FLOCK to lock this portion of the script –or– if that fails just exit the script”. Essentially the idea is that if some portion of the code where the lock is invoked fails, you want to exit your script rather than running commands without a lock that should be under a lock.
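Putting the pieces together, here is a complete sketch (the variable name and messages are just illustrative) showing that a variable set under the lock is still available after it is released:

#!/bin/bash
exec 200>/var/lock/mylockfile || exit 1
flock 200 || exit 1

# critical section: set a variable while holding the lock
RESULT="computed under lock by PID $$"

flock -u 200

# unlike the sub-shell method, RESULT is still available here
echo "$RESULT"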
You can use different file descriptors and lock files to lock different portions of your script, and can thereby achieve some fairly complex job control within a single script. Getting used to using this when appropriate should result in much more reliable and predictable job/script execution with consistent results, especially for scripts that have to work with shared resources.
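For example, two independent critical sections can each take their own lock (the file names and descriptor numbers here are arbitrary):

#!/bin/bash
exec 200>/var/lock/section_one.lock || exit 1
exec 201>/var/lock/section_two.lock || exit 1

flock 200 || exit 1
# ... commands serialized against other runs of section one ...
flock -u 200

flock 201 || exit 1
# ... commands serialized against other runs of section two ...
flock -u 201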
Cheers!
REFERENCES:
FLOCK:
http://linux.die.net/man/1/flock
http://www.kfirlavi.com/blog/2012/11/06/elegant-locking-of-bash-program/
https://blog.famzah.net/2013/07/31/using-flock-in-bash-without-invoking-a-subshell/
http://stackoverflow.com/questions/13551840/bash-flock-why-200
http://stackoverflow.com/questions/23665780/flock1-is-failing-to-release-lock
http://jdimpson.livejournal.com/5685.html
http://stackoverflow.com/questions/18833448/php-flock-behaviour-when-file-is-locked-by-one-process
http://bencane.com/2015/09/22/preventing-duplicate-cron-job-executions/
Subshells:
http://www.tldp.org/LDP/abs/html/subshells.html
http://unix.stackexchange.com/questions/65751/how-to-get-functions-propagated-to-subshell
Other:
http://www.unix.com/shell-programming-and-scripting/42417-what-does-mean-double-pipe.html
http://stackoverflow.com/questions/9561300/exec-not-found-because-of-the-file-descriptor
https://www.gnu.org/software/bash/manual/html_node/Job-Control-Basics.html
http://wiki.bash-hackers.org/howto/mutex
The above ‘touch’ method is unsafe. It has two steps (commands), so it’s not atomic.
Please do expand (if you have time to). I am coming at scripting with roughly… no formal education 🙂 – in the meantime I am re-reading my own article and googling. Thanks!
Verifying file existence and creating the lock should be done in one command; otherwise an error can happen between the two steps under parallel execution. Say two shells (A and B) invoke your script in parallel. A verifies that the file does not exist; meanwhile B checks before A creates the file, so B also sees that the file does not exist. Both A and B then ‘touch’ the file and believe they have acquired the lock.
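One common atomic alternative is mkdir, which tests and creates in a single operation, so only one of A and B can succeed:

# mkdir is atomic: exactly one concurrent caller succeeds
if mkdir /lockdir/myscript.lock.d 2>/dev/null; then
    # ... we hold the lock, do the work ...
    rmdir /lockdir/myscript.lock.d
else
    echo "myscript is currently already running"
    exit
fi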
To prevent the issue with locks never being released, how about
#!/bin/bash
function on_exit() {
    flock -u 200
}
trap on_exit EXIT
Cheers
Sascha