Get notified when shell scripts go wrong.


Project Summary: This project can automatically suspend a shell script when a runtime error occurs and then notify the user who started the script. The user can easily resume the suspended script later. This is particularly helpful when you are running a script that does a lot of work and you don’t want to clean up and start the script again if a runtime error occurs. In such cases, the suspend/resume functionality can save you a lot of time.
Project Page:

I got the idea for this project a few weeks back, while running a large-scale DB migration on production servers at my day job and wondering how to handle process errors better. Though I had tested the migration process multiple times against all the possible things that could go wrong, I couldn’t just ignore Mr. Murphy completely.

I also had a few other rather obvious things to take care of: website downtime, if any was required, had to be as short as possible, and I was given a relatively short time window to complete the process. This meant that if a problem occurred after the process had started, I probably wouldn’t be able to re-run it from scratch and would have to wait for the next time window to schedule the migration.

But there was a trick: if I could just suspend the process if/when an error occurred, I could see what caused the error and then resume the process instead of restarting it from scratch.
The task consisted mostly of executing SQL and shell scripts in a Linux environment.

The main script had the following format:

<pre-processing/ script init routines>
sql_statement_1
sql_statement_2
sql_statement_3
<post-processing/ process validation routines>

The process was originally designed so that the failure of the process (or of a single step) wouldn’t cause any unintended side effects, similar to the all-or-nothing transactional philosophy of databases. So I was only optimizing the not-having-to-restart part when an error occurs.

I thought of adding a check step after every SQL statement, which would check what happened to the last executed statement and suspend the shell script if that step had an error. After this change, my script format would be:

<pre-processing/ script init routines>
sql_statement_1
Check state of sql_statement_1
sql_statement_2
Check state of sql_statement_2
sql_statement_3
Check state of sql_statement_3
<post-processing/ process validation routines>
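
The check-after-every-step format above can be sketched in Python roughly as follows. This is only an illustration and not the project's actual code: `run_steps` and `on_failure` are hypothetical names, and the steps are assumed to be plain shell commands. The `returncode` that `subprocess.run` reports is the same exit status a shell exposes after each command.

```python
import subprocess


def run_steps(steps, on_failure):
    """Run shell commands in order and check the exit status after each one.

    `on_failure(step, returncode)` is a hypothetical hook: in the design
    described here it would notify the user and block until resumed, after
    which the loop simply continues with the next step.
    """
    results = []
    for step in steps:
        # shell=True runs the command through /bin/sh, like a line of the script
        completed = subprocess.run(step, shell=True)
        results.append((step, completed.returncode))
        if completed.returncode != 0:
            on_failure(step, completed.returncode)
    return results
```

A call such as `run_steps(["true", "exit 3", "true"], handler)` would invoke `handler` once, for the second step, and still go on to run the third.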

After deciding on the what, I had to work out the how, which came down to the following questions.

  1. How to check the status of the last executed step in a shell script?
  2. How can a running shell script suspend itself?
  3. How to notify the user of the error situation, while also providing the necessary information?
  4. How can the suspended process be resumed later by the user?

My solutions were as follows:


  1. How to check the status of the last executed step in a shell script?
     Two words: “Exit Status”. The exit status of the last executed command in Linux is stored in the bash variable $?.

  2. How can a running shell script suspend itself?
     This was an interesting question, and the answer surprised me: named pipes. I read about them a long time back and found them interesting back then, but wasn’t able to find a use case for them until today. The basic idea is that the process about to get blocked creates a named pipe with a unique name (based on its process ID) and sits at the receiving end of the pipe. This suspends (*cough* I/O blocks) the process until it gets the confirmation message to resume from the user through the named pipe.

     This entire process looks something like this:

     [Figure: Process Communication]

  3. How to notify the user about the error situation, providing the necessary information?
     Sending an e-mail to the user with the process information and instructions to resume the process does just fine. Another BIG plus is that you need not monitor the script continuously for errors; you automatically get a notification e-mail when/if an error occurs.

  4. How to resume the suspended process later?
     The user just has to send the string “RESUME” through the named pipe to the blocked process, which then breaks out of the I/O block and continues with the next step. The instructions on how to do this are included in the notification e-mail.
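
The suspend, notify, and resume steps above can be sketched in Python along these lines. This is a minimal sketch under assumptions, not the project's actual implementation: the function names, the pipe naming scheme, and the e-mail wording are all made up for illustration, and the message is only composed here, not sent (delivering it would need a mail server, for example via `smtplib`).

```python
import os
import tempfile
from email.message import EmailMessage

RESUME_TOKEN = "RESUME"  # the string the user sends through the pipe


def fifo_path_for(pid, directory=None):
    """Return a per-process pipe path (this naming scheme is an assumption)."""
    return os.path.join(directory or tempfile.gettempdir(), f"suspended_{pid}.fifo")


def build_notification(user, step, returncode, path):
    """Compose, but do not send, the e-mail describing the failure."""
    msg = EmailMessage()
    msg["To"] = user
    msg["Subject"] = f"Script suspended: a step exited with status {returncode}"
    msg.set_content(
        f"Failed step: {step}\n"
        f"Exit status: {returncode}\n"
        f"To resume, run: echo {RESUME_TOKEN} > {path}\n"
    )
    return msg


def suspend_until_resumed(path):
    """Create a named pipe and block until RESUME_TOKEN arrives on it."""
    os.mkfifo(path)
    try:
        while True:
            # Opening a FIFO for reading blocks until some writer opens it.
            with open(path) as pipe:
                for line in pipe:
                    if line.strip() == RESUME_TOKEN:
                        return
            # The writer closed without sending the token: wait for the next one.
    finally:
        os.remove(path)  # clean up the pipe once the process continues
```

From another terminal, the user would resume the blocked process with the `echo RESUME > <path>` command quoted in the message. Anything else written to the pipe is ignored and the process keeps waiting.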

Thanks to the combined awesomeness of Python and Linux, it took me one evening to code this entire project. I decided to release the code as open source later and was given permission to do so. Thanks, Serendio. 🙂
The project can be accessed here. So enjoy the code, and if you can think of ways to make the code better or add more features, the project is just one fork away.

About Ashish Yadav

Hey there!!! I am Ashish Yadav, a Geek | Music Lover | Open Web enthusiast | Wannabe entrepreneur. I currently live in Chennai, India and share my thoughts and nifty hacks, rather infrequently, here with the rest of the world. If you like my posts, may I suggest subscribing to the blog to read them as they are written :) . As a part of my day job I enjoy writing ubercool Python code, and have loads of fun while at it. Opinions presented here are mine and not to be associated with my employer or anyone else.
