Last Updated: April 12, 2019
· naholyr

Dealing with process hierarchy

Take this dumb little bash script that just launches two subprocesses (one in the background, the other in the foreground):

#!/bin/bash
sleep 40 & sleep 41

Run it, and here is the process tree:

  PID  PGID STAT CMD
 8709  8709 Ss   /bin/zsh
14431 14431 S+    \_ sh ./runsubprocess
14432 14431 S+        \_ sleep 40
14433 14431 S+        \_ sleep 41

Then kill it with kill 14431. Here is what is left of the process tree:

  PID  PGID STAT CMD
14432 14431 S    sleep 40
14433 14431 S    sleep 41

We left two running processes :(

Solution: use pkill to kill the children too!

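Kill it with pkill -P 14431 instead. The -P option matches processes by their parent PID, so both sleeps get the signal, and the script exits by itself once its foreground child is gone. Now you have no more orphans.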

pkill -P only reaches the direct children, not the grandchildren!

Imagine you have this script:

#!/bin/bash
(sleep 40 & sleep 41) & sleep 42

Process tree:

  PID  PGID STAT CMD
 8709  8709 Ss   /bin/zsh
14431 14431 S+    \_ sh ./runsubprocess
14432 14431 S+        \_ sleep 41
14434 14431 S+        |   \_ sleep 40
14433 14431 S+        \_ sleep 42

Using pkill -P 14431, you still leave this process behind:

  PID  PGID STAT CMD
14434 14431 S    sleep 40
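
To see exactly which processes pkill -P cannot reach, one option is to walk the tree recursively with pgrep -P. The descendants function below is a hypothetical helper written for this post, not something shipped with procps. It is a minimal sketch that assumes bash and a pgrep supporting -P:

```shell
#!/bin/bash
# descendants PID: print every descendant of PID (children, grandchildren, ...),
# one PID per line. Hypothetical helper; assumes pgrep is installed.
descendants() {
    local child
    # pgrep -P lists only the direct children of the given parent PID
    for child in $(pgrep -P "$1"); do
        echo "$child"
        # recurse to pick up the grandchildren that pkill -P would miss
        descendants "$child"
    done
}
```

Something like descendants 14431 should list the grandchild too, which makes the gap left by pkill -P visible before you kill anything.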

Fuck it! But here comes the PGID (Process Group ID). Notice how every process descended from the script shares the same PGID (here 14431). An interactive shell puts each command it launches into a new process group, and the children inherit that group unless they explicitly leave it. The process at the root of the group, the group leader, has PID = PGID. And, oh, kill can signal a whole group if you prefix the PGID with a dash: kill -TERM -$PGID.

Using kill -TERM -14431 is the best and easiest way to achieve what we were looking for.

Additional notes:
→ I used ps fo pid,pgid,stat,cmd to show process trees the way they're pasted here :)
→ Processes keep their original PGID, even if their ancestor has died. So you can still kill by PGID if you left some orphans behind.
→ I suppose you may face exceptions, so if the root process's PID ≠ PGID, you can still grab the PGID from any PID in the group: ps -o pgid= -p $PID | tr -d ' '

1 Response

@annavester Nope, this is even worse than -15 (TERM), as you don't give the process any chance to catch and handle the signal so it could kill its subprocesses.
I tested it just to be sure, but nope, it won't work ;)
