How NOT to delete files with the find command
I wanted to delete some Python .pyc files. First, I looked for them. (I always want to see what I'm wiping out before I do it.) So--
find . -name '*.pyc'
It returned about a dozen pyc files. Now to delete.
find has a -delete option to delete whatever it finds. So I quickly threw that into the previous command:
# DON'T DO THIS
find . -delete -name '*.pyc'
Then I went to continue my work. But everything was gone. I had deleted the entire tree!
I was supposed to put -delete after -name:
# the right way to delete what it finds
find . -name '*.pyc' -delete
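This was a painful reminder that find simply evaluates the expression passed to it from left to right for each file it visits, applying any given operators, like and/or/not. Adjacent tests and actions are implicitly joined with "and", so once one of them is false, nothing to its right runs for that file. (See OPERATORS in man find.)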
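You can see this left-to-right evaluation without risking anything by swapping -delete for the harmless -print action. This is a minimal sketch that assumes a POSIX shell and a throwaway directory from mktemp -d; the file names are made up for the demo.

```shell
# Throwaway sandbox with one .pyc and one .py file (names are made up).
tmp=$(mktemp -d)
mkdir "$tmp/proj"
touch "$tmp/proj/a.pyc" "$tmp/proj/b.py"

# Test first, action second: -print runs only for entries that pass -name.
matched=$(find "$tmp/proj" -name '*.pyc' -print)

# Action first, test second: -print runs (and returns true) for every entry,
# including the starting directory; -name is evaluated only afterwards.
everything=$(find "$tmp/proj" -print -name '*.pyc')

printf 'matched:\n%s\n' "$matched"
printf 'everything:\n%s\n' "$everything"
rm -r "$tmp"
```

The first command prints only a.pyc; the second prints the directory and both files, which is exactly the set -delete would have removed in that position.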
When I had -delete right after the path, I was basically telling find to traverse the tree, delete each item as it encountered it, and only then check it against the given name. Since -delete returns true when it succeeds, that name check came far too late to protect anything.
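To reproduce the mistake safely, the sketch below runs both orderings inside a throwaway directory. It assumes a POSIX shell and a find that supports -delete (GNU and BSD find both do); make_tree is a hypothetical helper used only to rebuild the sample tree.

```shell
# Throwaway sandbox; make_tree is a hypothetical helper for the demo.
tmp=$(mktemp -d)
make_tree() {
    mkdir -p "$tmp/proj/pkg"
    touch "$tmp/proj/pkg/mod.pyc" "$tmp/proj/pkg/mod.py"
}

# Right order: -name filters first, so -delete only sees .pyc files.
make_tree
find "$tmp/proj" -name '*.pyc' -delete
after_right=$(ls "$tmp/proj/pkg")    # only mod.py is left

# Wrong order: -delete runs on every entry before -name is checked.
# -delete implies -depth, so files go first, then their directories,
# and finally the starting directory itself.
make_tree
find "$tmp/proj" -delete -name '*.pyc'
if [ -e "$tmp/proj" ]; then proj_left=yes; else proj_left=no; fi

echo "after right order: $after_right"
echo "proj still exists after wrong order: $proj_left"
rm -r "$tmp"
```

Because -delete implies depth-first traversal, the wrong ordering removes the starting directory too, not just its contents.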
Written by Greg Nofi
8 Responses
In order not to fuck up: find . -name '*.pyc' | xargs rm
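And don't use *.pyc without any type of quotes, or find will throw an error like "unknown primary or operator" or "paths must precede expression" (the wording differs between BSD and GNU find). Unquoted, the shell expands *.pyc against the current directory before find even runs, so find receives a list of file names instead of a pattern.
With quotes everything is OK: find . -name "*.pyc" -delete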
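A small sketch of what actually reaches find, assuming a POSIX shell and a throwaway directory with two .pyc files at the top level (the file names are made up). echo stands in for find so you can see the expanded argument list.

```shell
# Throwaway sandbox with two .pyc files at the top and one in a subdirectory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a.pyc b.pyc
mkdir sub && touch sub/c.pyc

# echo shows the arguments find would receive after the shell expands the glob.
unquoted=$(echo find . -name *.pyc)    # find . -name a.pyc b.pyc
quoted=$(echo find . -name '*.pyc')    # find . -name *.pyc

# Expanded, the stray b.pyc is not a valid expression, so find exits non-zero.
if find . -name *.pyc >/dev/null 2>&1; then status=ok; else status=error; fi

# Quoted, find gets the pattern itself and matches at every depth.
found=$(find . -name '*.pyc' | sort)

echo "$unquoted"
echo "$quoted"
echo "$status"
echo "$found"
cd / && rm -r "$tmp"
```

Note that with exactly one matching file in the current directory there is no error at all: find silently searches for that single name, which is arguably worse.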
Thanks for the great comments, all.
I've been thinking more about this. The UNIX philosophy is that a program should do only one thing and do it well. So upon further reflection, I think it's strange that the find
command tries to do so much. It's great at finding stuff, but now I'm not so sure I should also be using it to delete stuff or execute other commands.
Therefore, I think I'm going to go back to using the find and xargs combo.
Wtf dude...
Calling find with -delete is probably faster, since you don't have to execute rm on each file; find just passes each filename directly to unlink(2).
However, it's definitely not safer.
My advice here is to switch to using the xargs command. One of the nicer things about it is that it builds the longest possible rm command line and executes that, rather than running rm once per file (as you would if you looped over the output of find in a for loop), which decreases the number of times you call fork(2) and execve(2).
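The batching is easy to observe by putting echo in front of rm so nothing is deleted. This sketch assumes a POSIX shell; the file names are made up and never touched.

```shell
# xargs packs as many input items as fit onto one command line.
# "echo rm" stands in for rm, so nothing is actually deleted.
batched=$(printf '%s\n' a.pyc b.pyc c.pyc d.pyc | xargs echo rm)
# -> "rm a.pyc b.pyc c.pyc d.pyc": a single invocation

# For contrast, -n 1 forces one invocation per item, like a shell loop would.
one_each=$(printf '%s\n' a.pyc b.pyc c.pyc d.pyc | xargs -n 1 echo rm)
# -> four lines, "rm a.pyc" through "rm d.pyc"

printf '%s\n' "$batched"
printf '%s\n' "$one_each"
```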
Additionally, it's worth mentioning that @Kwpolska's solution is dangerous as well. If you have any files with spaces in their names, you'll get unintended results: xargs splits its input on blanks as well as newlines, so a name like my file.pyc reaches rm as two separate arguments, my and file.pyc.
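Here is a minimal sketch of the splitting, assuming a POSIX shell and a throwaway directory holding one made-up file with a space in its name. printf '[%s]\n' stands in for rm and prints each argument it receives on its own line.

```shell
# Throwaway sandbox holding one file whose name contains a space.
tmp=$(mktemp -d)
touch "$tmp/my module.pyc"

# Plain xargs splits on blanks as well as newlines, so the single path
# arrives as two arguments. printf '[%s]\n' stands in for rm and shows
# each argument on its own line instead of deleting anything.
args=$(find "$tmp" -name '*.pyc' | xargs printf '[%s]\n')

printf '%s\n' "$args"    # [<tmp>/my] then [module.pyc]
rm -r "$tmp"
```

With a real rm, the first argument would point at a nonexistent ./my (or, worse, at an unrelated file that happens to have that name).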
The way to get around that (provided you're using GNU xargs/find, which you probably are; the BSD versions on macOS support this too) is to add the -0 flag to xargs and -print0 to find. find will then delimit its output with null (NUL) bytes, and xargs will use the NUL byte as the delimiter instead of whitespace.
Your final command will look like:
find . -name \*.pyc -print0 | xargs -0 rm
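A quick check that the NUL-delimited pipeline keeps names with spaces intact, assuming a POSIX shell and a throwaway directory with made-up file names. The pattern is quoted here instead of backslash-escaped; both stop the shell from expanding it.

```shell
# Throwaway sandbox: two .pyc files (one with a space) and a .py to keep.
tmp=$(mktemp -d)
touch "$tmp/my module.pyc" "$tmp/plain.pyc" "$tmp/keep.py"

# -print0 ends each path with a NUL byte; xargs -0 splits only on NUL,
# so "my module.pyc" stays a single argument to rm.
find "$tmp" -name '*.pyc' -print0 | xargs -0 rm

left=$(ls "$tmp")    # only keep.py should remain
echo "$left"
rm -r "$tmp"
```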
Cheers
Thanks, hopefully I will never make the same mistake :) When deleting files using find, I usually do this:
find . -name "*.pyc" -exec rm -f {} \;
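The \; form runs rm once per file. POSIX find also accepts -exec ... {} +, which batches paths into as few invocations as possible, much like xargs, and passes each path as a single argument. A sketch assuming a POSIX shell and a throwaway directory with made-up files:

```shell
# Throwaway sandbox with three .pyc files.
tmp=$(mktemp -d)
touch "$tmp/a.pyc" "$tmp/b.pyc" "$tmp/c.pyc"

# "\;" runs the command once per file: three echo invocations, three lines.
per_file=$(find "$tmp" -name '*.pyc' -exec echo rm -f {} \; | grep -c .)

# "+" appends as many paths as fit to one invocation, much like xargs,
# so all three paths land on a single line.
batched=$(find "$tmp" -name '*.pyc' -exec echo rm -f {} + | grep -c .)

# The real deletion with the batched form. Each path is passed as one
# argument, so names containing spaces are safe here too.
find "$tmp" -name '*.pyc' -exec rm -f {} +
remaining=$(ls "$tmp")

echo "per-file lines: $per_file, batched lines: $batched"
rm -r "$tmp"
```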
That sucks, but the man page does mention that potential problem.
I use a (fish) function to avoid making the same mistake:
function find-and-delete
    find . -name $argv[1] -delete
end
E.g. find-and-delete '*.pyc'
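For bash or other POSIX shells, a roughly equivalent function might look like the sketch below. The name find_and_delete and the optional second argument (a start directory, defaulting to ".") are hypothetical additions, not part of the fish version; the demo runs in a throwaway directory.

```shell
# Hypothetical POSIX-sh counterpart of the fish function above.
# $1: name pattern (quote it at the call site); $2: optional start dir.
find_and_delete() {
    # Quoting "$1" keeps the pattern intact so find does the matching,
    # and -name comes before -delete so only matching entries are removed.
    find "${2:-.}" -name "$1" -delete
}

# Demo in a throwaway sandbox with made-up file names.
tmp=$(mktemp -d)
touch "$tmp/x.pyc" "$tmp/x.py"
find_and_delete '*.pyc' "$tmp"
left=$(ls "$tmp")    # x.py
echo "$left"
rm -r "$tmp"
```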
Those using xargs, I suggest you read man xargs and look at the -P parameter:
-P max-procs
Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run
as many processes as possible at a time. Use the -n option with -P; otherwise chances are
that only one exec will be done.
This is cool, because it means you can spawn more than one instance of rm to do your calls, which may give a minor improvement if the filesystem can process the underlying unlink(2) calls in parallel.
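Putting -P together with the NUL-delimited pipeline might look like the sketch below. It assumes a POSIX shell and an xargs with -P (GNU and BSD xargs have it, though it is not in POSIX); the batch size of 5 and the 4 processes are arbitrary numbers for the demo, and the files are made up.

```shell
# Throwaway sandbox with 20 made-up .pyc files.
tmp=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do touch "$tmp/f$i.pyc"; i=$((i + 1)); done

# -n 5 caps each rm at 5 paths; -P 4 lets up to four rm processes run
# at once. Without -n, all paths would likely fit into a single rm.
# xargs waits for every child to finish before it exits.
find "$tmp" -name '*.pyc' -print0 | xargs -0 -n 5 -P 4 rm

left=$(ls "$tmp" | grep -c .)    # 0 once all batches have run
echo "$left"
rm -r "$tmp"
```

As the man page excerpt says, -n matters here: without it, xargs usually builds one long command line, leaving nothing to parallelize.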