Last Updated: September 09, 2019
· nofeet

How NOT to delete files with the find command

I wanted to delete some Python .pyc files. First, I looked for them. (I always want to see what I'm wiping out before I do it.) So--

find . -name *.pyc

It returned about a dozen pyc files. Now to delete.

find has a -delete option to delete whatever it finds. So I quickly threw that into the previous command:

# DON'T DO THIS
find . -delete -name *.pyc

Then I went to continue my work. But everything was gone. I had deleted the entire tree!

I was supposed to put -delete after -name:

# the right way to delete what it finds
# (the pattern is quoted so the shell cannot expand it first)
find . -name '*.pyc' -delete

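This was a painful reminder that find evaluates its expression strictly left to right, joining adjacent terms with an implicit "and" unless you supply operators such as -o or !. Actions like -delete are ordinary terms in that expression: they run as soon as evaluation reaches them. (See OPERATORS in man find.)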

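With -delete right after the path, I was telling find to traverse the tree and delete every item it encountered, and only afterwards test each (already deleted) item against the name. By that point the -name test could no longer protect anything. -delete also implies -depth, so each directory's contents are removed before find tries to remove the directory itself.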
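
The ordering effect can be observed without deleting anything by letting -print stand in for -delete. The following is a minimal sketch, assuming a throwaway directory created with mktemp; the file names in it are made up for illustration.

```shell
# Sketch: show that find evaluates its expression left to right,
# using -print as a harmless stand-in for -delete.
demo=$(mktemp -d)                     # throwaway directory (assumption)
mkdir -p "$demo/pkg"
touch "$demo/pkg/a.py" "$demo/pkg/a.pyc" "$demo/b.pyc"

# Action first: -print runs for every entry before -name is tested.
action_first=$(find "$demo" -print -name '*.pyc' | wc -l | tr -d ' ')

# Test first: -print only runs for entries that passed -name.
test_first=$(find "$demo" -name '*.pyc' -print | wc -l | tr -d ' ')

echo "action first: $action_first entries"
echo "test first:   $test_first entries"

rm -rf "$demo"                        # clean up the throwaway directory
```

With the action first, all five entries (the directory itself, pkg, and the three files) are printed; with the test first, only the two .pyc files are. Swapping -print for -delete gives the same selection, which is why previewing with -print in the exact position where -delete will go is a cheap safety check.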

8 Responses

In order not to fuck up: find . -name '*.pyc' | xargs rm.

over 1 year ago ·

And don't use *.pyc without quotes. If the shell expands the glob before find sees it, find may fail with an error such as
unknown primary or operator (BSD find) or paths must precede expression (GNU find). Which one you get depends on the system.

With quotes everything is ok: find . -name "*.pyc" -delete
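
A rough way to see what the quotes change, assuming a throwaway directory that holds two .pyc files (the names are hypothetical), is to count the arguments the shell would hand to find:

```shell
# Sketch: count how many arguments the shell produces, with and without quotes.
demo=$(mktemp -d)                     # throwaway directory (assumption)
touch "$demo/a.pyc" "$demo/b.pyc"

# Unquoted: the shell expands the glob before find would ever see it.
unquoted_count=$(cd "$demo" && set -- *.pyc && echo "$#")

# Quoted: the pattern is passed through as one literal argument.
quoted_count=$(cd "$demo" && set -- '*.pyc' && echo "$#")

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"

rm -rf "$demo"                        # clean up the throwaway directory
```

Unquoted, find would receive two file names where it expects a single pattern, which is what triggers the errors above. If the current directory happens to contain no matching file, the unquoted form appears to work, which makes the bug easy to miss.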

over 1 year ago ·

Thanks for the great comments, all.

I've been thinking more about this. The UNIX philosophy is that a program should do only one thing and do it well. So upon further reflection, I think it's strange that the find command tries to do so much. It's great at finding stuff, but now I'm not so sure I should also be using it to delete stuff or execute other commands.

Therefore, I think I'm going to go back to using the find and xargs combo.

over 1 year ago ·

Wtf dude...

over 1 year ago ·

Calling find with -delete is probably faster: you don't have to execute rm for each file, because find passes the filename directly to unlink(2).

However, it's definitely not safer.

My advice here is to switch to xargs. One of its nicer properties is that it builds the longest rm command line it can and executes that, rather than running rm once per file (as a for loop over find's output would). That reduces the number of fork(2) and execve(2) calls.
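
As an illustration of the batching, here is a small sketch that uses echo in place of rm and three made-up names, so nothing is deleted. It counts how many command lines each approach would produce.

```shell
# Three sample names, one per line (hypothetical; no spaces, so plain xargs is fine here).
names=$(printf '%s\n' a.pyc b.pyc c.pyc)

# A loop would run rm once per name; echo stands in for rm.
loop_runs=$(for f in $names; do echo "rm $f"; done | wc -l | tr -d ' ')

# xargs packs all names onto a single command line.
xargs_runs=$(printf '%s\n' "$names" | xargs echo rm | wc -l | tr -d ' ')

echo "loop: $loop_runs command(s), xargs: $xargs_runs command(s)"
```

The loop yields three commands and xargs yields one. For a dozen files the difference is negligible; it starts to matter with many thousands.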

Additionally, it's worth mentioning that @Kwpolska's solution is dangerous as well. If any file names contain spaces, you'll get unintended results: xargs splits its input on whitespace, so a name like my file.pyc reaches rm as two separate arguments, my and file.pyc.

The way around that, provided your find and xargs support it (the GNU and BSD versions do), is to add -print0 to find and the -0 flag to xargs. find then terminates each name with a NUL byte, and xargs splits on NUL bytes instead of whitespace.

Your final command will look like:

find . -name \*.pyc -print0 | xargs -0 rm
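
To make the difference concrete, here is a sketch that first counts the arguments xargs would pass in each mode (with printf standing in for rm) and then performs the real deletion. It assumes a throwaway mktemp directory whose own path contains no whitespace; the file names are invented for the demo.

```shell
demo=$(mktemp -d)                     # throwaway directory (assumption)
touch "$demo/my file.pyc" "$demo/plain.pyc" "$demo/keep.py"

# Whitespace-delimited: "my file.pyc" is split into two words.
naive_args=$(find "$demo" -name '*.pyc' | xargs printf '%s\n' | wc -l | tr -d ' ')

# NUL-delimited: each file name arrives as exactly one argument.
safe_args=$(find "$demo" -name '*.pyc' -print0 | xargs -0 printf '%s\n' | wc -l | tr -d ' ')

echo "naive: $naive_args arguments, NUL-delimited: $safe_args arguments"

# The real deletion; "--" stops rm from treating odd names as options.
find "$demo" -name '*.pyc' -print0 | xargs -0 rm --
remaining=$(find "$demo" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"

rm -rf "$demo"                        # clean up the throwaway directory
```

Two files produce three arguments in the naive pipeline and two in the NUL-delimited one, and only keep.py survives the deletion.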

Cheers

over 1 year ago ·

Thanks, hopefully I will never make the same mistake :) When deleting files using find, I usually do this:

find . -name "*.pyc" -exec rm -f {} \;
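
For comparison, -exec can also end in + instead of \;, which batches file names much as xargs does. A small sketch, with echo standing in for rm so nothing is removed by find itself, and a hypothetical temp directory of three files:

```shell
demo=$(mktemp -d)                     # throwaway directory (assumption)
touch "$demo/a.pyc" "$demo/b.pyc" "$demo/c.pyc"

# \; runs the command once per match (echo stands in for rm -f).
per_file=$(find "$demo" -name '*.pyc' -exec echo rm -f {} \; | wc -l | tr -d ' ')

# + appends as many matches as fit onto one command line.
batched=$(find "$demo" -name '*.pyc' -exec echo rm -f {} + | wc -l | tr -d ' ')

echo "with \\;: $per_file command(s), with +: $batched command(s)"

rm -rf "$demo"                        # clean up the throwaway directory
```

Both forms hand file names to the command as separate arguments, so names containing spaces are safe either way.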

over 1 year ago ·

That sucks, but it does mention that potential problem in the man pages.

I use a (fish) function to avoid making the same mistake:

function find-and-delete
  find . -name $argv[1] -delete
end

E.g. find-and-delete '*.pyc'
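
A rough POSIX shell counterpart might look like the sketch below. The function name find_and_delete and its two-argument form are hypothetical, and -delete itself is a GNU/BSD extension rather than something every find provides.

```shell
# Hypothetical helper: the pattern is always placed before -delete.
find_and_delete() {
    # $1 = directory to search, $2 = name pattern (quote it at the call site)
    find "$1" -type f -name "$2" -delete
}

demo=$(mktemp -d)                     # throwaway directory (assumption)
mkdir "$demo/sub"
touch "$demo/a.pyc" "$demo/sub/b.pyc" "$demo/sub/keep.py"

find_and_delete "$demo" '*.pyc'

left=$(find "$demo" -type f | wc -l | tr -d ' ')
echo "files left: $left"

rm -rf "$demo"                        # clean up the throwaway directory
```

Because the order of -name and -delete is fixed inside the function, the mistake from the post cannot be retyped at the prompt.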

over 1 year ago ·

Those using xargs, I suggest you read man xargs and look at the -P parameter:

-P max-procs

Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run
as many processes as possible at a time. Use the -n option with -P; otherwise chances are
that only one exec will be done.

This is cool, because it means you can run more than one rm at a time, which may give a minor speedup if the filesystem can process the underlying unlink calls in parallel.
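
A minimal sketch of that combination, assuming an xargs that supports -P (the GNU and BSD versions do) and a throwaway directory of eight made-up files:

```shell
demo=$(mktemp -d)                     # throwaway directory (assumption)
for i in 1 2 3 4 5 6 7 8; do touch "$demo/file$i.pyc"; done
touch "$demo/keep.py"

# -n 2: at most two names per rm; -P 4: up to four rm processes at once.
find "$demo" -name '*.pyc' -print0 | xargs -0 -n 2 -P 4 rm --

remaining=$(find "$demo" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"

rm -rf "$demo"                        # clean up the throwaway directory
```

The -n value here is arbitrary. A small -n gives more parallelism but more process launches, so it trades against the batching that makes xargs attractive in the first place.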

over 1 year ago ·