Thursday, June 24, 2010

Efficient command line file searching...

I recently replied to the blog post "One-liner: Finding files that include a match" with the response below.

The problem with this method is that it's going to be VERY slow and will be especially painful when examining large numbers of files. It's much more efficient to use find in conjunction with xargs.

Here is a safe and fast way to search through a bunch of files...

find /path/to/something -type f -iname '*some_pattern*' -print0 | xargs -0 grep -H '^option_name'
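To see why the -print0/-0 pairing matters, here is a small throwaway demonstration (the directory and file names are made up): find emits NUL-delimited paths and xargs -0 splits on NULs, so filenames containing spaces or newlines survive intact.

```shell
set -e
dir=$(mktemp -d)
# A filename with a space would break a plain `xargs grep` pipeline,
# but NUL-delimited paths handle it safely.
printf 'option_name = 1\n' > "$dir/my config.conf"
printf '# option_name disabled\n' > "$dir/other.conf"
find "$dir" -type f -iname '*.conf' -print0 | xargs -0 grep -H '^option_name'
rm -rf "$dir"
```

Only "my config.conf" is printed, since the comment line in other.conf does not match at the start of the line.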

If you just want the filenames, change grep's -H option to -l. This is handy for subsearches: first find all files containing one pattern, then, within only those files, print the lines matching another pattern. Adding -Z makes grep NUL-terminate the filenames it prints, so they can be fed safely into a second xargs -0.

find /path -type f -iname '*pattern*' -print0 | xargs -0 grep -lZ '^option_name' | xargs -0 grep -H 'another pattern'
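Here is that chained search run against some made-up sample files, just to show the two stages fitting together; grep's -Z pairs with the second xargs -0 exactly the way find's -print0 pairs with the first.

```shell
set -e
dir=$(mktemp -d)
printf 'option_name = on\nanother pattern here\n' > "$dir/a.conf"
printf 'another pattern here\n' > "$dir/b.conf"   # no option_name line
# Stage 1 selects only files containing ^option_name (just a.conf);
# stage 2 prints lines matching the second pattern from those files.
find "$dir" -type f -iname '*.conf' -print0 \
  | xargs -0 grep -lZ '^option_name' \
  | xargs -0 grep -H 'another pattern'
rm -rf "$dir"
```

Only a.conf's matching line is printed; b.conf is filtered out in the first stage even though it contains the second pattern.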

If you want to stick with perl to handle your matching, no problem:

find /path -type f -iname '*pattern*' -print0 | xargs -0 perl -wnl -e '/^option_name/ and print'

Keep in mind that using grep/egrep is normally MUCH faster than using perl (though I've noticed great improvements in 5.10+). If you want to stick with Perl-compatible patterns, pcregrep is slightly faster than invoking perl directly, and for complicated patterns I've found it quite a bit faster than grep/egrep.
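As a quick illustration of where pcregrep earns its keep (assuming it's installed; the file and option names below are made up), here is a lookahead, a PCRE feature that plain grep/egrep can't express:

```shell
set -e
dir=$(mktemp -d)
printf 'option_name=1\noption_name_b 2\n' > "$dir/app.conf"
# (?==) is a lookahead: match option_name only when the very next
# character is '=', without including the '=' in the match.
pcregrep '^option_name(?==)' "$dir/app.conf"
rm -rf "$dir"
```

This prints the option_name=1 line but skips option_name_b, something an anchored basic or extended regex can't do without rewriting the pattern.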

I use this method fairly regularly to search millions of files at a time.
