How do I grep multiple possible extensions recursively

This question is different from other grep pattern matching questions because we're looking for a large number of file extensions, so the approach from this question would be too long and tedious to type: grep -r -i --include '*.ade' --include '*.adp' ... CP_Image ~/path[12345]

I was trying to email the backup of a static site when Google blocked my attachment upload for security reasons. Their support page says:

You can't send or receive the following file types:

.ade, .adp, .bat, .chm, .cmd, .com, .cpl, .exe, .hta, .ins, .isp, .jar, .jse, .lib, .lnk, .mde, .msc, .msp, .mst, .pif, .scr, .sct, .shb, .sys, .vb, .vbe, .vbs, .vxd, .wsc, .wsf, .wsh

I converted the list into the following regular expression and tested it here:

/.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)/gi

And tried running it with:

ls -lahR | grep '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'

It doesn't work. I don't think grep interprets the ( | ) symbols properly, because ls -lahR | grep '.*\.html' works.

Plain grep uses Basic Regular Expressions (BRE). In BRE, capturing groups are written as \(...\) and, in GNU grep, the alternation operator is written as \|

grep '.*\.\(ade\|adp\|bat\|chm\|cmd\|com\|cpl\|exe\|hta\|ins\|isp\|jar\|jse\|lib\|lnk\|mde\|msc\|msp\|mst\|pif\|scr\|sct\|shb\|sys\|vb\|vbe\|vbs\|vxd\|wsc\|wsf\|wsh\)'

OR

grep -E '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'

The second form enables extended regular expressions with the -E option (long form: --extended-regexp).

Reference

Add the -E flag to indicate that the pattern is an extended regular expression. From the GNU Grep 2.1 documentation: the default is "basic regular expression", and

[i]n basic regular expressions the meta-characters '?', '+', '{', '|', '(', and ')' lose their special meaning.
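A quick way to see the difference is the minimal sketch below. The sample file names are made up for illustration and the extension list is shortened to three entries. It assumes GNU grep (the \| escape is a GNU extension to BRE). The -i flag and the trailing $ are additions, so that upper-case extensions match and the extension has to sit at the end of the name.

```shell
# Hypothetical sample names, one per line, standing in for `ls -R` output.
sample=$(printf '%s\n' 'setup.EXE' 'notes.txt' 'run.bat' 'lib.jar.md')

# BRE without escapes: ( | ) are literal characters, so the match count is 0.
# `|| true` keeps the non-zero exit status of a failed match from aborting a script.
bre_plain=$(printf '%s\n' "$sample" | grep -ci '\.(bat|exe|jar)$' || true)

# BRE with GNU-style escapes \( \| \).
bre_escaped=$(printf '%s\n' "$sample" | grep -i '\.\(bat\|exe\|jar\)$')

# ERE via -E: same matches with no escaping needed.
ere=$(printf '%s\n' "$sample" | grep -Ei '\.(bat|exe|jar)$')

# Prints setup.EXE and run.bat; lib.jar.md is skipped because of the $ anchor.
printf '%s\n' "$ere"
```

Without the $ anchor, a name such as lib.jar.md would also match, which is probably not what you want when checking attachments.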

I'm recursively trying to find files with the specified extensions.

It is better to use find with the -iregex option:

find . -regextype posix-egrep -iregex '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'

On OS X (BSD find), use the -E flag instead of -regextype:

find -E . -iregex '.*\.(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)'
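To save retyping the list, the alternation can live in a variable. The following is only a sketch under a few assumptions: GNU find (for -regextype), a throwaway directory created with mktemp, made-up file names, and a shortened extension list. The -type f test is an addition that limits the results to regular files.

```shell
# Assumption: GNU find. The directory layout and file names are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/site/bin"
touch "$demo/site/index.html" "$demo/site/bin/tool.EXE" "$demo/site/run.bat"

# Shortened list for the demo; the full list from the question works the same way.
exts='bat|exe|jar'

# -iregex is matched against the whole path, hence the leading .*
found=$(find "$demo" -regextype posix-egrep -type f -iregex ".*\.($exts)" | sort)
printf '%s\n' "$found"

# Remove the temporary directory again.
rm -rf "$demo"
```

Because -iregex has to match the entire path, a file named tool.exe.txt is not reported even though it contains .exe.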

A bash way to list the files in the current directory that do not have one of the given extensions is extended globbing:

shopt -s extglob nullglob
ls *.!(ade|adp|bat|chm|cmd|com|cpl|exe|hta|ins|isp|jar|jse|lib|lnk|mde|msc|msp|mst|pif|scr|sct|shb|sys|vb|vbe|vbs|vxd|wsc|wsf|wsh)
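For comparison, here is a small sketch of both directions: !(...) selects everything except the listed extensions, while @(...) selects exactly the listed ones. The file names are made up and the list is shortened. bash is started with -O extglob here because extended globbing has to be enabled before the pattern is parsed; that invocation style is a choice made for this example, not part of the answer above.

```shell
# Hypothetical files in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.exe" "$demo/c.bat" "$demo/d.html"

# Names whose extension is NOT in the list.
allowed=$(cd "$demo" && bash -O extglob -c 'printf "%s\n" *.!(bat|exe)')

# Names whose extension IS in the list.
blocked=$(cd "$demo" && bash -O extglob -c 'printf "%s\n" *.@(bat|exe)')

printf 'allowed:\n%s\nblocked:\n%s\n' "$allowed" "$blocked"

# Remove the temporary directory again.
rm -rf "$demo"
```

Note that these globs are case-sensitive and only look at one directory, so they do not replace the recursive find approach.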
