How can I find all of the distinct file extensions in a folder hierarchy?
On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.
What would be the best way to achieve this from a shell?
Try this (not sure if it's the best way, but it works):
find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
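It works as follows: find lists every regular file, the Perl one-liner prints the text after the last dot of each path (only when that text contains no slash, so files without an extension are skipped), and sort -u sorts the result and removes duplicates.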
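To see what the Perl pattern m/\.([^.\/]+)$/ does to individual paths, here is a rough Python sketch of the same pipeline. It assumes the paths are already in a list (standing in for the output of find) and uses sorted() in place of sort -u, whose ordering can depend on the locale; the function name is just for illustration.

```python
import re

# Same pattern as the Perl one-liner: a dot followed by one or more
# characters that are neither "." nor "/", anchored at the end of the path.
EXT_RE = re.compile(r"\.([^./]+)$")


def distinct_extensions(paths):
    """Return the distinct extensions found in `paths`, sorted."""
    found = set()
    for path in paths:
        match = EXT_RE.search(path)
        if match:  # no match when the file name itself has no dot
            found.add(match.group(1))
    return sorted(found)


print(distinct_extensions(["./a.txt", "./src/main.c", "./dir.d/Makefile", "./b.tar.gz"]))
# ['c', 'gz', 'txt']
```

./dir.d/Makefile is skipped because the only text after a dot, d/Makefile, contains a slash, and ./b.tar.gz contributes only gz.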
No need to pipe to sort; awk can do it all:
find . -type f | awk -F. '!a[$NF]++{print $NF}'
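The awk idiom !a[$NF]++ prints a field only the first time it is seen: a[$NF] is 0 (false) on the first lookup and is incremented afterwards. A minimal Python sketch of the same idea, assuming the paths are already in a list; like awk -F. with $NF, a path whose file name has no dot yields whatever follows the last dot anywhere in the path (for ./Makefile that is /Makefile), and the output is in first-seen order rather than sorted.

```python
def first_seen_extensions(paths):
    """Mimic awk -F. '!a[$NF]++{print $NF}' on a list of paths."""
    seen = {}  # plays the role of awk's associative array a[]
    result = []
    for path in paths:
        last_field = path.split(".")[-1]  # $NF with "." as the field separator
        if seen.get(last_field, 0) == 0:  # !a[$NF] is true only on first sight
            result.append(last_field)
        seen[last_field] = seen.get(last_field, 0) + 1  # a[$NF]++
    return result


print(first_seen_extensions(["./a.txt", "./b.png", "./c.txt", "./Makefile"]))
# ['txt', 'png', '/Makefile']
```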
Recursive version:
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u
If you want totals (how many times each extension was seen):
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn
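The two sed substitutions first delete everything up to the last dot and then everything up to the last slash, so a file with no extension (such as ./dir/Makefile) shows up under its own name. Below is a rough Python sketch of both commands, assuming the paths are in a list; Counter.most_common stands in for sort | uniq -c | sort -rn, although ties may come out in a different order than sort would produce.

```python
import re
from collections import Counter


def sed_extension(path):
    r"""Apply sed 's/.*\.//' and then sed 's/.*\///' to one path."""
    after_last_dot = re.sub(r".*\.", "", path, count=1)  # s/.*\.//
    return re.sub(r".*/", "", after_last_dot, count=1)  # s/.*\///


def distinct_sed_extensions(paths):
    """Like ... | sort -u."""
    return sorted({sed_extension(p) for p in paths})


def sed_extension_totals(paths):
    """Like ... | sort | uniq -c | sort -rn (ties may be ordered differently)."""
    return Counter(sed_extension(p) for p in paths).most_common()


sample = ["./a.txt", "./b.txt", "./img/c.png", "./dir/Makefile"]
print(distinct_sed_extensions(sample))  # ['Makefile', 'png', 'txt']
print(sed_extension_totals(sample))  # [('txt', 2), ('png', 1), ('Makefile', 1)]
```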
Non-recursive (single folder):
for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u
I've based this on this forum post; credit should go there.
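The ${f##*.} expansion removes the longest prefix matching the pattern *., i.e. everything up to and including the last dot; a name with no dot is left unchanged. A hedged Python equivalent is str.rpartition. The sketch assumes the names come from one folder (os.listdir) and filters them the way the *.* glob would: a dot somewhere in the name and no leading dot. One difference: if nothing matches, the shell loop runs once with the literal *.* and prints *, whereas the sketch returns an empty list.

```python
import os


def strip_through_last_dot(name):
    """Like ${name##*.}: drop everything up to and including the last dot."""
    return name.rpartition(".")[2]  # a name without a dot comes back unchanged


def distinct_extensions_from_names(names):
    """Like `for f in *.*; do ...; done | sort -u` over one folder's entries."""
    # *.* only matches names that contain a dot and do not start with one.
    matching = [n for n in names if "." in n and not n.startswith(".")]
    return sorted({strip_through_last_dot(n) for n in matching})


print(distinct_extensions_from_names(os.listdir(".")))
```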
PowerShell:
dir -recurse | select-object extension -unique
Thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html
My awk-less, sed-less, Perl-less, Python-less POSIX-compliant alternative:
find . -type f | rev | cut -d. -f1 | rev | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
The trick is that it reverses each line, so the extension comes first, cuts it off at the first dot, and then reverses it back.
It also converts the extensions to lowercase.
Example output:
3689 jpg
1036 png
610 mp4
90 webm
90 mkv
57 mov
12 avi
10 txt
3 zip
2 ogv
1 xcf
1 trashinfo
1 sh
1 m4v
1 jpeg
1 ini
1 gqv
1 gcs
1 dv
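The reverse/cut/reverse trick works because after rev the extension is the first dot-separated field, which cut -d. -f1 can take directly; the second rev restores the original character order. A rough Python sketch of the whole pipeline, assuming the paths are in a list and approximating tr '[:upper:]' '[:lower:]' with str.lower(); as with cut, a line with no dot at all is passed through whole.

```python
from collections import Counter


def rev_cut_rev(line):
    """rev | cut -d. -f1 | rev: keep only the text after the last dot."""
    reversed_line = line[::-1]  # rev
    first_field = reversed_line.split(".", 1)[0]  # cut -d. -f1
    return first_field[::-1]  # rev


def extension_counts(paths):
    """Roughly tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn."""
    return Counter(rev_cut_rev(p).lower() for p in paths).most_common()


print(extension_counts(["./a.JPG", "./b.jpg", "./c.png"]))  # [('jpg', 2), ('png', 1)]
```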
Find everything with a dot and show only the suffix.
find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u
If you know all suffixes have 3 characters, then:
find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u
Or, with sed, show all suffixes with one to four characters. Change {1,4} to the range of characters you are expecting in the suffix.
find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p'| sort -u
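The sed expression prints the last one to four characters of a path when they are preceded by a dot; because of -n and the p flag, lines that don't match are not printed at all. Since . also matches /, a path like ./x.d/ab yields d/ab. A minimal Python sketch of the same filter, assuming the paths are in a list; the lo and hi parameters are an illustrative addition standing in for editing {1,4}.

```python
import re


def short_suffixes(paths, lo=1, hi=4):
    """Keep lo..hi characters that follow a dot at the end of each path, sorted."""
    pattern = re.compile(r".*\.(.{%d,%d})$" % (lo, hi))  # same shape as the sed regex
    found = set()
    for path in paths:
        match = pattern.match(path)
        if match:  # non-matching lines are skipped, as with sed -n and /p
            found.add(match.group(1))
    return sorted(found)


print(short_suffixes(["./a.txt", "./b.jpeg", "./c.markdown", "./Makefile"]))  # ['jpeg', 'txt']
```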
Adding my own variation to the mix. I think it's the simplest of the lot and can be useful when efficiency is not a big concern.
find . -type f | grep -oE '\.(\w+)$' | sort -u
I tried a bunch of the answers here, even the "best" answer. They all came up short of what I was specifically after. So, after the past 12 hours of sitting in regex code for multiple programs and reading and testing these answers, this is what I came up with, and it works EXACTLY like I want.
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort -u
If you need a count of the file extensions, use the code below:
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{2,16}" | awk '{print tolower($0)}' | sort | uniq -c | sort -rn
While these methods will take some time to complete and probably aren't the best ways to go about the problem, they work.
Update: Per @alpha_989, long file extensions will cause an issue. That's due to the original regex "[[:alpha:]]{3,6}". I have updated the answer to use the regex "[[:alpha:]]{2,16}". However, anyone using this code should be aware that those numbers are the minimum and maximum extension length allowed in the final output: an extension longer than the maximum is split across multiple lines of output, and a run of letters shorter than the minimum is dropped.
Note: The original post read: "- Greps for file extensions between 3 and 6 characters (just adjust the numbers if they don't fit your need). This helps avoid cache files and system files (system file bit is to search jail)."
Idea: This could be used to find file extensions of at least a given length via:
find . -type f -name "*.*" | grep -o -E "\.[^\.]+$" | grep -o -E "[[:alpha:]]{4,}" | awk '{print tolower($0)}' | sort -u
Where 4 is the minimum extension length to include; any longer extensions are found as well.
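To see why the second grep splits or drops some extensions, here is a rough Python sketch of the pipeline, assuming the paths are in a list and approximating [[:alpha:]] with ASCII letters (the real class depends on the locale). grep -o prints every non-overlapping run of 2 to 16 letters on its own line, so an extension containing digits such as mp4 comes out as mp, a 20-letter extension becomes two lines, and a leftover run shorter than the minimum disappears.

```python
import os
import re


def alpha_runs(suffix, lo=2, hi=16):
    """grep -o -E "[[:alpha:]]{lo,hi}" on one line (ASCII letters only)."""
    return re.findall(r"[A-Za-z]{%d,%d}" % (lo, hi), suffix)


def alpha_grep_extensions(paths, lo=2, hi=16):
    """Rough equivalent of the find | grep | grep | awk tolower | sort -u pipeline."""
    found = set()
    for path in paths:
        if "." not in os.path.basename(path):  # find -name "*.*"
            continue
        last = re.search(r"\.[^.]+$", path)  # grep -o -E "\.[^\.]+$"
        if last:
            found.update(run.lower() for run in alpha_runs(last.group(0), lo, hi))
    return sorted(found)


print(alpha_grep_extensions(["./a/song.MP3", "./b/clip.mp4", "./c/x.TXT"]))  # ['mp', 'txt']
```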
In Python, using generators to cope with very large directories, counting blank extensions too, and getting the number of times each extension shows up:
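import collections
import itertools
import json
import os

root = '/home/andres'  # directory to scan

# Lazily chain the file-name lists that os.walk yields for every directory.
files = itertools.chain.from_iterable((
    names for _, _, names in os.walk(root)
))

# splitext returns '' for names without an extension, so those are counted too.
counter = collections.Counter(
    (os.path.splitext(file_)[1] for file_ in files)
)
print(json.dumps(counter, indent=2))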
Since there's already another solution that uses Perl: if you have Python installed, you could also do this (from the shell):
python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print('\n'.join(e))"
None of the replies so far deal with filenames containing newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but it works and is reasonably fast.
import os
import sys

def names(roots):
    """Yield the base name of every file under each root directory."""
    for root in roots:
        for _dirpath, _dirnames, basenames in os.walk(root):
            for basename in basenames:
                yield basename

# splitext keeps the leading dot (".txt") and returns "" for names without one.
sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:  # skip the empty suffix of extensionless files
        print(suf)
I think the simplest and most straightforward way is:
for f in *.*; do echo "${f##*.}"; done | sort -u
It's a modification of ChristopheD's third approach.
I don't think this one has been mentioned yet:
find . -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c
You can also do this:
find . -type f -name "*.php" -exec PATHTOAPP {} +
I've found it simple and fast...
# find . -type f -exec basename {} \; | awk -F"." '{print $NF}' > /tmp/outfile.txt
# cat /tmp/outfile.txt | sort | uniq -c | sort -n > /tmp/outfile_sorted.txt
The accepted answer uses a regex, and you cannot create an alias command with that regex, so you have to put it into a shell script. I'm using Amazon Linux 2 and did the following:
I put the accepted answer's code into a file using:
sudo vim find.sh
Add this code:
find ./ -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
Save the file by typing: :wq!
Then edit your profile:
sudo vim ~/.bash_profile
Add the alias:
alias getext=". /path/to/your/find.sh"
Save with :wq! and reload the profile:
. ~/.bash_profile
Another way:
find . -type f -name "*.*" -printf "%f\\n" | while IFS= read -r; do echo "${REPLY##*.}"; done | sort -u
You can drop the -name "*.*", but it ensures we are dealing only with files that do have an extension of some sort.
The -printf is find's print, not bash's. -printf "%f\\n" prints only the filename, stripping the path (and adds a newline).
Then we use string substitution to remove everything up to the last dot using ${REPLY##*.}.
Note that $REPLY is simply read's built-in variable. We could just as well use our own in the form while IFS= read -r file, and here $file would be the variable.
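The same two steps, keeping only the file name (what -printf "%f\n" does) and then stripping everything through the last dot (what ${REPLY##*.} does), can be sketched in Python with os.walk. This is a rough equivalent assuming a scan rooted at the current directory; os.walk's file list stands in for -type f, but it may treat symbolic links differently, and the helper names are hypothetical.

```python
import os


def extension_of(name):
    """Return the text after the last dot, or None if the name has no dot."""
    if "." not in name:  # the -name "*.*" filter
        return None
    return name.rpartition(".")[2]  # ${REPLY##*.}


def walk_extensions(root="."):
    """Collect the distinct extensions of all file names under root, sorted."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:  # bare names without the path, like -printf "%f\n"
            ext = extension_of(name)
            if ext is not None:
                found.add(ext)
    return sorted(found)


print(walk_extensions("."))
```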