How can I get the word count for all PDF files in a folder on macOS or Windows 10?
I know a few ways to get the word count for a single PDF file, but I have a folder containing 500+ PDF files. Is there a faster way to get the word count for all of them without opening every single file and copying and pasting the text?
I'm using macOS Catalina 10.15.5. A solution for Windows 10 would also be fine.
I just ran the following command on my Windows machine:
Prompt>dir *.txt /S
The output was enormous, and it ended with:
Total Files Listed:
3620 File(s) 93.074.638 bytes
0 Dir(s) 410.585.006.080 bytes free
Edit after first comment
PDF is a format designed to be human-readable rather than computer-readable, so I don't believe it is possible to parse it and do calculations on it using only a few simple commands.
You can use pdfgrep, which you can install with Homebrew using:
brew install pdfgrep
Then the command to count the words in all the files will be:
pdfgrep -c -P "\b.*\b" *.pdf
Sample output
Arduino Wireless Communication With the HC-12.pdf:512
sample.pdf:0
simple.pdf:4
text.pdf:22
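The -P option means to use PCRE, or "Perl Compatible Regular Expressions", in which \b signifies a word boundary, i.e. the start or end of a word. Note that -c reports the number of matches, and because .* is greedy, \b.*\b can span an entire line in a single match, so the figures may be closer to a count of text lines than of individual words. A pattern such as \w+ matches one word at a time.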
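If you would rather have a script that runs the same way on macOS and Windows 10, here is a minimal sketch of a different approach from the pdfgrep one above. It assumes Python 3 is available and that the pdftotext command-line tool (part of Poppler, for example via brew install poppler) is on your PATH. The function names count_words and pdf_word_count are made up for this example, and a "word" is taken to be any run of letters, digits, or underscores, so "it's" counts as two words.

```python
import re
import subprocess
import sys
from pathlib import Path


def count_words(text: str) -> int:
    """Count runs of word characters (letters, digits, underscore)."""
    return len(re.findall(r"\w+", text))


def pdf_word_count(pdf_path: Path) -> int:
    """Extract text with the external pdftotext tool and count its words.

    Assumption: pdftotext is installed and on PATH.
    """
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True,
        check=True,
    )
    return count_words(result.stdout.decode("utf-8", errors="replace"))


def main(folder: str) -> None:
    """Print name:count for each *.pdf in folder, then a grand total."""
    total = 0
    for pdf in sorted(Path(folder).glob("*.pdf")):
        try:
            n = pdf_word_count(pdf)
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            # Unreadable PDF or missing pdftotext: report and move on.
            print(f"{pdf.name}: skipped ({exc})", file=sys.stderr)
            continue
        print(f"{pdf.name}:{n}")
        total += n
    print(f"TOTAL:{total}")


if __name__ == "__main__":
    # Folder to scan; defaults to the current directory.
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved as, say, pdf_wordcount.py, it could be run as python3 pdf_wordcount.py /path/to/folder. Scanned PDFs that contain only images will report 0, since there is no text layer to extract without OCR.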