How to print only txt files on linux terminal?

My Linux directory contains 6 files: 5 are text files and 1 is a .tar.gz archive. How can I print only the names of the text files to the terminal?

directory :dir
content:
ex1, ex2, ex3, ex4, ex5, ex6.tar.gz

The command 'file', followed by the name of a file, will return the type of the file.

You can loop over the files in your directory, use each filename as input to the 'file' command, and if it is a text file, print that filename.

The following includes some extra output from the file command, which I'm not sure how to remove yet, but it does give you the filenames you want:

#!/bin/bash
for f in *
do
  # Quote "$f" so names containing spaces or glob characters stay intact
  file -- "$f" | grep text
done

You can put this into a shell script in the directory you want to get the filenames from, and run it from the command line.
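One way to drop the extra output is to test the description separately and print only the name. This is a minimal sketch, assuming a file(1) that supports the -b (brief) option; the function name list_text_files is made up for illustration and is not part of the answer above.

```shell
#!/bin/bash
# list_text_files DIR: print the names of regular files in DIR that
# file(1) describes as some kind of text. (Hypothetical helper name.)
list_text_files() {
    local dir=${1:-.} f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip directories and unmatched globs
        # -b prints only the description, so a filename that happens to
        # contain "text" cannot cause a false match
        if file -b -- "$f" | grep -q text; then
            printf '%s\n' "${f##*/}"     # strip the directory part
        fi
    done
}
```

Called as `list_text_files dir`, it prints one matching filename per line and nothing else.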

Because your files have no extension (.txt), I would do it by exclusion.

ls | grep -v tar.gz

If you have multiple types, then use extensions.
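Note that in `grep -v tar.gz` the dots match any character and the pattern is not anchored, so a name such as `targz-notes` would be excluded too. A stricter variant of the same exclusion idea can be sketched with a glob suffix test; the helper name list_excluding_suffix is hypothetical.

```shell
#!/bin/bash
# list_excluding_suffix DIR SUFFIX: print entries of DIR whose names
# do not end in SUFFIX. (Hypothetical helper name.)
list_excluding_suffix() {
    local dir=${1:-.} suffix=$2 f
    for f in "$dir"/*; do
        [ -e "$f" ] || continue              # unmatched glob: nothing to list
        case ${f##*/} in
            *"$suffix") ;;                   # name ends with the suffix: skip
            *) printf '%s\n' "${f##*/}" ;;   # anything else is printed
        esac
    done
}
```

For the directory in the question, `list_excluding_suffix dir .tar.gz` would list ex1 through ex5.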

Updated Answer

As @hek2mgl points out in the comments, a more robust solution is to separate filenames using NUL characters (which cannot occur in filenames); that deals with filenames containing newlines and colons:

file -0 * | awk -F'\0' '$2 ~ /text/{print $1}'
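Not every awk handles a NUL field separator, so the same idea can also be sketched in Bash alone. This assumes a `file` that supports -0 and Bash's `read -d ''`; the function name text_files_nul is hypothetical. Depending on the version of `file`, the description may or may not begin with a colon, which does not matter here because only the word text is searched for.

```shell
#!/bin/bash
# text_files_nul DIR: print names of text files in DIR by parsing
# `file -0` output. (Hypothetical helper name.)
text_files_nul() {
    (
        cd -- "${1:-.}" || exit 1
        # Each record is: filename, NUL byte, description, newline.
        # The first read stops at the NUL, the second at the newline.
        file -0 -- * | while IFS= read -r -d '' name && IFS= read -r desc; do
            case $desc in
                *text*) printf '%s\n' "$name" ;;
            esac
        done
    )
}
```

Because the name ends at the NUL rather than at a colon, a file called `we:ird` is reported under its full name.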

Original Answer

I would do this:

file * | awk -F: '$2~/text/{print $1}'

That runs file to see the type of each file and passes the names and types to awk, separated by a colon. awk then looks for the word text in the second field and, if it finds it, prints the first field, which is the filename.

Try running the following simpler command on its own to see how it works:

file *

The suggestions of using the file command are correct. The problem here is parsing the output of this command, because (1) file names can contain pretty much any character, and (2) the concrete output of the file command is a bit unpredictable, because it depends on which so-called magic files are present.

If we rely on the fact that the explanation text of the output of the file command (i.e. the part which explains what kind of file it is) always contains the word text if it is a text file, and that it never contains a colon, we can process it as follows:

The last colon in the output must separate the filename from the explanation. Everything to the left is the filename, and if the word text (note the leading space before text!) occurs in the right part, we have a text file.

This still leaves us with those (hopefully rare) cases where a file name contains non-printable characters: they are translated to their octal equivalents, which might or might not be what you want to see. You can suppress this by passing the -r option to the file command. This is useful if you want to process the filename further instead of just displaying it to the user, but it might corrupt your parsing logic, especially if the filename contains a newline.

Finally, don't forget that in any case, you see what the system considers a text file. This is not necessarily the same as what you define to be a text file.
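The command that originally accompanied this answer is not preserved here, so the following is only a sketch of the rule just described, written with Bash parameter expansion. It assumes, as stated above, that the explanation never contains a colon; the function name text_files_last_colon is hypothetical.

```shell
#!/bin/bash
# text_files_last_colon: read `file` output on stdin and print the names
# of text files, splitting each line at its LAST colon.
# (Hypothetical helper name.)
text_files_last_colon() {
    local line name desc
    while IFS= read -r line; do
        name=${line%:*}     # everything left of the last colon
        desc=${line##*:}    # everything right of the last colon
        case $desc in
            *" text"*) printf '%s\n' "$name" ;;   # note the leading space
        esac
    done
}
```

It would be used as `file -- * | text_files_last_colon`. A filename containing a colon is handled, since only the last colon on the line is treated as the separator.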

Given this directory of files:

$ file *
1.txt:      UTF-8 Unicode (with BOM) text, with CRLF line terminators
2.pdf:      PDF document, version 1.5
3.pdf:      PDF document, version 1.5
4.dat:      data
5.txt:      ASCII text
6.jpg:      JPEG image data, JFIF standard 1.02, aspect ratio, density 100x100, segment length 16, baseline, precision 8, 2833x972, frames 3
7.html:     HTML document text, UTF-8 Unicode text, with very long lines, with no line terminators
8.js:       UTF-8 Unicode text
9.xml:      XML 1.0 document text
A.pl:       a /opt/local/bin/perl script text executable, ASCII text
B.Makefile: makefile script text, ASCII text
C.c:        c program text, ASCII text
D.docx:     Microsoft Word 2007+

You can see that the only files that are pure ASCII are 5.txt, 9.xml, and A through C. The rest are either binary or UTF according to file.

You can use a Bash glob to loop through files and use file to test each file. This saves having to parse the output of file for the file names, but relies on file to accurately identify what you consider to be 'text':

for fn in *; do 
    [ -f "$fn" ] || continue
    fo=$(file "$fn")
    [[ $fo =~ ^"$fn":.*text ]] || continue
    echo "$fn"
done    
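As a variant of the loop above, file can also report a MIME type, which is sometimes easier to test than the free-form description. The sketch below swaps the regex match for a check on `file -b --mime-type`; the function name list_mime_text is hypothetical. Some formats people think of as text may be reported under application/ rather than text/, so this is an alternative under that assumption, not a drop-in replacement.

```shell
#!/bin/bash
# list_mime_text DIR: print names of regular files in DIR whose MIME type,
# as reported by file(1), starts with "text/". (Hypothetical helper name.)
list_mime_text() {
    local dir=${1:-.} fn
    for fn in "$dir"/*; do
        [ -f "$fn" ] || continue
        case $(file -b --mime-type -- "$fn") in
            text/*) printf '%s\n' "${fn##*/}" ;;
        esac
    done
}
```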

If you cannot use file, which is certainly the easiest way, you can open the file and look for binary characters. Use Perl for that:

for fn in *; do 
    [ -f "$fn" ] || continue
    head -c 2000 "$fn" | perl -lne '$tot+=length; $cnt+=s/[^[:ascii:]]//g; END{exit 1 if(!$tot || $cnt/$tot>0.03);}'
    [ $? -eq 0 ] || continue
    echo "$fn"
done    

In this case, I am looking at the percentage of non-ASCII bytes in the first 2000 bytes of a file. YMMV, but this allows finding a file that file would report as UTF (since it has a binary BOM) even though most of the file is ASCII.

For that directory, the two Bash scripts report (with my comments on each file):
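If Perl is not available either, roughly the same 3% heuristic can be sketched with head, tr and wc. This is an approximation, not the script above: it counts newlines in the total, and the function name mostly_ascii is hypothetical.

```shell
#!/bin/bash
# mostly_ascii FILE: succeed if at most 3% of the first 2000 bytes of FILE
# are non-ASCII (byte value above 127). Empty files fail the test.
# (Hypothetical helper name.)
mostly_ascii() {
    local total nonascii
    # $(( ... )) normalizes the padded numbers some wc implementations print
    total=$(( $(head -c 2000 -- "$1" | wc -c) ))
    # Delete every ASCII byte; whatever remains is non-ASCII
    nonascii=$(( $(head -c 2000 -- "$1" | LC_ALL=C tr -d '\000-\177' | wc -c) ))
    [ "$total" -gt 0 ] && [ $((nonascii * 100)) -le $((total * 3)) ]
}
```

In a loop it would be used as `mostly_ascii "$fn" && echo "$fn"`. The comparison is done in integer arithmetic because the shell has no floating point.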

1.txt     # UTF file with a binary BOM but no UTF characters -- all ascii
4.dat     # text based configuration file for a router. file does not report this 
5.txt     # Pure ascii file
7.html    # html file
8.js      # Javascript sourcecode  
9.xml     # xml file all text
A.pl      # Perl file
B.Makefile   # Unix make file
C.c       # C source file

Since file does not consider the all-ASCII file 4.dat to be text, it is not reported by the first Bash script but is reported by the second. Otherwise, the output is the same.
