Bash grep regex issue with two different files

Question

I have the following command, which filters three-letter words from a file made of upper-case words only, one word per line:

grep -E '^[A-Z]{3}$' test

The command returns a correct list of words when used with a file test containing 10 words. When applied to a much bigger file, dico.txt, containing over 30,000 words, the command does not return anything (a new prompt is simply displayed).

As I thought it might be either an extension or a file size issue, I've tried:

cp test test.txt, to match the *.txt extension of the big file
Creating a new file dico_small.txt by selecting 1000 lines from dico.txt

...both without success.

Answer 1

Your large file has Windows line endings, that is \r\n instead of Linux line endings \n.

\r is called carriage return and is treated as a normal character by grep. When you write grep -E "a$" fileWithWindowsLineEndings, grep won't find anything, because in front of the Linux line ending \n (denoted as $ in grep) there is always a \r and never an a.
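
To see this effect in isolation, here is a minimal sketch that builds two small sample files and runs the question's pattern on both. The file names unix.txt and windows.txt and the three sample words are made up for illustration, and the behaviour described assumes grep on Linux.

```shell
# Build two tiny sample files in a temporary directory (hypothetical names).
tmpdir=$(mktemp -d)
printf 'CAT\nHOUSE\nDOG\n' > "$tmpdir/unix.txt"            # LF line endings
printf 'CAT\r\nHOUSE\r\nDOG\r\n' > "$tmpdir/windows.txt"   # CRLF line endings

# Count matching lines. grep -c prints 0 and exits non-zero when nothing
# matches, so "|| true" keeps the script going in that case.
unix_count=$(grep -cE '^[A-Z]{3}$' "$tmpdir/unix.txt" || true)
windows_count=$(grep -cE '^[A-Z]{3}$' "$tmpdir/windows.txt" || true)

# Count the lines that end in a carriage return, to confirm the diagnosis.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr\$" "$tmpdir/windows.txt" || true)

echo "unix matches: $unix_count"
echo "windows matches: $windows_count"
echo "lines ending in CR: $cr_lines"

rm -r "$tmpdir"
```

If the last count is greater than zero for your own file, the file most likely has Windows line endings.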

You can convert your file to a normal Linux file by deleting all \r characters.

tr -d '\r' < fileWithWindowsLineEndings > fileWithLinuxLineEndings
grep -E '...' fileWithLinuxLineEndings

Alternatively, convert the file on the fly without saving the conversion result:

tr -d '\r' < fileWithWindowsLineEndings | grep -E '...'
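
As a quick check of the on-the-fly variant, the following sketch applies it to a small sample file with Windows line endings, using the pattern from the question. The file name dico_sample.txt and its contents are assumptions made for this example only.

```shell
# Create a small sample file with CRLF line endings (hypothetical name).
tmpdir=$(mktemp -d)
printf 'CAT\r\nHOUSE\r\nDOG\r\n' > "$tmpdir/dico_sample.txt"

# Strip the carriage returns on the fly, then keep the three-letter words.
matches=$(tr -d '\r' < "$tmpdir/dico_sample.txt" | grep -E '^[A-Z]{3}$')
printf '%s\n' "$matches"

rm -r "$tmpdir"
```

With this sample the pipeline should print CAT and DOG, one per line, and the original file is left unchanged.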
