
How do I filter out lines of a text file that have a length of 8 and end in .com?

I have a list of a million domain names in name.txt:

hello.com
abc.com
gogogo.us
goodbye.me
...
...

How do I pipe only the domain names that are exactly 8 characters long (including the .com) and end in .com to names_new.txt?

I'm looking for a simple command, not a script or anything.

grep is the first tool to reach for when pattern matching:

egrep -x '[a-z]{4}\.com' name.txt > names_new.txt

Try:

 egrep "^[a-z][a-z][a-z][a-z]\.com$" name.txt > names_new.txt
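The two commands above should select the same lines: -x makes the pattern match the whole line, which is what the explicit ^ and $ anchors do. A minimal sketch to check this on a small sample, using grep -E (equivalent to egrep). The file names are made up for illustration, and good.com is not in the question's sample; it is added here only so that something matches.

```shell
# Sample lines from the question, plus good.com (hypothetical, added so one line matches).
printf '%s\n' hello.com abc.com gogogo.us goodbye.me good.com > sample_q.txt

# -x requires the pattern to match the whole line ...
grep -Ex '[a-z]{4}\.com' sample_q.txt > out_x.txt
# ... which is what the explicit ^ and $ anchors express.
grep -E '^[a-z]{4}\.com$' sample_q.txt > out_anchor.txt

# cmp -s exits 0 when both outputs are identical; only then print the result.
cmp -s out_x.txt out_anchor.txt && cat out_x.txt
```

Here hello.com (9 characters) and abc.com (7 characters) are rejected on length, and the .us and .me names are rejected on the suffix.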

Use Awk. The domain name is split on the . character into fields.

The first field is tested for a length of 4, since .com accounts for the other 4 characters.

The second field must equal com.

When both conditions are met, the line is printed.

awk -F. 'length($1) == 4 && $2 == "com" { print }' name.txt > names_new.txt

Note: this command may produce false positives if the list contains names with more than two dot-separated parts, e.g. mail.com.nz
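One possible way to avoid that false positive, as a sketch rather than a definitive fix: also require exactly two dot-separated fields by testing awk's built-in NF variable. The sample file names and their contents are made up for illustration.

```shell
# Hypothetical sample data for illustration only.
printf '%s\n' hello.com abc.com mail.com.nz good.com > sample_names.txt

# NF is the number of fields on the line; NF == 2 rejects names such as mail.com.nz.
# A pattern with no action prints the matching line.
awk -F. 'NF == 2 && length($1) == 4 && $2 == "com"' sample_names.txt > sample_filtered.txt

cat sample_filtered.txt
```

With this sample only good.com is kept; mail.com.nz is dropped because it has three fields.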

There may be domain names with dashes or numbers.
-i makes egrep match regardless of case.

egrep -i "^[a-z0-9-]{4}\.com$" name.txt > names_new.txt
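The pattern above also accepts names such as -abc.com, although a hostname label may not begin or end with a hyphen. If that matters for your list, a stricter variant could look like the sketch below. It uses grep -E (equivalent to egrep), and the sample file names and contents are made up for illustration.

```shell
# Hypothetical sample data for illustration only.
printf '%s\n' 'a-b1.com' '-abc.com' 'abc-.com' 'GoGo.com' 'abcd.net' > sample_hyphen.txt

# The first and last characters of the 4-character label must be a letter or digit;
# the two characters in between may also be hyphens. -i ignores case.
grep -Ei '^[a-z0-9][a-z0-9-]{2}[a-z0-9]\.com$' sample_hyphen.txt > sample_hyphen_filtered.txt

cat sample_hyphen_filtered.txt
```

With this sample, a-b1.com and GoGo.com are kept, while -abc.com, abc-.com and abcd.net are rejected.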

