How do I filter out lines of a text file that have a length of 8 and end in .com?
I have a list of a million domain names in name.txt:
hello.com
abc.com
gogogo.us
goodbye.me
...
...
How do I write only the domain names that are 8 characters long (including the .com) and end in .com to names_new.txt?
I'm looking for a simple command, not a script or anything.
grep is the first tool to reach for when pattern matching:
egrep -x '[a-z]{4}\.com' name.txt > names_new.txt
Try:
egrep "^[a-z][a-z][a-z][a-z]\.com$" name.txt > names_new.txt
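The two grep commands above express the same constraint in different ways: -x makes the pattern match the whole line, while ^ and $ anchor it explicitly. Here is a small sketch to check that on a sample list. The file name sample.txt and its contents are made up for illustration, and grep -E is used as the equivalent spelling of egrep.

```shell
# Build a tiny sample list (hypothetical data, modeled on the question).
printf '%s\n' hello.com abc.com abcd.com gogogo.us goodbye.me xabcd.com > sample.txt

# -x: the pattern must match the entire line.
grep -E -x '[a-z]{4}\.com' sample.txt > out_x.txt

# ^ and $: the same whole-line constraint written as explicit anchors.
grep -E '^[a-z]{4}\.com$' sample.txt > out_anchor.txt

# Both files should list the same names.
cat out_x.txt out_anchor.txt
```

Only abcd.com has exactly four letters before .com, so it is the only name either command keeps.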
Use Awk. The domain name is split on . into fields. The first field is tested for a length of 4, since the .com adds another 4 characters. The second field should be com. When both conditions are met, the line is printed.
awk -F. 'length($1) == 4 && $2 == "com" { print }' name.txt > names_new.txt
Note: this may produce false positives if the list contains names with more than two dot-separated parts, e.g. mail.com.nz, because only the first two fields are checked.
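One possible way to avoid that false positive is to also require exactly two fields, using awk's built-in NF variable. This is a sketch rather than part of the original answer, and the sample file name and contents are made up for illustration.

```shell
# Hypothetical sample: mail.com.nz has three dot-separated fields.
printf '%s\n' mail.com mail.com.nz abc.com hello.com > sample_awk.txt

# NF == 2 rejects any line with more (or fewer) than two fields,
# so mail.com.nz no longer slips through.
awk -F. 'NF == 2 && length($1) == 4 && $2 == "com"' sample_awk.txt > sample_awk_out.txt

cat sample_awk_out.txt
```

With this sample only mail.com is printed. An awk pattern with no action prints the matching line by default, so { print } can be left out.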
There may be domain names with dashes or numbers. -i makes egrep match regardless of case.
egrep -i "^[a-z0-9-]{4}\.com$" name.txt > names_new.txt
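If the characters before .com do not matter at all and only the total length and the suffix do, the condition in the question can be tested literally. The sketch below assumes plain ASCII names with no trailing whitespace or carriage returns. The sample file is made up for illustration.

```shell
# Hypothetical sample with mixed case, digits and a dash.
printf '%s\n' abcd.com A1-b.COM hello.com abcdefgh gogogo.us > sample_len.txt

# Keep lines that are exactly 8 characters long and end in .com;
# tolower() makes the suffix test case-insensitive.
awk 'length($0) == 8 && tolower($0) ~ /\.com$/' sample_len.txt > sample_len_out.txt

cat sample_len_out.txt
```

This keeps abcd.com and A1-b.COM. Unlike the character-class patterns, it would also accept an 8-character line such as ab.c.com, so it is looser about what counts as a valid name.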