
Count number of occurrences using awk on large files

I have 2 files. The first has a few records that also exist in file2. File2 is bigger than file1. I want to show the number of occurrences of the words from file1 in file2.

Here is what I tried:

file1.txt
bash-3.00# cat file1.txt |wc -l
17102666

more file1.txt
123advertise3
123advertise4
123advertise5
123advertiseb
123advertisec
123advertised
123advertisedebtconsolidation
123advertisee
123advertisef
123advertiseg
123advertiseh
123advertisehomaxproducts

File2

file2.txt
bash-3.00# cat file2.txt | wc -l
113842500


more file2.txt
123123apartment
123123attorney
123123auction
123123auto
123advertisedebtconsolidation
123advertiseb
123123automate
123123automatic
123123bank
123advertisedebtconsolidation
123advertiseb
123123banking
123123bankruptcy
123advertisedebtconsolidation
123123bargain
123123best
123123blog
123advertisedebtconsolidation
123123building

I want output like this:

123advertisedebtconsolidation 3
123advertiseb 2

I ran the command below:

bash-3.00# nawk 'FNR==NR{c[$1];next}$1 in c{++c[$1]}END{for(i in c) print i,c[i]}' file1.txt file2.txt

But I didn't get the desired output.

I got only bare strings, like these:

peaktablethomecsuchico
browsepropertyhomebase
clickflowershomedsn
worldwideflowerstravelagency
acepigb
acepigc
browsecompanytravelagent
liveearnhomedownpaymentassistance
acepigd
bargainsystemhomebvcure
acepige
acepigf
uniquecasinohomecycling
alternativeanyhomecanningrecipes
acepigj
annualsurveyhomedma

Can anybody help me get such output using grep or awk, especially on larger files? I tried the same thing on smaller files and it worked fine.
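The bare strings are consistent with how the END block of that command works: the first pass (FNR==NR) creates an array entry c[$1] for every one of the roughly 17 million words in file1.txt, and for(i in c) print i,c[i] then prints all of them, including words never seen in file2.txt, whose count is empty. The few real matches are buried among them. A minimal sketch that prints only words with a count above zero, assuming one word per line in both files (count_words is a hypothetical helper name):

```shell
# count_words WORDS_FILE TEXT_FILE  (hypothetical helper name)
# Prints "word count" for every word in WORDS_FILE that occurs at least once
# in TEXT_FILE. Assumes one word per line; output order is unspecified.
count_words() {
    awk '
        # First file (FNR == NR): register each word with a count of 0.
        FNR == NR { c[$1] = 0; next }
        # Second file: count only lines whose first field is a registered word.
        $1 in c { c[$1]++ }
        # Skip words that never matched instead of printing them without a count.
        END { for (w in c) if (c[w] > 0) print w, c[w] }
    ' "$1" "$2"
}
```

For example, count_words file1.txt file2.txt | sort lists the matches in sorted order. The array still has to hold every word from file1.txt, which may exceed the memory available to nawk on inputs this large.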

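The uniq command may be a simple way to output common strings. Its -c option prefixes each output line with the number of times that line occurs in the sorted input. The awk command then prints only the lines that occur more than once. Note that each count also includes the copy from file1.txt, so it is one higher than the number of occurrences in file2.txt (assuming each word is listed once in file1.txt), and a word repeated only within file2.txt also passes the "> 1" filter.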

cat file1.txt file2.txt | sort | uniq -c | awk '{ if ($1 > 1) print $0; }'
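To get counts for file2.txt alone, restricted to the words listed in file1.txt, the same sort-based idea can be combined with join. This is a sketch, assuming one whitespace-free word per line, no blank lines, and a mktemp that supports -d (common, but not required by POSIX); count_words_sorted is a hypothetical helper name.

```shell
# count_words_sorted WORDS_FILE TEXT_FILE  (hypothetical helper name)
# Prints "word count" for every word in WORDS_FILE that occurs in TEXT_FILE,
# sorted by word. The body runs in a subshell so LC_ALL and tmp stay local.
count_words_sorted() (
    # Byte-wise collation so sort, uniq and join agree on the ordering.
    LC_ALL=C
    export LC_ALL
    tmp=$(mktemp -d) || exit 1
    # Distinct words to look for, sorted.
    sort -u "$1" > "$tmp/words"
    # "word count" for each distinct line of TEXT_FILE; stays sorted by word.
    sort "$2" | uniq -c | awk '{ print $2, $1 }' > "$tmp/counts"
    # Keep only words present in both files (join on the first field).
    join "$tmp/words" "$tmp/counts"
    status=$?
    rm -rf "$tmp"
    exit "$status"
)
```

Sorting 113 million lines is slower than a single awk pass, but typical sort implementations spill to temporary files instead of keeping everything in memory, which is the reason to prefer this route when the awk array does not fit.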

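Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.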

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM