
Count number of occurrences using awk on large files

I have two files. Some of the records in the first file also exist in file2, which is bigger than file1. For each word in file1, I want to show the number of times it occurs in file2.

Here is what I tried

file1.txt
bash-3.00# cat file1.txt |wc -l
17102666

more file1.txt
123advertise3
123advertise4
123advertise5
123advertiseb
123advertisec
123advertised
123advertisedebtconsolidation
123advertisee
123advertisef
123advertiseg
123advertiseh
123advertisehomaxproducts

File2

file2.txt
bash-3.00# cat file2.txt | wc -l
113842500


more file2.txt
123123apartment
123123attorney
123123auction
123123auto
123advertisedebtconsolidation
123advertiseb
123123automate
123123automatic
123123bank
123advertisedebtconsolidation
123advertiseb
123123banking
123123bankruptcy
123advertisedebtconsolidation
123123bargain
123123best
123123blog
123advertisedebtconsolidation
123123building

I wanted output like this

123advertisedebtconsolidation 3
123advertiseb 2

I ran the command below:

bash-3.00# nawk 'FNR==NR{c[$1];next}$1 in c{++c[$1]}END{for(i in c) print i,c[i]}' file1.txt file2.txt

But I didn't get the desired output.

Instead, I got only strings, without counts, like these:

peaktablethomecsuchico
browsepropertyhomebase
clickflowershomedsn
worldwideflowerstravelagency
acepigb
acepigc
browsecompanytravelagent
liveearnhomedownpaymentassistance
acepigd
bargainsystemhomebvcure
acepige
acepigf
uniquecasinohomecycling
alternativeanyhomecanningrecipes
acepigj
annualsurveyhomedma

Can anybody help me get this output using grep or awk, especially on large files? I tried the same thing on smaller files and it worked fine.
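The nawk command in the question prints bare strings because its first block, `c[$1]`, creates an entry with an empty value for every word in file1.txt, and the END loop prints every entry. With about 17 million words in file1.txt, most of which never occur in file2.txt, the output is dominated by words with an empty count, and since the iteration order of `for (i in c)` is unspecified, the matched words can end up anywhere in it. A minimal sketch of a fix keeps the same two-pass structure but prints only words whose count is positive. It is wrapped in a shell function whose name, `count_in_file2`, is hypothetical; the awk program uses only POSIX awk features, so it should also run under `nawk` as in the question.

```shell
# count_in_file2 FILE1 FILE2  (hypothetical helper name)
# Prints "word count" for each word of FILE1 that occurs at least once in FILE2.
count_in_file2() {
    # Pass 1 (FNR == NR, i.e. while reading FILE1): register each word with count 0.
    # Pass 2 (FILE2): increment the count of words registered in pass 1.
    # END: print only the words that were actually seen in FILE2.
    awk 'FNR == NR { c[$1] = 0; next }
         $1 in c   { c[$1]++ }
         END       { for (w in c) if (c[w] > 0) print w, c[w] }' "$1" "$2"
}

# Usage: count_in_file2 file1.txt file2.txt
```

The order of the output lines is still unspecified; pipe the result through `sort` if a stable order is needed.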

The uniq command may be a simple way to output common strings. The -c option prefixes each output line with the number of times that line occurs consecutively in the input, which is why the input is sorted first. The awk command then prints only the lines whose count is greater than one.

sort file1.txt file2.txt | uniq -c | awk '$1 > 1'
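The pipeline above also counts each word's own line in file1.txt, so every count is one higher than the number of occurrences in file2.txt (assuming each word is listed once in file1.txt), and a word that is repeated in file2.txt but absent from file1.txt is reported as well. A sort-based sketch that counts only file2.txt and keeps only words listed in file1.txt can use the standard `sort`, `uniq -c`, and `join` utilities; since sort implementations typically spill to temporary files, it avoids holding all 17 million file1.txt words in an awk array. The function name `join_counts` and its scratch file names are hypothetical, and it assumes one word per line with no embedded blanks, as in the samples.

```shell
# join_counts FILE1 FILE2  (hypothetical helper name)
# Prints "word count" for each word of FILE1 that occurs in FILE2,
# sorted byte-wise by word. Assumes one blank-free word per line.
# The body runs in a subshell, so LC_ALL and the variables stay local.
join_counts() (
    # Byte-wise collation so that sort, uniq and join agree on ordering.
    LC_ALL=C
    export LC_ALL

    # Hypothetical scratch files; $$ keeps the names unique per shell.
    words=${TMPDIR:-/tmp}/join_counts_words.$$
    counts=${TMPDIR:-/tmp}/join_counts_counts.$$

    # Distinct words of FILE1, sorted.
    sort -u "$1" > "$words"

    # Occurrences in FILE2 only, rewritten from "count word" to "word count";
    # the lines keep the sorted order produced by sort.
    sort "$2" | uniq -c | awk '{ print $2, $1 }' > "$counts"

    # join requires both inputs sorted on field 1 in the same collation;
    # it prints the shared word followed by its count from the second file.
    join "$words" "$counts"

    rm -f "$words" "$counts"
)

# Usage: join_counts file1.txt file2.txt
```

Because `join` drops words that appear in only one of its inputs, words missing from file2.txt and words that occur only in file2.txt are both left out.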

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM