
Perl: How to extract unique entries from a text file

I am a total beginner in Perl. I have a large file (around 100 GB) which looks like this:

 domain, ip
 "www.google.ac.",173.194.33.111
 "www.google.ac.",173.194.33.119
 "www.google.ac.",173.194.33.120
 "www.google.ac.",173.194.33.127
 "www.google.ac.",173.194.33.143
 "apple.com.", 173.194.33.143
 "studio.com.", 173.194.33.143
 "www.google.ac.",101.78.156.201
 "www.google.ac.",101.78.156.201

So basically the file contains (1) duplicate lines, (2) one IP with different domains, and (3) one domain with different IPs. I would like to remove the duplicate lines from the file (the ones with the same domain,IP pair).

I have already reviewed other answers to the same question, but none of them address my problem with large files.

Does anybody have a clue how I can do this in Perl? Or any suggestion for a more suitable language?

The easiest thing to do is to read the file a line at a time and use each line as the key of a hash. You do need enough memory to store each unique line once, though; there's no getting around that with this approach.

Here's a one-liner as run from the shell:

perl -ne '$lines{$_}++; END { print keys %lines }' filename

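If even the set of unique pairs is too large for RAM, an external sort sidesteps the hash entirely: `sort -u` spills intermediate runs to temporary files on disk, so its memory use stays bounded regardless of file size. The trade-off is that the output comes out sorted rather than in the original order. A sketch with placeholder filenames:

```shell
# Stand-in input; the real file is ~100 GB, far larger than RAM
printf '%s\n' '"b",2' '"a",1' '"a",1' > pairs.csv

# sort -u merges on-disk temporary runs, so memory stays bounded.
# With GNU sort you can additionally tune the buffer and temp dir,
# e.g.: sort -S 2G -T /scratch -u pairs.csv
sort -u pairs.csv > unique.csv

cat unique.csv
```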
