Perl : How to extract unique entries of a text file

Question

I am a complete beginner in Perl. I have a large file (around 100 GB) that looks like this:

 domain, ip
 "www.google.ac.",173.194.33.111
 "www.google.ac.",173.194.33.119
 "www.google.ac.",173.194.33.120
 "www.google.ac.",173.194.33.127
 "www.google.ac.",173.194.33.143
 "apple.com., 173.194.33.143
 "studio.com.", 173.194.33.143
 "www.google.ac.",101.78.156.201
 "www.google.ac.",101.78.156.201

So basically I have (1) duplicate lines, (2) one IP with different domains, and (3) one domain with different IPs. I would like to remove only the duplicate lines from the file (the ones with the same domain, IP pair).
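The sample lines are not formatted consistently (one is missing its closing quote, another has a space after the comma), so two lines can describe the same domain, IP pair without being byte-identical. A minimal sketch, assuming the domain never contains a comma and that double quotes and surrounding whitespace carry no meaning, turns each line into a normalized "domain,ip" key before checking for duplicates. The helper names pair_key and dedup_pairs are hypothetical; the first line seen for each pair is printed unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a canonical "domain,ip" key from one input line.
# Assumption: the domain never contains a comma, so splitting on the
# first comma separates the two fields, and double quotes plus leading
# or trailing whitespace are treated as formatting noise.
sub pair_key {
    my ($line) = @_;              # a copy, so the caller's line is untouched
    chomp $line;
    my ($domain, $ip) = split /,/, $line, 2;
    return unless defined $ip;    # no comma: not a domain,ip line
    for ($domain, $ip) {          # $_ aliases each field in turn
        s/"//g;                   # drop all double quotes
        s/^\s+|\s+$//g;           # trim surrounding whitespace
    }
    return "$domain,$ip";
}

# Print the first line seen for each distinct domain,ip pair.
sub dedup_pairs {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        # Malformed lines fall back to exact-text matching.
        my $key = pair_key($line) // $line;
        print {$out} $line unless $seen{$key}++;
    }
}

# Only runs when a filename is passed on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    dedup_pairs($fh, \*STDOUT);
    close $fh;
}
```

Run it as perl dedup_pairs.pl filename > unique.txt (hypothetical script name). Memory still grows with the number of distinct pairs, since every key stays in %seen.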

I have already reviewed other answers to the same question, but none of them address my problem with large files.

Does anybody know how I can do this in Perl, or have a suggestion for a better-suited language?

Answer 1

The easiest approach is to read the file one line at a time and use each line as the key of a hash. You need enough memory to store each unique line once, though. There's no getting around that.
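A variant of the same hash idea prints each line the first time it is seen instead of waiting until the end, so the output keeps the original line order and starts appearing while the file is still being read. Memory use is unchanged: one hash entry per unique line. This is a sketch; the sub name print_unique is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy lines from $in to $out, skipping any line already seen.
# %seen holds one entry per distinct line, so memory grows with the
# number of unique lines, but output keeps the input order.
sub print_unique {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        # Post-increment: the count is 0 (false) only on the first sighting.
        print {$out} $line unless $seen{$line}++;
    }
    return scalar keys %seen;    # number of distinct lines
}

# Only runs when a filename is passed on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print_unique($fh, \*STDOUT);
    close $fh;
}
```

Run it as perl print_unique.pl filename > unique.txt (hypothetical script name). The same logic fits in the common one-liner perl -ne 'print unless $seen{$_}++' filename.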

Here's a one-liner, run from the shell (it prints the unique lines in hash order, not in their original order):

perl -ne '$lines{$_}++; END { print keys %lines }' filename
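If the unique lines do not fit in RAM at once, one common workaround (not part of the answer above) is to partition the file on disk first: every line is written to one of N temporary bucket files chosen by a hash of its text, so identical lines always land in the same bucket, and each bucket is then de-duplicated with an ordinary in-memory hash. Only about 1/N of the unique lines must be held in memory at a time. This sketch assumes Digest::MD5 and File::Temp (both ship with Perl), enough temporary disk space for a full copy of the input, and an open-file limit above N; the sub name dedup_partitioned and the default of 256 buckets are hypothetical choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);
use File::Temp qw(tempdir);

# Pass 1: scatter lines into $nbuckets temporary files; identical lines
#         always hash to the same bucket.
# Pass 2: de-duplicate each bucket with an ordinary in-memory hash.
sub dedup_partitioned {
    my ($in, $out, $nbuckets) = @_;
    my $dir = tempdir(CLEANUP => 1);    # removed when the program exits

    my @bucket_fh;
    for my $i (0 .. $nbuckets - 1) {
        open $bucket_fh[$i], '>', "$dir/bucket$i"
            or die "Cannot create bucket $i: $!\n";
    }
    while (my $line = <$in>) {
        # First 4 bytes of the MD5 digest as an unsigned 32-bit integer.
        my $idx = unpack('N', md5($line)) % $nbuckets;
        print { $bucket_fh[$idx] } $line;
    }
    close $_ for @bucket_fh;

    for my $i (0 .. $nbuckets - 1) {
        open my $fh, '<', "$dir/bucket$i"
            or die "Cannot read bucket $i: $!\n";
        my %seen;                       # only this bucket's lines
        while (my $line = <$fh>) {
            print {$out} $line unless $seen{$line}++;
        }
        close $fh;
    }
}

# Only runs when a filename is passed on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    dedup_partitioned($fh, \*STDOUT, 256);
    close $fh;
}
```

The output is grouped by bucket, so it does not preserve the original line order. To put the temporary files on a disk with more free space, pass a DIR option to tempdir.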


solution1
0 2014-12-08 20:20:26

Perl : How to extract unique entries of a text file

Question

1 answers

solution1 0 2014-12-08 20:20:26

solution1
0 2014-12-08 20:20:26