
How to use grep or awk to process a specific column (with keywords from a text file)

I've tried many combinations of grep and awk commands to process text from a file.

I have a list of customer records of this form:

John,Mills,81,Crescent,New York,NY,john@mills.com,19/02/1954

I am trying to separate these records into two categories, male and female.

I have a list of about 5000 female names in plain text, all in one file.

How can I "grep" only the first column (since I am matching first names only) but still print the entire customer record?

I found it easy to "cut" the first column and run grep --file=female.names.txt, but that way it no longer prints the entire record.

I am aware of the awk option, but in that case I don't know how to read the female names from a file.

awk -F ',' ' { if($1==" ???Filename??? ") print $0} '

Many thanks!

You can do this with Awk:

awk -F, 'NR==FNR{a[$0]; next} ($1 in a)' female.names.txt file.csv 

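This prints the lines of your CSV file whose first field matches any name in female.names.txt. While awk reads the first file, NR (the overall record number) equals FNR (the record number within the current file), so each name is stored as a key of the array a and next skips to the following line; for the second file, ($1 in a) selects the records whose first comma-separated field is one of those keys.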

awk -F, 'NR==FNR{a[$0]; next} !($1 in a)' female.names.txt file.csv 

This prints the lines whose first field is not found in female.names.txt.

This assumes female.names.txt contains one name per line, for example:

Heather
Irene
Jane
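Since the goal is to split the records into two groups, the same NR==FNR idiom can also write both groups in a single pass. Below is a minimal sketch, not the answer's own code: the file names customers.csv, female.csv and other.csv are hypothetical, the second sample record is invented, and stripping a trailing carriage return assumes the names list might have been saved with Windows line endings (otherwise those names would silently fail to match).

```shell
#!/usr/bin/env bash
# Sketch: split customer records into two files in one awk pass.
# customers.csv, female.csv and other.csv are hypothetical file names.
set -eu

# Work in a throwaway directory, removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample input: one name per line, plus comma-separated customer records.
printf 'Heather\nIrene\nJane\n' > female.names.txt
printf '%s\n' \
  'John,Mills,81,Crescent,New York,NY,john@mills.com,19/02/1954' \
  'Jane,Doe,12,High St,Boston,MA,jane@doe.com,01/01/1970' > customers.csv

# First file (NR == FNR): strip a trailing \r (Windows line endings) and
# store each name as an array key. Second file: send each record to
# female.csv if its first field is a stored name, otherwise to other.csv.
awk -F, '
  NR == FNR { sub(/\r$/, ""); names[$0]; next }
  { print > (($1 in names) ? "female.csv" : "other.csv") }
' female.names.txt customers.csv

cat female.csv   # Jane,Doe,12,High St,Boston,MA,jane@doe.com,01/01/1970
cat other.csv    # John,Mills,81,Crescent,New York,NY,john@mills.com,19/02/1954
```

One caveat of the NR==FNR idiom: if female.names.txt is empty, NR==FNR is still true while reading the second file, so every record is swallowed by the first block and nothing is written.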

Another alternative is Perl, which can be useful if you're not super-familiar with awk.

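#!/usr/bin/perl -anF,
# -n wraps the script in a while (<>) loop, -a splits each line into @F,
# and -F, makes that split happen on commas.
use strict;
our %names;

# BEGIN runs once, before the loop: read the first command-line file (the
# names list) into a hash. This consumes @ARGV, so the loop then reads the
# records from standard input.
BEGIN {
    while (<ARGV>) {
        chomp;
        $names{$_} = 1;
    }
}

# Print the record if its first field is one of the stored names.
print if $names{$F[0]};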

To run it (assuming you named the file filter.pl):

perl filter.pl female.names.txt < records.txt

Try this:

grep --file=<(sed 's/.*/^&,/' female.names.txt) datafile.csv

This turns each name in the list of female names into the regular expression ^name, so it only matches at the beginning of a line and only when followed by a comma. Process substitution (a bash, ksh and zsh feature, not available in plain sh) then passes those generated patterns to grep as the pattern file to match against the data file.
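The grep approach can also produce the other group: -v inverts the match, so the same generated patterns select every record whose first field is not in the list. A minimal sketch with hypothetical output file names and invented sample records; the Janet row shows why the trailing comma matters, since ^Jane, does not match it. It assumes the names contain no regular-expression metacharacters such as . or *, which this sketch does not escape.

```shell
#!/usr/bin/env bash
# Sketch: split records with grep and anchored patterns built from the names.
# Requires bash (or another shell with process substitution) for <(...).
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'Heather\nIrene\nJane\n' > female.names.txt
printf '%s\n' \
  'John,Mills,81,Crescent,New York,NY,john@mills.com,19/02/1954' \
  'Jane,Doe,12,High St,Boston,MA,jane@doe.com,01/01/1970' \
  'Janet,Roe,5,Elm St,Austin,TX,janet@roe.com,02/03/1980' > datafile.csv

# Print each name as ^name, so it can only match a complete first field.
patterns() { sed 's/.*/^&,/' female.names.txt; }

# grep exits with status 1 when no line matches; "|| true" keeps set -e
# from aborting in that case (note that it would also hide real errors).
grep --file=<(patterns) datafile.csv > female.csv || true
grep -v --file=<(patterns) datafile.csv > other.csv || true

cat female.csv   # only the Jane record
cat other.csv    # the John and Janet records
```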

So, I've come up with the following:

Suppose you have a file named test.txt with the following lines:

abe 123 bdb 532

xyz 593 iau 591

Now suppose you want the lines whose first field starts and ends with a vowel. A plain grep for [aeiou][0-z]+[aeiou] would match both lines, because each contains such a field somewhere, but the following matches only the first line, which is the desired output:

grep -E "^([0-z]{1,} ){0}[aeiou][0-z]+[aeiou]( |$)" test.txt

Next, suppose you want the lines whose third field starts and ends with a vowel. Similarly, a plain grep would match both lines, but the following matches only the second line, which is the desired output:

grep -E "^([0-z]{1,} ){2}[aeiou][0-z]+[aeiou]( |$)" test.txt

In ([0-z]{1,} ), the {1,} means that the preceding bracket expression, any character from 0 to z in the ASCII table (a range that also includes uppercase letters and some punctuation such as _), must occur one or more times. It is followed by the field separator, a space in this case. The second pair of braces, {0} or {2}, repeats that field-plus-space group, so set it to the desired field number minus 1. After that comes the regular expression for your criteria. The trailing ( |$) requires the field to end right there, so a field such as abex is not matched just because it begins with abe. grep -E is the POSIX spelling of egrep, which recent GNU grep versions warn is obsolescent.
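Since awk already splits each line into fields, the same checks can be written against a numbered field directly, with the field number passed as a variable instead of encoded as a repeat count. A minimal sketch using the test.txt contents from above; /^[aeiou].*[aeiou]$/ is one assumed reading of "first and last letters are vowels", and it accepts two-letter fields such as ae, which the grep pattern's [0-z]+ (one or more middle characters) does not.

```shell
#!/usr/bin/env bash
# Sketch: check one whitespace-separated field with awk instead of a regex
# that has to skip over the preceding fields.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'abe 123 bdb 532\nxyz 593 iau 591\n' > test.txt

# With the default field separator, awk splits on runs of blanks, so $n is
# the n-th field. The regex is anchored at both ends, so it tests the whole
# field: first and last characters are vowels (at least two characters).
awk -v n=1 '$n ~ /^[aeiou].*[aeiou]$/' test.txt   # abe 123 bdb 532
awk -v n=3 '$n ~ /^[aeiou].*[aeiou]$/' test.txt   # xyz 593 iau 591
```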

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM