Delete every other row in CSV file using AWK or grep

I have a file like this:

1000_Tv178.tif,34.88552709  
1000_Tv178.tif,  
1000_Tv178.tif,34.66987165  
1000_Tv178.tif,  
1001_Tv180.tif,65.51335742  
1001_Tv180.tif,  
1002_Tv184.tif,33.83784863  
1002_Tv184.tif,  
1002_Tv184.tif,22.82542442  
1002_Tv184.tif,  

How can I make it look like this using a simple Bash command?

1000_Tv178.tif,34.88552709    
1000_Tv178.tif,34.66987165    
1001_Tv180.tif,65.51335742  
1002_Tv184.tif,33.83784863   
1002_Tv184.tif,22.82542442

In other words, I need to delete every other row, starting with the second.

Thanks!

hek2mgl's (deleted) answer was on the right track, given the output you actually desire.

awk -F, '$2'

This says: print every row whose second field is non-empty (a field that is numerically zero, such as 0, would also be skipped).

If the second field may contain nothing but whitespace, and you want to exclude those rows too, try this:

awk -F, '$2~/.*[^[:space:]].*/'
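To see how the two field tests differ, here is a small sketch. The sample file and its contents are made up for illustration (a shortened version of the question's data, with spaces after the comma on one row only), and the regex is written without the redundant leading and trailing `.*`.

```shell
# Hypothetical sample file: row 2 has only spaces after the comma, row 4 has nothing.
sample=$(mktemp)
printf '%s\n' \
  '1000_Tv178.tif,34.88552709' \
  '1000_Tv178.tif,  ' \
  '1001_Tv180.tif,65.51335742' \
  '1001_Tv180.tif,' > "$sample"

# Bare field test: drops row 4, but keeps the whitespace-only row 2.
plain=$(awk -F, '$2' "$sample")

# Regex test: keeps only rows whose second field holds a non-space character.
strict=$(awk -F, '$2 ~ /[^[:space:]]/' "$sample")

printf 'plain:\n%s\n\nstrict:\n%s\n' "$plain" "$strict"
rm -f "$sample"
```

Very old awk builds without POSIX character-class support may not understand `[[:space:]]`; that caveat is an assumption about your environment, not something stated in the thread.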

You could also do this with sed:

sed '/,$/d'

This says: delete every line that ends with a comma. There may well be a better way; I tend to avoid sed.

If you really want to explicitly delete every other row:

awk 'NR%2'

This says: print every row whose row number modulo 2 is not zero, i.e. the odd-numbered rows. If you really do want to delete every even row, it doesn't matter that the file is comma-delimited.
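As a quick illustration of the row-number test, the sketch below feeds six made-up lines (r1 to r6 stand in for the CSV rows) through both the odd and the even variant of the condition.

```shell
# Made-up input: six placeholder rows instead of the real CSV.
odd=$(printf '%s\n' r1 r2 r3 r4 r5 r6 | awk 'NR % 2')        # keeps rows 1, 3, 5
even=$(printf '%s\n' r1 r2 r3 r4 r5 r6 | awk 'NR % 2 == 0')  # keeps rows 2, 4, 6

printf 'odd rows:\n%s\n' "$odd"
printf 'even rows:\n%s\n' "$even"
```

Because the condition only looks at NR, the content of each row is never inspected, which is why the field separator is irrelevant here.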

awk provides a simple way:

awk 'NR % 2' file.txt

This might work for you (GNU sed):

sed '2~2d' file

or:

sed 'n;d' file
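The `2~2` address (first~step) is a GNU extension, whereas `n;d` uses only standard sed commands, so the second form is the one to reach for on non-GNU systems. A minimal sketch with made-up input, including an odd number of lines to show that the final row survives:

```shell
# Made-up input with an odd line count: rows a, b, c, d, e.
# n prints the current line and loads the next one; d then deletes that next line.
kept=$(printf '%s\n' a b c d e | sed 'n;d')

printf '%s\n' "$kept"
```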

Here's the GNU sed equivalent of the awk answers provided. You can safely use sed's -i flag by specifying a backup extension:

sed -n -i.bak '$!N;P' file.txt

Note that gawk 4.1.0 and later can do this too:

gawk -i inplace -v INPLACE_SUFFIX=".bak" 'NR%2==1' file.txt

Results:

1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442
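If neither in-place option is available (for example with a non-GNU awk or sed), a similar effect can be had by keeping a backup copy and writing the filtered output over the original. This is only a sketch: the temporary file stands in for file.txt, and its contents are made up.

```shell
# Hypothetical stand-in for file.txt, filled with made-up rows.
file=$(mktemp)
printf '%s\n' 'a.tif,1.5' 'a.tif,' 'b.tif,2.5' 'b.tif,' > "$file"

# Keep the original as a backup first, then filter from the backup into the file.
cp "$file" "$file.bak"
awk 'NR % 2' "$file.bak" > "$file"

cat "$file"
```

Reading from the backup rather than from the file being written avoids truncating the input before awk has read it.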

If the OP's input has no trailing space after the last number or after the comma, this awk command can be used:

awk '!/,$/'
1000_Tv178.tif,34.88552709
1000_Tv178.tif,34.66987165
1001_Tv180.tif,65.51335742
1002_Tv184.tif,33.83784863
1002_Tv184.tif,22.82542442

But it's not robust at all: any space after the comma breaks it. This handles trailing spaces:

awk '!/,[ ]*$/'
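If the trailing characters might also include tabs or carriage returns (for instance in a file saved with Windows line endings; that is an assumption, the question does not say so), the bracket expression can be widened. A sketch with made-up input:

```shell
# Made-up input: rows 1-2 end in a carriage return, row 3 in a space, row 4 in space + tab.
kept=$(printf 'a.tif,1.5\r\na.tif,\r\nb.tif,2.5 \nb.tif, \t\n' |
  awk '!/,[ \t\r]*$/')   # drop rows where only blanks, tabs or CRs follow the comma

# Count the surviving rows; the two rows with a value should remain.
printf '%s\n' "$kept" | wc -l
```

Note that the filter leaves the kept rows untouched, so any trailing carriage returns or spaces on them are still present in the output.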

Thanks for your help, guys, but I also had to use a workaround: I read the file into R and wrote it out again. Then I installed the GNU version of awk and used gawk '{if ((FNR % 2) != 0) {print $0}}'. So if anyone else has the same problem, try it!
