
How to remove lines containing any matching text in bash

I have a text file. It looks like this:

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 2500: $103.17 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 5000: $170.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: $42.25 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: $44.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 1000: $54.08 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 2500: $79.33 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 5000: $144.33 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 500: $159.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 1000: $176.17 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 2500: $297.58 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 5000: $522.72 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: $138.70 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 500: $164.50 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 1000: $181.13 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 2500: $302.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 5000: $515.63 :

So I have Business Cards and Door Hangers. Each one is an item, but to count them I need to remove all but the first occurrence of each.

So in the end, the file would look like this:

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :

I have to do this without specifying exact names, that is, I can't run sed specifically on occurrences of Business Card or Door Hanger. I just need to remove every line that is similar to an earlier one (for example, one with the same item name), not just exact duplicates.

Thanks

With awk you can do that:

awk -F":" '$1!=k{print $0}{k=$1}' file.txt

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :

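This tests whether the first field (the text before the first colon) is equal to the first field of the previous line. If it is, nothing is printed; if it is not, the line is printed. Either way, the field is saved (k=$1) for the comparison with the next line. Because only the previous line is checked, this assumes that lines with the same name are grouped together, as they are in your file.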
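To see what comparing with the previous line implies, here is a small sketch using a hypothetical temporary file in which a Business Card line reappears after a Door Hanger line:

```shell
# Hypothetical sample file: the two Business Card lines are NOT adjacent.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: $42.25 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: $138.70 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: $44.00 :
EOF

# Print a line only if its first ":"-separated field differs from the
# previous line's first field; k always holds that previous field.
result=$(awk -F":" '$1!=k{print $0}{k=$1}' "$tmp")
printf '%s\n' "$result"
```

All three lines are printed, because each one differs from the line right before it.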

This can be shortened to the following, which also drops repeated names that are not on adjacent lines:

awk -F: '!seen[$1]++' file.txt

(Thx to JID and glenn jackman)
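The short form works because seen[$1]++ evaluates to the count before incrementing it: the count is 0 (false) the first time a name appears, so !seen[$1]++ is true and, since the pattern has no action, awk prints the line. A spelled-out equivalent, sketched with a hypothetical sample file whose Business Card lines are not adjacent:

```shell
# Hypothetical sample file: the two Business Card lines are NOT adjacent.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: $42.25 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: $138.70 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: $44.00 :
EOF

# Long form of: awk -F: '!seen[$1]++' file
# An unset seen[$1] compares equal to 0, so the first line with a given
# first field is printed; the increment then makes every later line with
# that field (adjacent or not) fail the test.
result=$(awk -F: '{ if (seen[$1] == 0) print; seen[$1]++ }' "$tmp")
printf '%s\n' "$result"
```

Only the first Business Card line and the Door Hanger line are printed.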

Alternatively, if every line had the same number of fields, you could have done:

rev file.txt | uniq -f 17 | rev

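where you reverse each line of the file, tell uniq to skip the first 17 fields so that it compares only the last one (which, after reversing, is really the first one), and then reverse the output back. Note that uniq -f counts blank-separated fields, not colon-separated ones, and, like the first awk version, only compares adjacent lines. Here it's not very convenient, because your lines don't all have the same number of fields.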
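A minimal sketch of the rev trick on hypothetical toy input (not the question's data) where every line has exactly three blank-separated fields, so that uniq -f 2 leaves only the reversed first field to compare. It assumes rev is installed, since it is not part of POSIX:

```shell
# Hypothetical toy input: 3 blank-separated fields per line, name first.
input='Card 250 42.25
Card 500 44.00
Hanger 250 136.23
Hanger 500 159.53'

# rev reverses the characters of each line, so the name ends up last.
# uniq -f 2 ignores the first 2 fields and compares only the rest (the
# reversed name), collapsing adjacent lines with the same name to the
# first one. The final rev restores the original text.
result=$(printf '%s\n' "$input" | rev | uniq -f 2 | rev)
printf '%s\n' "$result"
```

This prints Card 250 42.25 and Hanger 250 136.23. With a varying number of fields, no single -f value works for every line.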

HTH

Based on your comment, a simple way to do this (it prints each distinct item name once) is:

awk -F ":" '{print $1}' filename | sort | uniq
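Since the goal is to count the items, the same pipeline can be extended. A sketch, using a hypothetical temporary sample file: sort -u replaces sort | uniq, wc -l counts the distinct names, and uniq -c shows how many lines each name has.

```shell
# Hypothetical sample file with two Business Card lines and one Door Hanger line.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: $42.25 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: $44.00 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: $138.70 :
EOF

# Number of distinct item names (the text before the first colon).
# tr removes the padding that some wc implementations print.
distinct=$(awk -F: '{print $1}' "$tmp" | sort -u | wc -l | tr -d ' ')
echo "$distinct"

# Number of lines per item name: uniq -c prefixes each name with its count.
awk -F: '{print $1}' "$tmp" | sort | uniq -c
```

For the full file in the question, the same commands report 2 distinct names, with 8 Business Card lines and 10 Door Hanger lines.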
