How to remove lines containing any matching text in bash

Question

I have a text file. It looks like this:

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 2500: $103.17 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 5000: $170.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: $42.25 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: $44.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 1000: $54.08 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 2500: $79.33 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 5000: $144.33 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 500: $159.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 1000: $176.17 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 2500: $297.58 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 5000: $522.72 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: $138.70 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 500: $164.50 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 1000: $181.13 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 2500: $302.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 5000: $515.63 :

So I have Business Cards, and I have Door Hangers. Each one is an item, but to count them I need to remove every other occurrence of them.

So in the end, the file would look like this:

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :

I have to do this without specifying exact names; that is, I can't run sed specifically on occurrences of Business Card or Door Hanger. I just need to remove all lines containing ANY similarities (here, the same item name at the start of the line), not just exact duplicates.

Thanks

Answer 1

With awk you can do that:

awk -F":" '$1!=k{print $0}{k=$1}' file.txt

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :

This tests whether the first field is equal to the first field of the previous line. If it differs, the line is printed; if it is equal, nothing is printed. Either way, the field is saved ( k=$1 ) for the next comparison.
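The same previous-line comparison can be written out in long form. This is only a minimal sketch: the file name sample.txt and its shortened rows are made up for illustration, and the variable prev plays the role of k above.

```shell
# Build a small sample file (hypothetical, shortened data for illustration).
cat > sample.txt <<'EOF'
Business Card: 2x3.5: 1000: $67.18 :
Business Card: 2x3.5: 2500: $103.17 :
Door Hanger: 3.5x8.5: 250: $136.23 :
Door Hanger: 3.5x8.5: 500: $159.53 :
EOF

# Print a line only when its first colon-separated field differs
# from the first field of the previous line.
first_of_each_run=$(awk -F: '
    $1 != prev { print }   # new item name: keep this line
    { prev = $1 }          # remember the name for the next line
' sample.txt)

printf '%s\n' "$first_of_each_run"
```

With this sample, only the first Business Card line and the first Door Hanger line are printed.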

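This can be shortened to the following, which keeps the first line for each distinct first field even when the matching lines are not adjacent: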

awk -F: '!seen[$1]++' file.txt

(Thx to JID and glenn jackman)
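The two awk commands behave the same on the file in the question, where identical names sit on consecutive lines, but they differ once a name reappears later. A small sketch of that difference, assuming a made-up file interleaved.txt with shortened rows:

```shell
# Hypothetical sample in which the same item name reappears later in the file.
cat > interleaved.txt <<'EOF'
Business Card: 2x3.5: 1000: $67.18 :
Door Hanger: 3.5x8.5: 250: $136.23 :
Business Card: 2x3.5: 2500: $103.17 :
EOF

# Compares only with the previous line, so the second Business Card line survives.
adjacent_only=$(awk -F: '$1 != k { print } { k = $1 }' interleaved.txt)

# seen[$1]++ evaluates to 0 (false) the first time a name appears and to a
# positive count afterwards, so !seen[$1]++ is true only for first occurrences.
first_per_name=$(awk -F: '!seen[$1]++' interleaved.txt)

printf 'previous-line comparison:\n%s\n' "$adjacent_only"
printf 'seen array:\n%s\n' "$first_per_name"
```

Here the previous-line version prints all three lines, while the seen version prints only the first two.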

Alternatively, if you had a fixed number of columns, you could have done:

rev file.txt | uniq -f 17 | rev

Here you reverse each line of the file, have uniq skip the first 17 fields so that it compares only the remaining ones (which are in fact the first fields of the original line), and reverse back. It's not very convenient here, though, as your lines don't all have the same number of columns.
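To see the rev/uniq idea work, here is a sketch on data that really does have a fixed number of columns. The file fixed.txt and its three whitespace-separated columns are invented for this example, and it assumes rev (from util-linux on most Linux systems) is installed.

```shell
# Hypothetical sample with exactly three whitespace-separated columns per line.
cat > fixed.txt <<'EOF'
card 1000 67.18
card 2500 103.17
hanger 250 136.23
hanger 500 159.53
EOF

# rev puts the name last on each line; uniq -f 2 ignores the first two
# (reversed) fields when comparing adjacent lines; the final rev restores
# the original character order.
by_first_column=$(rev fixed.txt | uniq -f 2 | rev)

printf '%s\n' "$by_first_column"
```

Because uniq only compares adjacent lines, this keeps the first line of each consecutive run, like the first awk command above.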

HTH

Answer 2

Based on your comment, a simple way to do this is:

awk -F ":" '{print $1}' filename | sort -u
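Since the question mentions counting the items, the list of names can be taken one step further. The following is a sketch under assumptions: items.txt is a made-up file with shortened rows, and the variable names are only illustrative.

```shell
# Hypothetical sample: two Business Card rows and one Door Hanger row.
cat > items.txt <<'EOF'
Business Card: 2x3.5: 1000: $67.18 :
Business Card: 2x3.5: 2500: $103.17 :
Door Hanger: 3.5x8.5: 250: $136.23 :
EOF

# Number of distinct item names (tr strips the padding some wc versions add).
distinct=$(awk -F: '{ print $1 }' items.txt | sort -u | wc -l | tr -d ' ')

# Number of lines per item name, sorted by name for stable output.
per_item=$(awk -F: '
    { count[$1]++ }
    END { for (name in count) print name ": " count[name] }
' items.txt | sort)

printf 'distinct items: %s\n' "$distinct"
printf '%s\n' "$per_item"
```

For this sample, distinct is 2 and the per-item list shows 2 lines for Business Card and 1 for Door Hanger.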

2 answers

solution1
1 ACCEPTED 2015-03-27 07:55:00

solution2
0 2015-03-27 07:53:45
