I have a text file. It looks like this:
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 2500: $103.17 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 5000: $170.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: $42.25 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: $44.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 1000: $54.08 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 2500: $79.33 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 5000: $144.33 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 500: $159.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 1000: $176.17 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 2500: $297.58 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 5000: $522.72 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: $138.70 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 500: $164.50 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 1000: $181.13 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 2500: $302.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 5000: $515.63 :
So I have Business Cards, and I have Door Hangers. Each one is an item, but to count them I need to remove every occurrence after the first. So in the end, the file would look like this:
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :
I have to do this without specifying exact names; that is, I can't run sed specifically on occurrences of Business Card or Door Hanger. I just need to remove all lines that repeat an earlier item name, not just exact duplicate lines.
Thanks
With awk you can do that:
awk -F":" '$1!=k{print $0}{k=$1}' file.txt
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: $67.18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: $136.23 :
Here awk tests whether the first field equals the first field of the previous line. If it differs, the line is printed; in either case the key is saved (k=$1) for the next comparison.
This can be shortened to:
awk -F: '!seen[$1]++' file.txt
(Thx to JID and glenn jackman)
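Note that unlike the first version, which only compares consecutive lines, `!seen[$1]++` de-duplicates globally: a key is printed only the first time it appears, even when its repeats are not adjacent. A quick sketch on made-up input:

```shell
# Hypothetical sample: the "Card" key recurs non-adjacently
printf 'Card: 1\nCard: 2\nHanger: 3\nCard: 4\n' | awk -F: '!seen[$1]++'
# Prints only "Card: 1" and "Hanger: 3"
```

The expression works because `seen[$1]++` returns the old count (0 the first time, so `!0` is true and the default action prints the line) and then increments it.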
Alternatively, if you had a fixed number of columns you could have done:
rev file.txt | uniq -f 17 | rev
where you reverse each line of the file, skip the first 17 fields so uniq compares only the last field (in fact, the original first one), then reverse back. But here it's not very convenient, as your lines don't all have the same number of columns.
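To see the rev/uniq trick in isolation, here is a minimal sketch on a hypothetical whitespace-separated file with exactly three columns (uniq -f skips blank-separated fields, which is why it only works cleanly with a fixed column count):

```shell
# Hypothetical 3-column file; the first column is the item key
printf 'Card x 1\nCard y 1\nHanger z 2\n' > /tmp/items.txt
# Reverse each line, skip the first 2 fields so uniq compares only the
# remaining (originally first) field, then reverse back
rev /tmp/items.txt | uniq -f 2 | rev
# Keeps "Card x 1" and "Hanger z 2"
```

rev reverses characters, so the keys themselves come out mangled during the comparison, but two reversed keys are equal exactly when the originals are, and the second rev restores the text.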
HTH
Based on your comment, the simple way to do this is:
awk -F ":" '{print $1}' filename | sort | uniq
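Since the original goal was counting the items, the same pipeline can be extended with uniq -c to tally each name (a sketch on made-up input; uniq left-pads the counts):

```shell
# Count how many lines each item name has
printf 'Card: 1\nCard: 2\nHanger: 3\n' | awk -F ":" '{print $1}' | sort | uniq -c
# Shows a count of 2 for Card and 1 for Hanger
```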