
Specifying a text qualifier and delimiter in Linux

How do I specify a text qualifier with awk or another Linux program?

My data looks like this:


It's actually tab-delimited, but some fields contain a tab inside them. The fields are qualified by double quotes.

How do I specify that fields are delimited by tabs but also enclosed in double quotes, so that a tab inside the quotes is not treated as a separator?
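To make the problem concrete, here is a minimal sketch using a hypothetical one-line input (not taken from the data below) and a hypothetical helper name, `naive_field_count`. It splits on every tab, ignoring the quotes, so a quoted field with an embedded tab is counted as two fields.

```shell
#!/bin/sh
# naive_field_count: hypothetical helper that prints the number of fields per
# line when splitting on every tab, ignoring double quotes entirely.
naive_field_count() {
  awk -F'\t' '{ print NF }'
}

# Hypothetical line with two logical fields: "a<TAB>b" and "c".
# The tab inside the quotes is treated as a separator, so this prints 3, not 2.
printf '"a\tb"\t"c"\n' | naive_field_count
```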

Here's my script so far:

 awk 'BEGIN{FS=OFS="\t"}{print $1,$7,$8,$10,$11,$21}' cyme.txt | grep -i pilates

Also, for practical purposes, here is an exact text copy of a data sample:

"723721093013"  "AFL"   "1" ""  "15"    "ALT ROCK...."  "Hai!........................"  "Creatures, The.............."  2   "N" 4   7.48    2004.02.17  0.0000  .  .    .  .    2
"723721093112"  "AFL"   "1" ""  "5" "ELECTRONIC.."  "Crash And Burn.............."  "Foxx, John/Gordon, Louis...."  1   "W" 4   11.98   2004.02.17  0.0000  .  .    .  .    73
"819162013137"  "AHY"   "1" ""  "101"   "PUNK........"  "Truth, Love and Liberty....."  "FM359......................."  2   "H" 1   4.48    2014.01.14  0.0000  .  .    .  .    39
"879198005148"  "AHY"   "1" ""  "14"    "PUNK........"  "Re-Volts S/T................"  "Re-Volts, The..............."  1   "J" 4   5.48    2007.12.11  0.0000  .  .    .  .    10
"879198004288"  "AHY"   "1" ""  "24"    "PUNK........"  "Read Between The Lines......"  "Smalltown..................."  1   "N" 4   7.48    2009.12.01  0.0000  .  .    .  .    17

Please let me know if anything needs clarification.

I'm starting to realize that, surprisingly, awk might not be the right tool for this job. If that is the case, I'm happy to learn which other commands should be used to process a text file with field qualifiers.
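Plain awk can still handle this if the quotes are parsed explicitly. Below is a minimal sketch, assuming POSIX awk, fields separated by single tabs, and no escaped (doubled) quotes inside a quoted field. `qtsv_cut` is a hypothetical helper name. It walks each line character by character, treats a tab as a separator only outside double quotes, drops the quotes, and prints the requested columns tab-separated. Tabs inside quoted fields are turned into spaces so the output can safely be split on tabs again.

```shell
#!/bin/sh
# qtsv_cut COLS: hypothetical helper. Reads tab-delimited lines whose fields may
# be wrapped in double quotes (quoted fields may contain tabs) and prints the
# columns listed in COLS (comma-separated, 1-based) as plain tab-separated text.
# Assumption: quoted fields never contain escaped ("") quotes.
qtsv_cut() {
  awk -v cols="$1" '
  BEGIN { ncols = split(cols, want, ",") }
  {
    n = 0; field = ""; inq = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        inq = !inq                  # opening or closing quote: toggle, do not keep it
      } else if (c == "\t" && !inq) {
        f[++n] = field; field = ""  # tab outside quotes ends the current field
      } else {
        if (c == "\t") c = " "      # tab inside quotes becomes a space
        field = field c
      }
    }
    f[++n] = field                  # last field on the line
    out = ""
    for (j = 1; j <= ncols; j++)
      out = out (j > 1 ? "\t" : "") f[want[j]]
    print out
    split("", f)                    # clear fields before the next line
  }'
}

# Usage, mirroring the columns from the question (cyme.txt is the asker's file):
# qtsv_cut 1,7,8,10,11,21 < cyme.txt | grep -i pilates
```

Columns that do not exist on a line come out as empty strings rather than causing an error.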

If gawk is available, you can use a regex as the field separator:

> gawk '{for (i=1;i<=NF;i++){if ($i){printf("FN: %d Content: %s",i,$i)}}print "\n"}' FS='([\t]*?\"| +)' infile
FN: 2 Content: 723721093013FN: 5 Content: AFLFN: 8 Content: 1FN: 14 Content: 15FN: 17 Content: ALTFN: 18 Content: ROCK....FN: 21 Content: Hai!........................FN: 24 Content: Creatures,FN: 25 Content: The..............FN: 27 Content: 2FN: 29 Content: NFN: 31 Content: 4FN: 32 Content: 7.48FN: 33 Content: 2004.02.17FN: 34 Content: 0.0000FN: 35 Content: .FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: 2

FN: 2 Content: 723721093112FN: 5 Content: AFLFN: 8 Content: 1FN: 14 Content: 5FN: 17 Content: ELECTRONIC..FN: 20 Content: CrashFN: 21 Content: AndFN: 22 Content: Burn..............FN: 25 Content: Foxx,FN: 26 Content: John/Gordon,FN: 27 Content: Louis....FN: 29 Content: 1FN: 31 Content: WFN: 33 Content: 4FN: 34 Content: 11.98FN: 35 Content: 2004.02.17FN: 36 Content: 0.0000FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: .FN: 41 Content: 73

FN: 2 Content: 819162013137FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 101FN: 17 Content: PUNK........FN: 20 Content: Truth,FN: 21 Content: LoveFN: 22 Content: andFN: 23 Content: Liberty.....FN: 26 Content: FM359.......................FN: 28 Content: 2FN: 30 Content: HFN: 32 Content: 1FN: 33 Content: 4.48FN: 34 Content: 2014.01.14FN: 35 Content: 0.0000FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: 39

FN: 2 Content: 879198005148FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 14FN: 17 Content: PUNK........FN: 20 Content: Re-VoltsFN: 21 Content: S/T................FN: 24 Content: Re-Volts,FN: 25 Content: The...............FN: 27 Content: 1FN: 29 Content: JFN: 31 Content: 4FN: 32 Content: 5.48FN: 33 Content: 2007.12.11FN: 34 Content: 0.0000FN: 35 Content: .FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: 10

FN: 2 Content: 879198004288FN: 5 Content: AHYFN: 8 Content: 1FN: 14 Content: 24FN: 17 Content: PUNK........FN: 20 Content: ReadFN: 21 Content: BetweenFN: 22 Content: TheFN: 23 Content: Lines......FN: 26 Content: Smalltown...................FN: 28 Content: 1FN: 30 Content: NFN: 32 Content: 4FN: 33 Content: 7.48FN: 34 Content: 2009.12.01FN: 35 Content: 0.0000FN: 36 Content: .FN: 37 Content: .FN: 38 Content: .FN: 39 Content: .FN: 40 Content: 17
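In the output above, quoted fields that contain spaces are still split into several fields: for example, "Creatures, The.............." comes out as field 24 (`Creatures,`) and field 25 (`The..............`). A different gawk feature, `FPAT`, describes what a field looks like instead of what separates fields. The sketch below assumes a recent gawk (`FPAT` is a gawk extension, not POSIX awk) and quoted fields without escaped quotes. A field is either a double-quoted string, which may contain tabs or spaces, or a run of non-tab characters. `qfields` is a hypothetical helper name; it prints each field number and its contents with the surrounding quotes removed, in a listing similar to the one above.

```shell
#!/bin/sh
# qfields: hypothetical helper. Requires gawk, because FPAT is a gawk extension.
# A field is either a run of non-tab characters or a double-quoted string
# (which may contain tabs or spaces). Quotes are stripped only for display.
qfields() {
  gawk 'BEGIN { FPAT = "([^\t]*)|(\"[^\"]*\")" }
  {
    for (i = 1; i <= NF; i++) {
      v = $i
      if (v ~ /^".*"$/) v = substr(v, 2, length(v) - 2)  # drop surrounding quotes
      printf "FN: %d Content: %s\n", i, v
    }
    print ""                                             # blank line between records
  }'
}

# Usage (cyme.txt is the asker's file):
# qfields < cyme.txt
```

Because each field is defined by its content, the field numbers stay aligned with the columns of the file, so the asker's `$1,$7,$8,...` selection could be applied directly inside the same gawk program.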


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM