
awk grouping and splitting on multiple delimiters

I'm trying to split a line on multiple delimiters and group the output into individual elements that I can reorder. I'm on a BSD system running the pkg_info command. The output looks like this.

yaesu-0.13nb1       Control interface for Yaesu FT-890 HF transceiver   
skk-jisyo-cdb-201212 Dictionary collection for SKK  
dbskkd-cdb-2.00nb1  SKK dictionary server based on cdb
libchewing-0.2.7    The intelligent phonetic input method library  
skk-jisyo-201212    Dictionary collection for SKK  
autoconf-2.69nb2    Generates automatic source code configuration scripts  
pkg-config-0.28     System for managing library compile/link flags 
python27-2.7.5      Interpreted, interactive, object-oriented programming language

The package name always contains letters and numbers, the version is the last hyphen-separated entry attached to the name, and the description is always separated from them by at least one whitespace character. In the most complex example, "skk-jisyo-cdb" is the package name, "201212" is the version, and "Dictionary collection for SKK" is the description.

I need to separate the version from the package name, keeping the package name intact (including the "-" characters inside it) and making the version an element of its own. Lastly, the description must remain intact as a third element.

I think either awk or sed is capable of doing this, but I have not yet been able to group the elements correctly. Any help is much appreciated!

Here are a few of the things I have tried so far:

pkg_info -a | awk -F'[[:space:]]*' '{print $1}' | awk -F- '{$NF=" "$NF;sub(/ /,"-")}1'

output:

yaesu- 0.13nb1
skk-jisyo cdb  201212
dbskkd-cdb  2.00nb1
libchewing- 0.2.7
skk-jisyo  201212
autoconf- 2.69nb2
pkg-config  0.28
python27- 2.7.5

And

pkg_info -a | awk 'BEGIN{FS="-| ";OFS="\t"}{print $1$2}'

output:

yaesu0.13nb1
skkjisyo
dbskkdcdb
libchewing0.2.7
skkjisyo
autoconf2.69nb2
pkgconfig
python272.7.5

I have been able to separate out the package name and the version using two commands, but this is not what I need; these are just for reference. This gets me the version by itself:

pkg_info -a | awk -F'[[:space:]]*' '{print $1}' | awk -F- '{print $NF }'

This gets me the package name by itself:

pkg_info -a | awk -F'[[:space:]]*' '{print $1}' | sed 's/\(.*\)\(-.*\)/\1/g'

What I need my final output to be is $pkgname\t$version\t$description\n, that is, the three elements separated by tabs. For the most complex example the output would be: skk-jisyo-cdb\t201212\tDictionary collection for SKK\n

You didn't provide enough detail to be sure but this MAY be what you want:

$ sed -r 's/([^[:blank:]]+)-([^[:blank:]]+)[[:blank:]]+/\1\t\2\t/' file
yaesu   0.13nb1 Control interface for Yaesu FT-890 HF transceiver   
skk-jisyo-cdb   201212  Dictionary collection for SKK  
dbskkd-cdb      2.00nb1 SKK dictionary server based on cdb
libchewing      0.2.7   The intelligent phonetic input method library  
skk-jisyo       201212  Dictionary collection for SKK  
autoconf        2.69nb2 Generates automatic source code configuration scripts  
pkg-config      0.28    System for managing library compile/link flags 
python27        2.7.5   Interpreted, interactive, object-oriented programming language
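
One portability caveat, since the question mentions a BSD system: as far as I know, -r and \t in the replacement text are GNU sed conventions and may not behave the same way in a BSD sed. A minimal sketch of a more portable form, assuming a sed that accepts -E for extended regular expressions (split_pkg_sed and tab are names made up for this example, not part of any tool):

```shell
# split_pkg_sed: hypothetical helper that reads pkg_info-style lines on stdin
# and writes name<TAB>version<TAB>description on stdout.
split_pkg_sed() {
    # Build a literal tab once; \t in a sed replacement is not portable.
    tab=$(printf '\t')
    # Group 1 is greedy, so it keeps every "-" except the last one,
    # which is the one that precedes the version.
    sed -E "s/^([^[:blank:]]+)-([^[:blank:]]+)[[:blank:]]+/\1${tab}\2${tab}/"
}
```

It would be used as pkg_info -a | split_pkg_sed.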

Or with awk:

$ awk -v OFS='\t' '{ pkg=ver=$1; sub(/-[^-]+$/,"",pkg); sub(/.*-/,"",ver); sub(/[^[:space:]]+[[:space:]]+/,""); print pkg, ver, $0}' file  
yaesu   0.13nb1 Control interface for Yaesu FT-890 HF transceiver   
skk-jisyo-cdb   201212  Dictionary collection for SKK  
dbskkd-cdb      2.00nb1 SKK dictionary server based on cdb
libchewing      0.2.7   The intelligent phonetic input method library  
skk-jisyo       201212  Dictionary collection for SKK  
autoconf        2.69nb2 Generates automatic source code configuration scripts  
pkg-config      0.28    System for managing library compile/link flags 
python27        2.7.5   Interpreted, interactive, object-oriented programming language

Change the separator from tab to whatever it is you want.
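
To illustrate changing the separator, here is a minimal sketch of the same awk approach wrapped in a shell function that takes the separator as an optional argument. The name split_pkg is hypothetical, and the trailing-blank trim on the description is an addition that the command above does not perform:

```shell
# split_pkg [SEP]: hypothetical helper. Reads pkg_info-style lines on stdin
# and prints name, version and description joined by SEP (default: a tab).
split_pkg() {
    sep=${1:-'\t'}               # awk -v turns the two characters \t into a tab
    awk -v OFS="$sep" '
    NF {
        pkg = ver = $1
        sub(/-[^-]+$/, "", pkg)          # drop the trailing -version
        sub(/.*-/, "", ver)              # keep only what follows the last "-"
        desc = $0
        sub(/^[^ \t]+[ \t]+/, "", desc)  # drop the first field and its padding
        sub(/[ \t]+$/, "", desc)         # trim trailing blanks
        print pkg, ver, desc
    }'
}
```

For example, printf '%s\n' 'skk-jisyo-cdb-201212 Dictionary collection for SKK' | split_pkg '|' should print skk-jisyo-cdb|201212|Dictionary collection for SKK, while pkg_info -a | split_pkg gives the tab-separated form.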

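You can use the default field separator and the split function on field 1. Then simply append the field separator and the last item from split to the first field: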

awk '{n=split($1, a, "-"); $1=$1 FS a[n]}1'
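
Note that, as written, this one-liner appends the version to the first field but leaves the original "-version" suffix attached to the name. A minimal sketch of a split-based variant that rebuilds the name from every piece except the last and prints the tab-separated form requested in the question (split_pkg_fields is a hypothetical name, and the description is rejoined with single spaces, which is an assumption about what is acceptable):

```shell
# split_pkg_fields: hypothetical helper. Reads pkg_info-style lines on stdin
# and prints name<TAB>version<TAB>description.
split_pkg_fields() {
    awk '
    NF {
        n = split($1, a, "-")
        name = a[1]
        for (i = 2; i < n; i++)          # rejoin all pieces but the last
            name = name "-" a[i]
        desc = $2
        for (i = 3; i <= NF; i++)        # rebuild the description from the fields
            desc = desc " " $i
        printf "%s\t%s\t%s\n", name, a[n], desc
    }'
}
```

It would be used as pkg_info -a | split_pkg_fields.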


 