Parse string using grep, sed or awk
I have a string that looks like this:
807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482
I need output like this:
S:S6S11,07001,23668732,1,1496851208,807262,7482
I need the string split into columns as follows:
S:S6 + the next 3 characters;
in this case, S:S6S11
This works:
echo 807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482 |
grep -P -o 'S:S6.{1,3}'
Output:
S:S6S11
This gets me close, extracting only the numbers:
echo 807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482 |
grep -o '[0-9]\+' | tr '\n' ','
Output:
807001,6,11,23668732,1,1496851208,807262,7482,
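How can I get S:S6S11 at the beginning of the output and avoid the 6,11 that follows it?
If this can be done better with sed or awk, I don't mind.
For the rest of the string:
I only need the numbers, but they must correspond to the letters.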
awk to the rescue!
$ echo "807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482" |
awk '{pre=gensub(".*(S:S6...).*","\\1","g"); ## extract prefix (gensub needs GNU awk)
sub(/S:S6.../,""); ## remove prefix so its digits are not split out
sub(/./,","); ## replace first char with comma
gsub(/[^0-9]+/,","); ## replace non-numeric values with comma
print pre $0}' ## print prefix and replaced line
S:S6S11,07001,23668732,1,1496851208,807262,7482
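Since gensub is a GNU awk extension, here is a minimal sketch of the same idea using only POSIX awk's match(), RSTART and RLENGTH. It also cuts the S:S6... prefix out of the line before the digit runs are collected, so the 6 and 11 inside the prefix do not show up as separate columns. The function name parse_line is made up for illustration.

```shell
# parse_line is a hypothetical helper name; only POSIX awk features are used.
parse_line() {
  printf '%s\n' "$1" | awk '{
    if (!match($0, /S:S6.../)) next                 # skip lines without the prefix
    pre  = substr($0, RSTART, RLENGTH)              # the matched prefix, e.g. S:S6S11
    rest = substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)  # line minus prefix
    sub(/./, ",", rest)                             # replace first char with a comma
    gsub(/[^0-9]+/, ",", rest)                      # each non-digit run becomes a comma
    print pre rest
  }'
}

parse_line '807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482'
# prints: S:S6S11,07001,23668732,1,1496851208,807262,7482
```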
...or sed:
$ echo "807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482" | sed -re 's/^.([0-9]+)(S:S6...)ABB([0-9]+)CC([0-9]+)DD([0-9]+)\.([0-9]+)EE([0-9]*)$/\2,\1,\3,\4,\5,\6,\7/'
S:S6S11,07001,23668732,1,1496851208,807262,7482
That is, if your line format is fixed.
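One caveat with the fixed-format approach: when a line does not match, sed prints it unchanged. A small sketch that prints only the lines that were actually rewritten, using -n together with the p flag of the s command (parse_fixed is a made-up name, and -E is used in place of -r):

```shell
# parse_fixed is a hypothetical helper name.
# -n suppresses automatic printing; the trailing p prints only when the substitution succeeded.
parse_fixed() {
  printf '%s\n' "$1" |
    sed -n -E 's/^.([0-9]+)(S:S6...)ABB([0-9]+)CC([0-9]+)DD([0-9]+)\.([0-9]+)EE([0-9]*)$/\2,\1,\3,\4,\5,\6,\7/p'
}

parse_fixed '807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482'
# prints: S:S6S11,07001,23668732,1,1496851208,807262,7482
parse_fixed 'some other line'   # prints nothing, because the pattern does not match
```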
If you use GNU awk, you can simplify the task by defining RS as the desired pattern, e.g.:
parse.awk
BEGIN { RS = "S:S6...|\n" }
# Start of the string
RT != "\n" {
sub(".", ",") # Replace first char by a comma
pst = $0 # Remember the rest of the string
pre = RT # Remember the S:S6 pattern
}
# End of string
RT == "\n" {
gsub("[A-Z.]+", ",") # Replace letters and dots by commas
print pre pst $0 # Print the final result
}
Run it like this:
s=807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482
gawk -f parse.awk <<<$s
Output:
S:S6S11,07001,23668732,1,1496851208,807262,7482
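To see why this works, it can help to print what gawk considers a record and what ends up in RT (the text that matched RS). A quick sketch, assuming gawk is installed; show_records is a made-up name:

```shell
# show_records is a hypothetical helper name; RT is a GNU awk extension.
show_records() {
  printf '%s\n' "$1" |
    gawk 'BEGIN { RS = "S:S6...|\n" }
          # Print each record with its terminator; spell out the newline for readability
          { printf "record=[%s] RT=[%s]\n", $0, (RT == "\n" ? "newline" : RT) }'
}

show_records '807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482'
# prints:
# record=[807001] RT=[S:S6S11]
# record=[ABB23668732CC1DD1496851208.807262EE7482] RT=[newline]
```

So the input line is split into two records: the part before the S:S6... pattern and the part after it, which is why parse.awk has one rule for each.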
Here is one way to do it with sed:
parse.sed
h # Duplicate string to hold space
s/.*(S:S6...).*/\1/ # Extract the desired pattern
x # Swap hold and pattern space
s/S:S6...// # Remove pattern (still in hold space)
s/[A-Z.]+/,/g # Replace letters and dots with commas
s/./,/ # Replace first char with comma
G # Append hold space content
s/([^\n]+)\n(.*)/\2\1/ # Rearrange to match desired output
Run it like this:
s=807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482
sed -Ef parse.sed <<<$s
Output:
S:S6S11,07001,23668732,1,1496851208,807262,7482
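The last step of the script relies on [^\n] inside a bracket expression, which GNU sed treats as "not a newline" but other seds may not. A possible variation, sketched here as an assumption rather than a tested portable replacement, appends the processed line to the hold space with H and then deletes the joining newline, so no rearranging is needed. parse_hold is a made-up name:

```shell
# parse_hold is a hypothetical helper name.
parse_hold() {
  printf '%s\n' "$1" | sed -E '
    # Copy the line to hold space, then reduce pattern space to the prefix
    h
    s/.*(S:S6...).*/\1/
    # Swap: pattern space is the full line again, hold space is the prefix
    x
    s/S:S6...//
    s/[A-Z.]+/,/g
    s/./,/
    # Append the processed line to the prefix in hold space, then swap back
    H
    x
    # Remove the newline that H put between prefix and numbers
    s/\n//
  '
}

parse_hold '807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482'
# prints: S:S6S11,07001,23668732,1,1496851208,807262,7482
```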
It sounds like this might be what you really want to do:
$ awk -F'[A-Z]{2,}|[.]' -v OFS=',' '{$1=substr($1,7) OFS substr($1,2,5)}1' file
S:S6S11,07001,23668732,1,1496851208,807262,7482
However, your requirements for how and where to match are very unclear, and a single sample input line doesn't help.
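To make the field splitting in that awk command easier to follow, here is a small sketch that just lists the fields. It writes the separator as [A-Z][A-Z]+ instead of [A-Z]{2,}, which matches the same text and also works in awks without interval expressions; show_fields is a made-up name.

```shell
# show_fields is a hypothetical helper name.
# Fields are separated by a run of two or more uppercase letters, or by a dot.
show_fields() {
  printf '%s\n' "$1" |
    awk -F'[A-Z][A-Z]+|[.]' '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

show_fields '807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482'
# prints:
# $1=807001S:S6S11
# $2=23668732
# $3=1
# $4=1496851208
# $5=807262
# $6=7482
```

The single S characters in S:S6S11 are not separators because each stands alone, so the whole leading chunk stays in $1. The command above then rebuilds $1 from substr($1,7) (the prefix) and substr($1,2,5) (the five digits after the first character), which assumes the leading number is always six digits long.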