Regex with substitutions using sed|awk and groups
I have this input text:
16789248,16789759,"AS24155 Asia Pacific Broadband Wireless Communications Inc"
I want this output:
"AS24155","Asia Pacific Broadband Wireless Communications Inc"
This regex matches:
/(.*)(AS\d+)(\s)([^"]+).*/g
with this substitution:
"$2","$4"
I have to process 300k lines, and it would be best if I could use a Linux command-line utility like sed or awk, but I keep getting no matches, even though the regex seems to match elsewhere.
Should I be using something different?
sed -r can handle it with a few modifications: [0-9] instead of \d, and a literal space instead of \s. There's no real reason to capture the first and third parts, so I've removed those groups.
sed -r -e 's/.*(AS[0-9]+) ([^"]+).*/"\1","\2"/'
Or, if you want to match those character classes exactly, use [[:digit:]] for \d and [[:space:]] for \s:
sed -r -e 's/.*(AS[[:digit:]]+)[[:space:]]([^"]+).*/"\1","\2"/'
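To try the command on the sample line, and to see what happens to lines that do not match, something like the following sketch may help. It is only an illustration: the function name convert_matching and the second input line are invented here, -E is used as the more portable spelling of GNU sed's -r, and -n together with the p flag is added so that only lines where the substitution succeeded are printed (without them, sed passes non-matching lines through unchanged, which can look like "no match").

```shell
#!/bin/sh
# Hypothetical helper name; wraps the sed command from the answer above.
# -E : extended regular expressions (same as -r in GNU sed)
# -n : do not print lines automatically
# p  : print the line only if the substitution was made
convert_matching() {
    sed -E -n 's/.*(AS[[:digit:]]+)[[:space:]]([^"]+).*/"\1","\2"/p'
}

# First line is the sample from the question; second line is made up
# and contains no AS number, so it should be dropped.
printf '%s\n' \
    '16789248,16789759,"AS24155 Asia Pacific Broadband Wireless Communications Inc"' \
    'a line without an AS number' |
    convert_matching
# prints: "AS24155","Asia Pacific Broadband Wireless Communications Inc"
```

For the real file, the same function could be fed with input redirection, for example convert_matching < input.txt > output.csv, where both file names are placeholders.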
Alternatively, you could use csvtool, which is better suited to parsing CSV files than sed is.
csvtool col 3 input.txt | while read -r number name; do
    printf '"%s","%s"\n' "$number" "$name"
done
sed 's/[^"]*"/"/;s/[[:space:]]/","/'
Based on your sample, this avoids the need for groups altogether.
sed is the best choice for this, but FYI here's how you could use almost your exact RE in GNU awk to do the job:
$ awk 'match($0,/.*(AS[0-9]+)\s([^"]+).*/,a){printf "\"%s\",\"%s\"\n", a[1], a[2]}' file
"AS24155","Asia Pacific Broadband Wireless Communications Inc"
Your original command was probably failing because only some tools accept \s in place of [[:space:]], and almost none of the POSIX-style tools (sed, awk, grep without -P) accept \d in place of [[:digit:]] (or [0-9]).
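If GNU awk is not available, the awk command above will not work as written, since both the three-argument match() and \s are gawk extensions. A rough alternative that should run in any POSIX awk is to split each line on the double-quote character instead of using a regex. This is only a sketch: the function name as_fields is invented here, and it assumes the AS field is the only quoted field on the line and always has a space after the AS number.

```shell
#!/bin/sh
# Hypothetical helper name; not part of the original answer.
# Splitting on the double-quote character makes $2 the quoted field.
as_fields() {
    awk -F'"' '{
        i = index($2, " ")    # position of the first space in the quoted field
        printf "\"%s\",\"%s\"\n", substr($2, 1, i - 1), substr($2, i + 1)
    }'
}

printf '%s\n' \
    '16789248,16789759,"AS24155 Asia Pacific Broadband Wireless Communications Inc"' |
    as_fields
# prints: "AS24155","Asia Pacific Broadband Wireless Communications Inc"
```

Because no regex shorthand is involved, this version sidesteps the \s and \d portability problem entirely, at the cost of relying on the layout of the sample line.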