How to extract multiple patterns between tokens at once with sed?
Let's assume that I have a file named inputFile that looks like this:
blahblah token substring token something else token substring2 token
The whole file contains only one long line.
I want to extract the substrings between the tokens with sed (substring, substring2).
At the moment I have:
sed "s/^.* \?token\(.* \)token.* \?/\1/" inputFile > outputFile
I tried to do this based on the following questions, but unfortunately it returns only the last substring:
Extract lines between 2 tokens in a text file using bash
How to replace multiple patterns at once with sed?
How to select lines between two patterns?
Answers with an explanation would be great.
UPDATE: the real input:
<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>
Expected output:
apr gtr 52333
apr gtr 4332
The problem is that sed's pattern matching is greedy, so the above command returns only substring2, even if you add the global flag (g).
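The greedy behaviour can be checked directly on the sample line. This is a small sketch that assumes GNU sed (the `\?` operator is a GNU extension); the variable names `line` and `greedy_result` are only for illustration:

```shell
# Sample line from the question.
line='blahblah token substring token something else token substring2 token'

# The leading "^.*" grabs as much of the line as it can, so only the last
# pair of tokens is left for the capture group. The g flag changes nothing,
# because the first match already consumes the whole line.
greedy_result=$(printf '%s\n' "$line" | sed 's/^.* \?token\(.* \)token.* \?/\1/g')

# Brackets make the surrounding spaces visible.
printf '[%s]\n' "$greedy_result"    # prints "[ substring2 ]"
```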
You could use awk for this, redefining the field separator FS to be the string token. This way your strings are at the even field positions:
$ echo "blahblah token substring token something else token substring2 token" | \
awk -F 'token' '{for(i=2;i<=NF;i+=2) {print $i}}'
substring
substring2
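To see why the wanted strings land on the even positions, it can help to print every field with its index. The sketch below also wraps the loop in a helper, `extract_between` (a hypothetical name, not part of awk), which trims the surrounding spaces and uses `i < NF` so that a string without a closing token is skipped. Both choices are assumptions about what you want, not part of the command above.

```shell
line='blahblah token substring token something else token substring2 token'

# Show how awk splits the line when FS is the string "token".
printf '%s\n' "$line" |
    awk -F 'token' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
# 1:[blahblah ]
# 2:[ substring ]
# 3:[ something else ]
# 4:[ substring2 ]
# 5:[]

# Hypothetical helper: print the trimmed text between each pair of delimiters.
# $1 is the delimiter; awk treats a multi-character FS as a regular expression.
extract_between() {
    awk -F "$1" '{
        for (i = 2; i < NF; i += 2) {   # i < NF: require a closing delimiter
            s = $i
            gsub(/^ +| +$/, "", s)      # strip leading and trailing spaces
            print s
        }
    }'
}

printf '%s\n' "$line" | extract_between token
```

Because the delimiter is used as a regular expression, a token containing characters such as `.` or `*` would need escaping before being passed to the helper.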
Update:
If your input is an XML file, you might want to use an XML tool such as xmlstarlet instead. Formatted for readability, your input is:
<archive>
<message id="0">
<receiver>apr</receiver>
<sender>gtr</sender>
<text>52333</text>
<sendTime>554</sendTime>
<deliveryTime>765</deliveryTime>
</message>
<message id="0">
<receiver>apr</receiver>
<sender>gtr</sender>
<text>4332</text>
<sendTime>764</sendTime>
<deliveryTime>922</deliveryTime>
</message>
</archive>
leading to the command:
xmlstarlet sel -t -m '//message' -v receiver -o " " -v sender -o " " -v text -n <file>
which outputs:
apr gtr 52333
apr gtr 4332
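If xmlstarlet is not installed, the field-separator idea from the first part can be stretched to this particular input. The following is only a rough sketch, not an XML parser: it assumes the `receiver`, `sender` and `text` tags carry no attributes, that their values contain no `<` or `>`, and that `text` comes last in each message, as in the sample. `extract_messages` is a hypothetical helper name.

```shell
# Hypothetical helper: read XML on stdin, print "receiver sender text" per message.
extract_messages() {
    # Splitting on "<" or ">" puts every tag name in its own field,
    # directly followed by the field holding the text content of that tag.
    awk -F '[<>]' '{
        for (i = 1; i < NF; i++) {
            if ($i == "receiver") r = $(i + 1)
            else if ($i == "sender") s = $(i + 1)
            else if ($i == "text") print r, s, $(i + 1)
        }
    }'
}

xml='<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>'

printf '%s\n' "$xml" | extract_messages
# apr gtr 52333
# apr gtr 4332
```

For anything beyond this fixed layout (entities, CDATA, attributes on the tags), a real XML tool remains the safer choice.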