
How to extract multiple patterns between tokens at once with sed?

Let's assume I have a file named inputFile that looks like this:

blahblah token substring token something else token substring2 token

The whole file contains only one long line.

I want to extract the substrings between the tokens (substring, substring2) with sed.

At the moment I have:

sed "s/^.* \?token\(.* \)token.* \?/\1/" inputFile > outputFile

I tried to do this based on the following questions, but unfortunately it returns only the last substring:

Extract lines between 2 tokens in a text file using bash

How to replace multiple patterns at once with sed?

How to select lines between two patterns?

Answers with an explanation would be great.

UPDATE: the real input:

<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>

Expected output:

apr gtr 52333
apr gtr 4332

The problem is that sed's regular expressions are greedy: the leading ^.* consumes as much of the line as it can, so the command above returns only substring2, even if you add the global flag (g).
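To see the greediness in isolation, here is a small sketch that runs a simplified form of the pattern on the sample line from the question. The variable names are only for illustration. The leading ^.* extends as far as it can while still leaving two occurrences of token for the rest of the pattern, so the capture group ends up holding the last substring:

```shell
line='blahblah token substring token something else token substring2 token'

# ^.* is greedy: it extends as far right as possible while the rest of the
# pattern (token ... token) can still match, so \1 captures the LAST substring.
greedy=$(printf '%s\n' "$line" | sed 's/^.*token\(.*\)token.*/\1/')

# Adding g changes nothing: the first match already covers the whole line,
# so there is nothing left for a second substitution.
greedy_g=$(printf '%s\n' "$line" | sed 's/^.*token\(.*\)token.*/\1/g')

# Brackets make the surrounding spaces visible.
printf '[%s]\n' "$greedy" "$greedy_g"
```

Both commands should print the same bracketed value, the second substring with its surrounding spaces.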

You could use awk for this, redefining the field separator FS to be the string token. This way your strings end up in the even-numbered fields:

$ echo "blahblah token substring token something else token substring2 token"  | \
  awk -F 'token' '{for(i=2;i<=NF;i+=2) {print $i}}'
 substring 
 substring2
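If you would rather stay with sed, one possible sketch is to turn every token into a newline and then print every second line. This assumes GNU sed (a \n in the replacement is not understood by every sed implementation) and that the string token never occurs inside one of the substrings:

```shell
line='blahblah token substring token something else token substring2 token'

# Step 1: replace each token with a newline, which splits the line into pieces.
# Step 2: 'n;p' skips one line and prints the next, i.e. prints lines 2, 4, ...
result=$(printf '%s\n' "$line" | sed 's/token/\n/g' | sed -n 'n;p')

printf '%s\n' "$result"
```

As with the awk version, the surrounding spaces are kept; they could be stripped with an extra substitution if needed.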

Update:

If your input is an XML file, you might want to use an XML-aware tool instead. Given input like this:

<archive>
   <message id="0">
       <receiver>apr</receiver>
       <sender>gtr</sender>
       <text>52333</text>
       <sendTime>554</sendTime>
       <deliveryTime>765</deliveryTime>
   </message>
   <message id="0">
       <receiver>apr</receiver>
       <sender>gtr</sender>
       <text>4332</text>
       <sendTime>764</sendTime>
       <deliveryTime>922</deliveryTime>
   </message>
</archive>

This leads to the command:

xmlstarlet sel -t -m '//message' -v receiver -o " " -v sender -o " " -v text -n <file>

which outputs:

apr gtr 52333
apr gtr 4332
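If xmlstarlet is not available, a rough sed-only sketch for this particular input could look like the following. It is not a general XML parser: it assumes GNU sed (for \n in the replacement), that receiver, sender and text always appear in this order directly after each other, and that the values contain no < character. The variable names are illustrative.

```shell
xml='<archive><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>52333</text><sendTime>554</sendTime><deliveryTime>765</deliveryTime></message><message id="0"><receiver>apr</receiver><sender>gtr</sender><text>4332</text><sendTime>764</sendTime><deliveryTime>922</deliveryTime></message></archive>'

# Step 1: replace each </message> with a newline, so every message sits on
#         its own line.
# Step 2: capture the three values with [^<]*, which cannot run past a tag
#         (so greediness is harmless), and print only the lines that matched.
result=$(printf '%s\n' "$xml" \
  | sed 's/<\/message>/\n/g' \
  | sed -n 's/.*<receiver>\([^<]*\)<\/receiver><sender>\([^<]*\)<\/sender><text>\([^<]*\)<\/text>.*/\1 \2 \3/p')

printf '%s\n' "$result"
```

For the sample input this should give the same two lines as the xmlstarlet command, but the XML-aware tool remains the more robust choice.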
