Using sed to extract element content of an XML file
Using sed, I'm trying to extract everything between <Transport_key> and </Transport_key> from input files like this:
<?xml version="1.0" encoding="utf-8"?>
<Envelope xmlns:xenc="http://www.w3.org/2001/04/xmlenc#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<Header>
<Security>
<Transport_key>
<EncryptedKey Id="TK" xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p" />
<CipherData>
<CipherValue>pifKajuAK8FKwqLEhKIP4x5V5XUQyrwhpA</CipherValue>
</CipherData>
</EncryptedKey>
</Transport_key>
</Security>
</Header>
<Body>
</Body>
</Envelope>
So I want to get:
<EncryptedKey Id="TK" xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p" />
<CipherData>
<CipherValue>pifKajuAK8FKwqLEhKIP4x5V5XUQyrwhpA</CipherValue>
</CipherData>
</EncryptedKey>
regardless of any optional newlines between elements. I just want the text between the two strings unmodified, even if the input is a single big line.
I tried with:
sed -e "s@.*<Transport_key>\(.*\)</Transport_key>.*@\1@" test.txt
but in the meantime I learned that sed processes its input line by line, so this cannot work.
Is there a solution for that?
For your "last try without such ...", a grep approach (-P and -z are GNU grep options):
grep -Poz '<Transport_key>\s*\K[\s\S]*(?=</Transport_key>)' test.txt
The output:
<EncryptedKey Id="TK" xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p" />
<CipherData>
<CipherValue>pifKajuAK8FKwqLEhKIP4x5V5XUQyrwhpA</CipherValue>
</CipherData>
</EncryptedKey>
For a more robust attempt, an xmlstarlet approach:
xmlstarlet sel -t -c '//Transport_key/*' -n test.txt
It would be safer to use an XML parser, but in some cases it can also be done with a regex:
perl -0777 -ne 'print for m@<EncryptedKey(?:(?!</EncryptedKey>).)*</EncryptedKey>@gs' <test.txt
Here -0777 (see perl -h) makes perl read the whole file as a single record, and the s regex modifier lets . match \n as well.
Via sed, you can try the following:
sed -n '/<Transport_key>/,/<\/Transport_key>/p' test1.xml | sed -e '/Transport_key/d'
The first command takes everything between the Transport_key tags. Since this also prints the Transport_key tags, the second command deletes the lines containing the Transport_key tags.
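Because the range approach works on whole lines, it depends on each tag sitting on a line of its own. Below is a sketch of a sed-only variant that first collects the whole file into the pattern space, so it should also cope with single-line input. It assumes exactly one <Transport_key> element, and the function name sed_transport_key is made up for illustration.

```shell
# Hypothetical helper: slurp the file, then strip everything outside the tags.
sed_transport_key() {
    # 1h;1!H   collect every input line in the hold space
    # ${g;...} on the last line, copy the collected text back and edit it:
    #          drop everything up to and including the opening tag,
    #          then everything from the closing tag onwards.
    sed -n '1h;1!H;${g;s@.*<Transport_key>[[:space:]]*@@;s@[[:space:]]*</Transport_key>.*@@;p;}' "$1"
}

# Usage: sed_transport_key test1.xml
```

If the tags are missing, the substitutions do nothing and the input is printed unchanged, so check for the tags first if that matters.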
The simplest solution to this particular problem that's independent of white space is to use GNU awk for multi-char RS:
$ gawk -v RS='\\s*</?Transport_key>\\s*' 'NR==2' file
<EncryptedKey Id="TK" xmlns="http://www.w3.org/2001/04/xmlenc#">
<EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p" />
<CipherData>
<CipherValue>pifKajuAK8FKwqLEhKIP4x5V5XUQyrwhpA</CipherValue>
</CipherData>
</EncryptedKey>
$ tr -d '\n' < file
<?xml version="1.0" encoding="utf-8"?><Envelope xmlns:xenc="http://www.w3.org/2001/04/xmlenc#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><Header><Security><Transport_key><EncryptedKey Id="TK" xmlns="http://www.w3.org/2001/04/xmlenc#"><EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p" /><CipherData><CipherValue>pifKajuAK8FKwqLEhKIP4x5V5XUQyrwhpA</CipherValue></CipherData></EncryptedKey></Transport_key></Security></Header><Body></Body></Envelope>
$ tr -d '\n' < file | gawk -v RS='\\s*</?Transport_key>\\s*' 'NR==2'
<EncryptedKey Id="TK" xmlns="http://www.w3.org/2001/04/xmlenc#"><EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#rsa-oaep-mgf1p" /><CipherData><CipherValue>pifKajuAK8FKwqLEhKIP4x5V5XUQyrwhpA</CipherValue></CipherData></EncryptedKey>
The reason to use an XML parser, though, is to handle things like the tag value showing up inside a string, etc. properly.
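To make that concrete, here is a small made-up input in which the closing tag text also appears inside a comment. A purely textual cut stops at the first occurrence and returns a fragment, whereas an XML parser would treat the comment as a comment. The sample string and variable names are illustrative only.

```shell
# Made-up sample: the closing tag text also occurs inside an XML comment.
sample='<Transport_key><!-- </Transport_key> --><Key/></Transport_key>'

# Text-based extraction: delete up to the opening tag, then delete from the
# first closing tag onwards.
naive=$(printf '%s\n' "$sample" | sed -e 's@.*<Transport_key>@@' -e 's@</Transport_key>.*@@')

# Brackets make the trailing space visible.
printf '[%s]\n' "$naive"   # prints [<!-- ], i.e. half a comment and no <Key/>
```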