
Working with the sed Linux command

In my shell script I found a line that processes telephone numbers with the sed command.

sed "s~<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>~~g" input.xml > output.xml

I do not understand what the regular expression actually does.

<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>

I am reverse engineering the script to get this working.

My XML structure looks like this.

<ContactMethod>
    <InternetEmailAddress>donald.francis@lexisnexis.com</InternetEmailAddress>
    <Telephone type = "work">
        <Number>215-639-9000 x3281</Number>
    </Telephone>
    <Telephone type = "home">
        <Number>484-231-1141</Number>
    </Telephone>
    <Telephone type = "fax">
        <Number>N/A</Number>
    </Telephone>
    <Telephone type = "work">
        <Number>215-639-9000 x3281</Number>
    </Telephone>
    <Telephone type = "home">
        <Number>484-231-1141</Number>
    </Telephone>
    <Telephone type = "fax">
        <Number>none</Number>
    </Telephone>
    <Telephone type1 = "fax12234">
        <Number>484-231-1141sadsadasdasdaasd</Number>
    </Telephone>
</ContactMethod>

That regex recognises <Telephone type = "fax"> entries where the number is given as none, and deletes them.
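As a quick check of that claim, here is a minimal sketch that runs the same substitution on a one-line sample. The sample string and the variable names are made up for illustration and are not taken from the asker's input.xml.

```shell
# Hypothetical one-line sample: one fax entry with N/A, one fax entry with none.
sample='<Telephone type = "fax"><Number>N/A</Number></Telephone><Telephone type = "fax"><Number>none</Number></Telephone>'

# Same sed script as in the question, fed from a pipe instead of input.xml.
result=$(printf '%s\n' "$sample" |
  sed "s~<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>~~g")

printf '%s\n' "$result"
# prints: <Telephone type = "fax"><Number>N/A</Number></Telephone>
```

Only the entry whose number is exactly none (optionally followed by spaces) is removed; the N/A fax entry is left alone.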

Breakdown:

s is the sed command for "substitution".

~ is the pattern separator. You can choose any character for this. sed recognizes it because it comes right after the s.

<Telephone type matches the literal text "<Telephone type".

[ ]* matches zero or more spaces.

= matches a literal "=".

[ ]* matches zero or more spaces.

\"fax\" matches the literal text "fax", including the double quotes. The quotes are escaped because the whole sed script appears inside double quotes; the shell removes the backslashes ( \ ) before sed sees them.

[ ]* matches zero or more spaces.

><Number>none matches the literal text.

[ ]* matches zero or more spaces.

</Number></Telephone> matches the literal text.

~~ the pattern separators end the search pattern and surround an empty replacement, so every match is replaced with nothing.

g is a flag that means the substitution is applied to every match on a line, not only the first one.
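Three points from the breakdown (the free choice of separator, the shell stripping the backslashes, and the g flag) can be tried in isolation. This is only a sketch on made-up input strings; the variable names are illustrative.

```shell
# Any character right after s acts as the separator; these two commands are equivalent.
with_slash=$(printf 'a/b\n' | sed 's/\//-/')   # "/" must be escaped when it is the separator
with_tilde=$(printf 'a/b\n' | sed 's~/~-~')    # no escaping needed with "~"
printf '%s %s\n' "$with_slash" "$with_tilde"   # prints: a-b a-b

# Inside double quotes the shell turns \" into a plain ", so sed never sees the backslash.
seen_by_sed=$(printf '%s' "s~\"fax\"~~g")
printf '%s\n' "$seen_by_sed"                   # prints: s~"fax"~~g

# Without g only the first match on a line is replaced; with g every match is.
first_only=$(printf 'none none none\n' | sed 's~none~X~')
all_matches=$(printf 'none none none\n' | sed 's~none~X~g')
printf '%s\n%s\n' "$first_only" "$all_matches"
# prints:
# X none none
# X X X
```

Choosing ~ as the separator is convenient here because the pattern contains / in the closing tags, which would otherwise each need a backslash.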

The only thing that confuses me is that this pattern won't match anything that has line breaks in it, because sed processes its input one line at a time. So I presume your input.xml isn't actually formatted like your example data?
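The line-break concern can be illustrated with a small sketch. It assumes the entry is laid out over three lines as in the example data; the variable names are hypothetical.

```shell
# The sed script from the question, stored in a variable for reuse.
pattern="s~<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>~~g"

# Multi-line entry, laid out like the example data in the question.
multiline=$(printf '%s\n' \
  '<Telephone type = "fax">' \
  '    <Number>none</Number>' \
  '</Telephone>')

# sed applies the pattern to one line at a time, so nothing matches here.
unchanged=$(printf '%s\n' "$multiline" | sed "$pattern")
[ "$unchanged" = "$multiline" ] && echo "multi-line input: left untouched"

# The same entry on a single line does match and is deleted, leaving an empty line.
one_line='<Telephone type = "fax"><Number>none</Number></Telephone>'
deleted=$(printf '%s\n' "$one_line" | sed "$pattern")
[ -z "$deleted" ] && echo "single-line input: entry removed"
```

If the real input.xml is pretty-printed like the example, the command as written would therefore leave it unchanged.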
