
sed - Manipulate Phone Number in xml

I'm trying to manipulate an XML file with a bash script and sed, and I can't get it to work. The structure in the XML looks like this:

<Name>title firstname lastname</Name><Home>+49 (30) 1234 94</Home><Mobile>+49 (171) 1234 94</Mobile>
<Name>title firstname lastname</Name><Home>+49 (30) 1234 94</Home><Mobile>+49 (171) 1234 94</Mobile>

I need to eliminate the spaces and the ( and ) characters ONLY in the phone numbers. After a day with regex and sed, I still could not get it. I have a pattern that matches as needed, but I can't get the groups and the substitution right.

sed -e 's/([0-9]\s|[0-9]\s\([0-9]|[0-9]\)\s[0-9]|[0-9]\s[0-9])/gm'

Don't use sed to manipulate XML documents.
There are good tools for that job.
xmlstarlet is one of them.

A valid XML document requires a single root element at the top of the node tree.
Let's say we have the following XML document (test.xml):

<root>
    <Name>title firstname lastname</Name>
    <Home>+49 (30) 1234 94</Home>
    <Mobile>+49 (171) 1234 94</Mobile>
    <Name>title firstname lastname</Name>
    <Home>+49 (30) 1234 94</Home>
    <Mobile>+49 (171) 1234 94</Mobile>
</root>

The command:

xmlstarlet ed -u "//Home|//Mobile" -x "translate(normalize-space(.),'() ','')" test.xml

Details:

ed - enables edit mode

-u - updates the nodes matched by the XPath expression that follows

"//Home|//Mobile" - XPath expression that selects the needed elements

-x - sets the new value from an XPath expression

. (period) - refers to the currently selected node

normalize-space() - returns the argument string with leading and trailing whitespace stripped and each sequence of whitespace characters replaced by a single space

translate(string, string, string) - returns the first argument string with each character that occurs in the second argument replaced by the character at the corresponding position in the third argument. A character with no counterpart in the third argument is removed, which is why an empty third argument deletes '(', ')' and spaces.


The output:

<?xml version="1.0"?>
<root>
  <Name>title firstname lastname</Name>
  <Home>+4930123494</Home>
  <Mobile>+49171123494</Mobile>
  <Name>title firstname lastname</Name>
  <Home>+4930123494</Home>
  <Mobile>+49171123494</Mobile>
</root>
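
To see what the translate() call does to a single value without involving XML, a rough plain-shell counterpart is tr -d, which deletes every listed character. This is only a sketch of the character-deletion step; the helper name compact_number is made up for illustration and is not part of xmlstarlet.

```shell
# Hypothetical helper: delete '(', ')' and spaces from one phone number.
# This mirrors translate(., '() ', '') with an empty third argument.
compact_number() {
  printf '%s\n' "$1" | tr -d '() '
}

compact_number '+49 (171) 1234 94'   # prints +49171123494
```

Unlike the xmlstarlet command, this knows nothing about elements, so it is only suitable for values that have already been extracted from the document.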

Assuming the format of the numbers remains the same:

sed -r 's/(\+[0-9]{2}) \(([0-9]{2,3})\) ([0-9]{4}) ([0-9]{2})/\1\2\3\4/g' input
<Name>title firstname lastname</Name><Home>+4930123494</Home><Mobile>+49171123494</Mobile>
<Name>title firstname lastname</Name><Home>+4930123494</Home><Mobile>+49171123494</Mobile>
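
To make the four capture groups in that substitution easier to follow, here is a small sketch that prints each group with a label instead of concatenating them. The helper name show_groups and the labels cc, area, block and tail are made up for illustration; -E is used in place of -r because both GNU and BSD sed accept it.

```shell
# Hypothetical helper: show which part of a number each capture group picks up.
show_groups() {
  printf '%s\n' "$1" |
    sed -E 's/(\+[0-9]{2}) \(([0-9]{2,3})\) ([0-9]{4}) ([0-9]{2})/cc=\1 area=\2 block=\3 tail=\4/'
}

show_groups '+49 (171) 1234 94'   # prints: cc=+49 area=171 block=1234 tail=94
show_groups '+1 39976 1234 94'    # no match, printed unchanged
```

The second call shows the limitation: a number that does not follow the "+NN (NN) NNNN NN" layout is left untouched.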
For input with differently formatted numbers:

<Name>title firstname lastname</Name><Home>0049 (30) 1234567 94</Home><Mobile>+491711234577 0</Mobile>
<Name>title firstname lastname</Name><Home>+1 39976 1234 94</Home><Mobile>+49 (171) 1234 94</Mobile>

sed -r 's/(\+|[0-9]*) ([0-9]{1,})|\s\(([0-9]{2,})\) ([0-9]{2,}) ([0-9]{1,4})/\1\2\3\4\5/g' input

<Name>title firstname lastname</Name><Home>004930123456794</Home><Mobile>+4917112345770</Mobile>
<Name>title firstname lastname</Name><Home>+139976123494</Home><Mobile>+49171123494</Mobile>
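
The sed patterns above match digit sequences wherever they appear on the line, so a name that contains a number could be altered as well. A possible alternative, sketched here under the assumption that each Home or Mobile element sits entirely on one line, anchors the substitution to those two elements and removes one unwanted character per pass until none is left. The helper name strip_phone_chars is made up for illustration.

```shell
# Hypothetical helper: strip spaces and parentheses inside <Home>/<Mobile> only.
# Each pass removes the last space or parenthesis in the first element that
# still contains one; "t again" repeats while a substitution was made.
strip_phone_chars() {
  sed -E \
    -e ':again' \
    -e 's#(<(Home|Mobile)>[^<]*)[ ()]#\1#' \
    -e 't again'
}

printf '%s\n' '<Name>Dr John Doe</Name><Home>0049 (30) 1234567 94</Home><Mobile>+491711234577 0</Mobile>' |
  strip_phone_chars
# prints: <Name>Dr John Doe</Name><Home>004930123456794</Home><Mobile>+4917112345770</Mobile>
```

The spaces inside Name are preserved because the pattern only matches text that follows an opening Home or Mobile tag. This is still text processing rather than XML parsing, so it does not handle elements that span several lines or carry attributes.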

