sed - Manipulate Phone Number in xml

I'm trying to manipulate an XML file with a bash script and sed, and I can't get it to work. The structure in the XML looks like this:

<Name>title firstname lastname</Name><Home>+49 (30) 1234 94</Home><Mobile>+49 (171) 1234 94</Mobile>
<Name>title firstname lastname</Name><Home>+49 (30) 1234 94</Home><Mobile>+49 (171) 1234 94</Mobile>

I need to remove the spaces and the ( and ) characters ONLY in the phone numbers. After a day with regex and sed, I still could not get it. I have a pattern that matches the right parts, but I can't get the capture groups and the substitution right:

sed -e 's/([0-9]\s|[0-9]\s\([0-9]|[0-9]\)\s[0-9]|[0-9]\s[0-9])/gm'

Don't use sed to manipulate XML documents.
There are dedicated XML tools for the job.
xmlstarlet is one of them.

A well-formed XML document requires a single root element at the top of the node tree.
Let's say we have an XML fragment wrapped in such a root element (test.xml):

<root>
    <Name>title firstname lastname</Name>
    <Home>+49 (30) 1234 94</Home>
    <Mobile>+49 (171) 1234 94</Mobile>
    <Name>title firstname lastname</Name>
    <Home>+49 (30) 1234 94</Home>
    <Mobile>+49 (171) 1234 94</Mobile>
</root>
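The lines in the question have no root element. A minimal bash sketch for producing test.xml from them, assuming they are stored in a file named contacts.txt (a hypothetical name) and that the file has no XML declaration of its own:

```shell
# Hypothetical helper: wrap a root-less XML fragment, read from stdin,
# in a single <root> element so that XML tools such as xmlstarlet can parse it.
wrap_in_root() {
  printf '<root>\n'
  cat
  printf '</root>\n'
}
```

Usage: wrap_in_root < contacts.txt > test.xml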

The command:

xmlstarlet ed -u "//Home|//Mobile" -x "translate(normalize-space(.),'() ','')" test.xml

Details:

ed - enables edit mode

-u - updates the selected nodes

"//Home|//Mobile" - XPath expression that selects every Home and Mobile element

-x - computes the new value from an XPath expression instead of taking a literal value

. (period) - refers to the node currently being updated

normalize-space() - returns the argument string with whitespace normalized: leading and trailing whitespace is stripped and every sequence of whitespace characters is replaced by a single space

translate(string, string, string) - returns the first argument string with each occurrence of a character from the second argument replaced by the character at the corresponding position in the third argument; characters with no corresponding position in the third argument are removed, so translate(s, '() ', '') deletes every (, ) and space


The output:

<?xml version="1.0"?>
<root>
  <Name>title firstname lastname</Name>
  <Home>+4930123494</Home>
  <Mobile>+49171123494</Mobile>
  <Name>title firstname lastname</Name>
  <Home>+4930123494</Home>
  <Mobile>+49171123494</Mobile>
</root>
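To see what the -x expression does to a single value, here is a rough shell analogue (the function name strip_phone is hypothetical). Because translate() deletes the spaces that normalize-space() leaves behind, the net effect is that every whitespace character and both parentheses disappear, which tr -d does in one step:

```shell
# Hypothetical analogue of translate(normalize-space(.), '() ', '') for one value.
# normalize-space() turns whitespace runs into single spaces and trims the ends;
# translate() with an empty third argument then deletes '(', ')' and ' '.
# Net effect: all whitespace and both parentheses are removed, as tr -d does here.
strip_phone() {
  printf '%s' "$1" | tr -d '()[:space:]'
  printf '\n'
}

strip_phone '+49 (30) 1234 94'    # prints +4930123494
strip_phone '+49 (171) 1234 94'   # prints +49171123494
```

This only illustrates the string transformation; it does not parse XML, so the xmlstarlet command is still the tool for editing the file.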

Assuming the numbers always have the same format:

sed -r 's/(\+[0-9]{2}) \(([0-9]{2,3})\) ([0-9]{4}) ([0-9]{2})/\1\2\3\4/g' input
<Name>title firstname lastname</Name><Home>+4930123494</Home><Mobile>+49171123494</Mobile>
<Name>title firstname lastname</Name><Home>+4930123494</Home><Mobile>+49171123494</Mobile>
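To make the capture groups in that substitution easier to follow, the same expression can be wrapped in a commented function. This is a sketch: the name strip_phone_format is hypothetical, and -E is used as the spelling of -r that both GNU and BSD sed accept.

```shell
# Hypothetical wrapper around the fixed-format substitution above.
# Capture groups in the pattern:
#   \1 = country code, e.g. "+49"
#   \2 = area code inside the parentheses, e.g. "30" or "171"
#   \3 = four-digit block, e.g. "1234"
#   \4 = two-digit tail, e.g. "94"
# The replacement \1\2\3\4 concatenates them, dropping the spaces and parentheses.
strip_phone_format() {
  sed -E 's/(\+[0-9]{2}) \(([0-9]{2,3})\) ([0-9]{4}) ([0-9]{2})/\1\2\3\4/g'
}

printf '%s\n' '<Home>+49 (30) 1234 94</Home><Mobile>+49 (171) 1234 94</Mobile>' | strip_phone_format
# prints <Home>+4930123494</Home><Mobile>+49171123494</Mobile>
```

Numbers that do not fit this exact shape, such as +1 39976 1234 94, pass through unchanged.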

If the numbers come in other formats, for example these input lines:

<Name>title firstname lastname</Name><Home>0049 (30) 1234567 94</Home><Mobile>+491711234577 0</Mobile>
<Name>title firstname lastname</Name><Home>+1 39976 1234 94</Home><Mobile>+49 (171) 1234 94</Mobile>

A more permissive expression also handles those variants (input is the file containing the lines above):

sed -r 's/(\+|[0-9]*) ([0-9]{1,})|\s\(([0-9]{2,})\) ([0-9]{2,}) ([0-9]{1,4})/\1\2\3\4\5/g' input

<Name>title firstname lastname</Name><Home>004930123456794</Home><Mobile>+4917112345770</Mobile>
<Name>title firstname lastname</Name><Home>+139976123494</Home><Mobile>+49171123494</Mobile>
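A commented sketch of the permissive expression, written with POSIX classes only: [0-9]* for a digit run (sed has no \d digit class) and [[:space:]] instead of the GNU-specific \s, so it should also run with BSD sed. The function name strip_phone_loose is hypothetical. Note that the first alternative joins any digits separated by a single space anywhere on the line, so it is only safe when names and other fields contain no such pairs.

```shell
# Hypothetical commented version of the permissive substitution.
# Alternative 1: an optional "+" or digit run, a space, then digits
#   "1 39976" -> \1="1", \2="39976";  " 1234" -> \2="1234"
# Alternative 2: whitespace, a parenthesised area code, then two digit groups
#   " (30) 1234567 94" -> \3="30", \4="1234567", \5="94"
# Groups belonging to the alternative that did not match expand to empty strings.
strip_phone_loose() {
  sed -E 's/(\+|[0-9]*) ([0-9]{1,})|[[:space:]]\(([0-9]{2,})\) ([0-9]{2,}) ([0-9]{1,4})/\1\2\3\4\5/g'
}

printf '%s\n' '<Home>0049 (30) 1234567 94</Home><Mobile>+491711234577 0</Mobile>' | strip_phone_loose
# prints <Home>004930123456794</Home><Mobile>+4917112345770</Mobile>
```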
