Replace character in regex match only

I am trying to search a text file with a regex and then, only within each match, replace one character with another. My problem is that I can't find a simple way to do it.

Example source file:

...
 <br>
<a id="some shopitem" ref="#some shop item name 01 a" style="text-decoration:none;"><h3 style="background-color: #ccc;">blah blab hasdk sldk sasdas dasda sd</h3></a>
<table>
 <td width="500">
....

There I need to match the regex ref="#[[:alnum:] ]*" (that is, ref="#whatever name with spaces") and replace the spaces inside the match with "-", without changing any other spaces outside the regex match.

So the result should look like this:

....
 <br>
<a id="some shopitem" href="#some-shop-item-name-01-a" style="text-decoration:none;"><h3 style="background-color: #ccc;">blah blab hasdk sldk sasdas dasda sd</h3></a>
<table>
 <td width="500">
....

Would it even be possible to do it without a script, just as a one-line command in bash? Is there some way to replace the spaces inside a capture group, something like sed -r 's/ref="#([[:alnum:] ]*)"/(\1s/ /-/g)/g'?
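sed cannot run a second substitution on a back-reference such as \1, but a branch loop gets the same effect. The sketch below is not from the original thread; the function name dash_ref_spaces is a hypothetical wrapper, and it assumes a sed that supports -E (GNU or BSD sed) and a ref value that never contains a double quote. Each pass turns the first space that follows ref="# into a dash, and t jumps back to the label as long as a substitution was made:

```shell
#!/bin/sh
# Hypothetical helper: replace spaces with dashes only inside ref="#...".
# Assumes the ref value contains no double quote.
dash_ref_spaces() {
  # :a    defines a label
  # s/... replaces the first space after ref="# with no quote in between
  # ta    branches back to :a while a substitution was made on this line
  sed -E -e ':a' -e 's/(ref="#[^" ]*) /\1-/' -e 'ta'
}
```

For example, dash_ref_spaces < SOURCEFILE > RESULTFILE. Because [^" ]* cannot cross the closing quote, spaces elsewhere on the line (including those in other attributes) are left alone, and several ref values on the same line are each handled.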

A Perl solution:

perl -pe 's/(ref="#)([\w\s]+)(")/ ($x,$y,$z)=($1,$2,$3); $y =~ s{\s}{-}g; $x.$y.$z /eg'

It's slightly more permissive than [[:alnum:] ] about what can appear in the ref name: \w also matches underscores, and \s matches tabs and other whitespace characters (all of which are turned into dashes).

Would it even be possible to do it without a script, just as a one-line command in bash?

Your question somehow triggered a burning ambition in me to do this...!

varfile=SOURCEFILE && varsubstfile=RESULTFILE && IFS=' ' read -a repl <<< $(sed -r 's/(.*)(ref="#[^"]*")( .*)/\2/;tx;d;:x' "$varfile" | sed -e 's/ /-/g' | sed ':a;N;$!ba;s/\s/ /g') && for i in "${!repl[@]}"; do needle[$i]=$(sed 's/-/ /g' <<< "${repl[$i]}"); done && cp "$varfile" "$varsubstfile" && for i in "${!needle[@]}"; do sed -i "s/${needle[i]}/${repl[i]}/g" "$varsubstfile"; done && unset needle && unset repl && less "$varsubstfile" && unset varfile && unset varsubstfile

SOURCEFILE is your source file and RESULTFILE is the name of the file the output is written to, so change both as needed.

Well... it is kind of a script, but it's a (damn huge) one-liner :)

I assumed that there are several occurrences of ref="#..." in the file; otherwise it would have been much shorter (although I no longer remember the shorter version).

... and I really hope this works on your *nix system :D


Just in case you want to know what this thing does, here's an explanation:

varfile=SOURCEFILE &&       # set variable for the source file
varsubstfile=RESULTFILE &&  # set variable for the result file
# we're going to read multiple values into an array "repl", delimited by a space
IFS=' ' read -a repl <<< $(
    # grab only the second capture group (ref="#...")
    sed -r 's/(.*)(ref="#[^"]*")( .*)/\2/;tx;d;:x' "$varfile" |
    # replace every space in (ref="#...") with a dash
    sed -e 's/ /-/g' |
    # replace newlines with a space: when there is more than one occurrence,
    # sed delimits them with a newline, but read uses a space as the delimiter
    sed ':a;N;$!ba;s/\s/ /g'
) &&
# we now have every needed replacement string in an array called "repl"
for i in "${!repl[@]}"; do  # iterate over every value in the array we just read
    # replace dashes with spaces and store the result in another array:
    # "needle" now holds every original string we are going to search for
    needle[$i]=$(sed 's/-/ /g' <<< "${repl[$i]}")
done &&
cp "$varfile" "$varsubstfile" &&  # copy the source file to the result file
for i in "${!needle[@]}"; do  # for every string we are going to replace
    sed -i "s/${needle[i]}/${repl[i]}/g" "$varsubstfile"  # ... we replace it!
done
# technically we're done here,
# but I like to clean up afterwards and show the result with less
unset needle && unset repl && less "$varsubstfile" && unset varfile && unset varsubstfile
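The first stage of that pipeline, extracting the ref attribute, can also be written with sed -n and the p flag, which prints only the lines where the substitution succeeded (the same effect as tx;d;:x). In the sketch below, extract_refs is a hypothetical helper name; it uses [^"]* instead of .*? because POSIX extended regular expressions have no lazy quantifier. The leading greedy .* still means that only the last ref on each line is printed:

```shell
#!/bin/sh
# Hypothetical helper: print the ref="#..." attribute of every line that has one.
# -n disables automatic printing; the p flag prints a line only if the
# substitution matched. [^"]* stops at the first closing quote.
extract_refs() {
  sed -nE 's/.*(ref="#[^"]*").*/\1/p' "$@"
}
```

Called as extract_refs SOURCEFILE it reads the file; with no arguments it reads standard input.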

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact yoyou2525@163.com.