
Linux Text File Manipulation with sed/awk

I have a list in the following format:

77 Infinite Dust
4 Illusion Dust
12 Dream Shard
29 Star's Sorrow

I need to change this to:

77 <a href="http://www.wowhead.com/?search=Infinite Dust">Infinite Dust</a>
4 <a href="http://www.wowhead.com/?search=Illusion Dust">Illusion Dust</a>
12 <a href="http://www.wowhead.com/?search=Dream Shard">Dream Shard</a>
29 <a href="http://www.wowhead.com/?search=Star's Sorrow">Star's Sorrow</a>

I've managed to get the list into the right format, minus the numbers, by using:

sed 's|^[0-9]*.|<a href="http://www.wowhead.com/?search=|g' filename | sed 's|$|">|g' | sed 's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search=\([^"]*\)">#&\1</a>#'

But I can't figure out how to keep the number in front of each item. Any help appreciated, thanks!

You can do this with sed by capturing the parts of the line as groups. In sed, the groups A and B in (A)--(B) are available in the replacement as \1 and \2, with the added wrinkle that the "()" need to be escaped, e.g.:

sed 's/\([0-9]*\)\ \(.*\)$/\1 -- \2/g' testfile

maps the digits up to the space to group 1 and everything following it to group 2. You can then arrange groups 1 and 2 however you like, e.g. by changing the sed replacement to something like

 \1 <a href.....\2">\2</a>
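To see what the two groups capture, here is a small sketch that runs the `--` example on the sample lines from the question. The variable names `items` and `grouped` are only for illustration, and the pattern uses a plain space and a leading `^` rather than the escaped space above.

```shell
# Sample input copied from the question ("items" is a hypothetical name).
items=$(printf '%s\n' "77 Infinite Dust" "4 Illusion Dust" "12 Dream Shard" "29 Star's Sorrow")

# \(...\) captures a group in a basic regular expression:
#   group 1 = the leading digits
#   group 2 = everything after the first space
grouped=$(printf '%s\n' "$items" | sed 's/^\([0-9]*\) \(.*\)$/\1 -- \2/')

printf '%s\n' "$grouped"
```

Each line should come back as the number, then ` -- `, then the item name, which confirms that the two halves are available separately for the real replacement.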

If you had told us what you were ultimately trying to do in your last question, we would have told you a much easier way to do it.

As I said in my answer to your last question, you can have sed remember a part of the pattern, and refer to that part as \1, \2, etc.

You need to remember the number and the rest of the line separately, so the pattern is \([0-9]*\) \(.*\), which is basically zero or more digits, followed by a space, followed by any number of characters.

So your sed command becomes:

sed -e 's|\([0-9]*\) \(.*\)|\1 <a href="http://www.wowhead.com/?search=\2">\2</a>|'

That command does everything you want in one go.
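As a quick check, the same idea can be run on the sample data. This sketch is a variation rather than the exact command above: it assumes your sed supports `-E` (GNU and BSD sed do), which lets the groups be written without backslashes, and it anchors the match with `^` and requires at least one digit. The variable names are hypothetical.

```shell
# Sample input copied from the question ("items" is a hypothetical name).
items=$(printf '%s\n' "77 Infinite Dust" "4 Illusion Dust" "12 Dream Shard" "29 Star's Sorrow")

# -E enables extended regular expressions, so groups are plain (...).
# \1 is the number, \2 is the item name, which is used twice.
linked=$(printf '%s\n' "$items" |
    sed -E 's|^([0-9]+) (.*)$|\1 <a href="http://www.wowhead.com/?search=\2">\2</a>|')

printf '%s\n' "$linked"
```

Lines that do not start with a number followed by a space are passed through unchanged, since the pattern simply does not match them.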

awk '
{
    # Rebuild the item name from fields 2..NF, joined by single spaces
    s = $2
    for (i = 3; i <= NF; i++) s = s " " $i
    # Fixed format string, so a "%" in the data is never treated as a format
    printf "%s <a href=\"http://www.wowhead.com/?search=%s\">%s</a>\n", $1, s, s
} ' file

output

$ ./shell.sh
77 <a href="http://www.wowhead.com/?search=Infinite Dust">Infinite Dust</a>
4 <a href="http://www.wowhead.com/?search=Illusion Dust">Illusion Dust</a>
12 <a href="http://www.wowhead.com/?search=Dream Shard">Dream Shard</a>
29 <a href="http://www.wowhead.com/?search=Star's Sorrow">Star's Sorrow</a>

With awk it would be something like:

{  
   rest = substr($0, length($1)+2, length($0));
   printf("%d <a href=\"http://www.wowhead.com/?search=%s\">%s</a>\n", $1, rest, rest); 
}
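For anyone who wants to try that directly, here is a sketch of the same substr approach wrapped in a full awk invocation and fed two of the sample lines. It assumes each line starts with the number and has exactly one space before the item name; `linked` is a hypothetical variable name, and `%s` is used for the number so it is printed exactly as it appears in the input.

```shell
linked=$(printf '%s\n' "77 Infinite Dust" "29 Star's Sorrow" | awk '{
    # Everything after the first field and its single separating space
    rest = substr($0, length($1) + 2)
    printf "%s <a href=\"http://www.wowhead.com/?search=%s\">%s</a>\n", $1, rest, rest
}')

printf '%s\n' "$linked"
```

Because the item name is cut out of `$0` rather than rebuilt from fields, any spacing inside the name is kept as is.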

In sed, you can use the & character to place the matched pattern in the replacement text. For example:

echo xyz | sed 's/^xyz/abc &/'

would output

abc xyz

So in your example,

sed 's|^[0-9]*.|& <a href ....
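One way to finish the `&` idea, offered as a sketch rather than the command elided above: match the item name instead of the number, and use `&` twice in the replacement. This assumes item names never begin with a digit or a space, which holds for the sample data; `linked` is a hypothetical variable name.

```shell
# [^0-9 ].* matches from the first character that is neither a digit nor a
# space through the end of the line, i.e. the item name. The number in front
# is not part of the match, so it is left untouched.
linked=$(printf '%s\n' "77 Infinite Dust" "29 Star's Sorrow" |
    sed 's|[^0-9 ].*|<a href="http://www.wowhead.com/?search=&">&</a>|')

printf '%s\n' "$linked"
```

No groups are needed here, because the only text that has to be repeated is the match itself.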
