I want to print
userId = 1234
userid = 12345
timestamp = 88888888
js = abc
from my data:
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
<input name="userId" value="1234" type="hidden"> messsssssssssssssssssss
<input name="userid" value="12345" type="hidden"> messssssssssssssssssss
<input name="timestamp" value="88888888" type="hidden"> messssssssssssss
<input name="js" value="abc" type="hidden"> messssssssssssssssssssssssss
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
How can I do this with AWK (or whatever)? Assume that my data is stored in the "$info" variable (single-line data).
Edit: by "single-line data" I mean that all of the data is on one line, like this:
messss...<input name="userId" value="1234" type="hidden">messsss...<input ....>messssssss
So I can't use grep to extract the section of interest.
I'm not sure I understand your "single line data" comment but if this is in a file, you can just do something like:
cat file |
  grep '^<input ' |
  sed 's/^<input name="//' |
  sed 's/" value="/ = /' |
  sed 's/".*$//'
Here's the cut'n'paste version:
cat file | grep '^<input ' | sed 's/^<input name="//' | sed 's/" value="/ = /' | sed 's/".*$//'
This turns:
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
<input name="userId" value="1234" type="hidden"> messsssssssssssssssssss
<input name="userid" value="12345" type="hidden"> messssssssssssssssssss
<input name="timestamp" value="88888888" type="hidden"> messssssssssssss
<input name="js" value="abc" type="hidden"> messssssssssssssssssssssssss
messssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
quite happily into:
userId = 1234
userid = 12345
timestamp = 88888888
js = abc
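The grep simply extracts the lines you want, while the sed commands respectively:
- strip the leading <input name=" from each line;
- replace " value=" with " = ";
- delete everything from the next double quote to the end of the line.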
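The pipeline above relies on every <input> tag starting its own line. If the data really is one long line held in "$info", a possible variation is to break the text at each "<" first and then run the same three substitutions. This is only a sketch: the sample value of info and the function name extract_inputs are made up for illustration, and it assumes name comes directly before value, as in the question.

```shell
# Hypothetical sample standing in for the asker's single-line $info.
info='mess<input name="userId" value="1234" type="hidden">mess<input name="js" value="abc" type="hidden">mess'

# extract_inputs is an illustrative name, not something from the thread.
extract_inputs() {
  # Turn every "<" into a newline so each tag starts its own line,
  # keep only the lines that begin with "input ", then apply the same
  # three substitutions (without the leading "<", which tr has consumed).
  printf '%s\n' "$1" |
    tr '<' '\n' |
    grep '^input ' |
    sed -e 's/^input name="//' -e 's/" value="/ = /' -e 's/".*$//'
}

extract_inputs "$info"
```

Because tr splits on every "<", any other tags in the data simply become lines that grep discards.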
This part should probably be a comment on Pax's answer, but it got a bit long for that little box. I'm thinking 'single line data' means you don't have any newlines in your variable at all? Then this will work:
echo "$info" | sed -n -r '/<input/s/<input +name="([^"]+)" +value="([^"]+)"[^>]*>[^<]*/\1 = \2\n/gp'
Notes on the interesting bits:
- -n means don't print by default; we say when to print with that p at the end.
- -r means extended regex.
- /<input/ at the beginning makes sure we don't even bother to work on lines that don't contain the desired pattern.
- That \n at the end is there to ensure all records end up on separate lines. Any original newlines will still be there, and the fastest way to get rid of them is to tack a '| grep .' onto the end. You could use some sed magic instead, but you wouldn't be able to understand it thirty seconds after you typed it in.
I can think of ways to do this in awk, but this is really a job for sed (or perl!).
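One thing to watch with the sed command above: text that comes before the first <input> on the line is not part of any match, so it stays glued to the first result. A hedged variation is to let the pattern also swallow any leading non-tag text with [^<]*. The sketch below assumes GNU sed (for -E and for \n in the replacement); the sample data and the function name extract_inputs_sed are invented for illustration.

```shell
# Hypothetical sample standing in for the asker's single-line $info.
info='mess<input name="userId" value="1234" type="hidden">mess<input name="js" value="abc" type="hidden">mess'

# extract_inputs_sed is an illustrative name, not something from the thread.
extract_inputs_sed() {
  # The leading [^<]* consumes filler text before each tag, the trailing
  # [^<]* consumes filler after it; "grep ." then drops the blank line
  # left behind by the final \n.
  printf '%s\n' "$1" |
    sed -n -E '/<input/s/[^<]*<input +name="([^"]+)" +value="([^"]+)"[^>]*>[^<]*/\1 = \2\n/gp' |
    grep .
}

extract_inputs_sed "$info"
```

This still assumes the only tags on the line are <input> tags with name before value; other tags in between would be left in the output.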
To handle a variable that contains multiple lines, you need to put the variable name in double quotes:
echo "$info"|sed 's/^\(<input\( \)name\(=\)"\([^"]*\)" value="\([^"]*\)"\)\?.*/\4\2\3\2\5/'
Using Perl:
cat file | perl -ne 'print($1 . "=" . $2 . "\n") if(/name="(.*?)".*value="(.*?)"/);'
IMO, parsing HTML should be done with a proper HTML/XML parser. For example, Ruby has an excellent package, Nokogiri, for parsing HTML/XML:
ruby -e '
  require "rubygems"
  require "nokogiri"
  doc = Nokogiri::HTML.parse(ARGF.read)
  doc.search("//input").each do |node|
    atts = node.attributes
    puts "%s = %s" % [atts["name"], atts["value"]]
  end
' mess.html
produces the output you're after
AWK:
BEGIN {
    # Use record separator "<", instead of "\n".
    RS = "<"
    first = 1
}
# Skip the first record, as that begins before the first tag
first {
    first = 0
    next
}
/^input[^>]*>/ {
    # make sure we don't match outside of the tag
    end = match($0, />/)
    # locate the name attribute
    pos = match($0, /name="[^"]*"/)
    if (pos == 0 || pos > end) { next }
    name = substr($0, RSTART+6, RLENGTH-7)
    # locate the value attribute
    pos = match($0, /value="[^"]*"/)
    if (pos == 0 || pos > end) { next }
    value = substr($0, RSTART+7, RLENGTH-8)
    # print out the result
    print name " = " value
}
Tools such as awk and sed can be used together with XMLStarlet and HTML Tidy to parse HTML.