How to remove unique values from an HTML select list with a Linux program like sed, awk, or grep?
I copy the HTML from select boxes and am trying to figure out a quick way to strip the HTML so I am left with a list of names. Generally that's not a problem, but these options have unique values. I would prefer to use a program like grep, sed, awk, or vi. Right now I have to go through and edit each line manually. Any help would be great, thank you!
<option value="DL_54292">(DL)finance</option>
<option value="DL_54274">(DL)sales</option>
<option value="510496">Ben Smith</option>
<option value="510507">Christopher Jones</option>
<option value="510513">Dawn James</option>
<option value="510533">Joe Wilson</option>
<option value="551825">Mark Jackson</option>
<option value="510562">Ronnie Libby</option>
Edit: here is the output format, as suggested by Fede.
I'm trying to get a simple text list, one name per line (separated by line feeds or carriage returns):
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
Use grep to get the text between the tags:
$ grep -oP '(?<=>)[^<>]+' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
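The `-P` (Perl-compatible regex) option belongs to GNU grep and is only available when grep was built with PCRE support, so it may be missing on some systems (for example, macOS). A minimal sed sketch that prints the same text between the first `>` and the next `<`, assuming one option per line as in the sample; the function name `extract_option_text` is hypothetical:

```shell
# Hypothetical helper: print the text between the first '>' and the next '<'
# on each line. Uses only POSIX sed basic regular expressions, no PCRE.
extract_option_text() {
    # ^[^>]*>     skip everything up to and including the first '>'
    # \([^<]*\)   capture the option text (no '<' allowed inside it)
    # <.*$        discard the closing tag and the rest of the line
    # -n with /p  print only lines where the substitution succeeded
    sed -n 's/^[^>]*>\([^<]*\)<.*$/\1/p' "$@"
}
```

Usage: `extract_option_text file`. Unlike `grep -o`, this prints at most one match per line, which is enough when each `<option>` sits on its own line.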
Since you mentioned vi, you can use this command:
:%s_^<option value=".*">\(.*\)</option>$_\1_gi
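%s -> substitute on every line of the file (% is the whole-file range)
^ -> start of line
.* -> any characters
\(.*\) -> any characters, captured for later use
$ -> end of line
\1 -> the first captured group
gi -> g replaces every match on the line, i ignores case
_ -> separator between the parts of the substitution (used instead of / so the / in </option> needs no escaping)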
:s is search and replace; :s_foo_bar_ replaces foo with bar on the current line.
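Because vi and sed use the same basic-regex syntax for this pattern (`^`, `.*`, `\(...\)`, `$`, `\1`), the substitution can also be run non-interactively from the shell. A sketch, with the hypothetical function name `strip_option_tags`; the `gi` flags are dropped on the assumption that each line holds a single option with lowercase tags, as in the sample:

```shell
# Hypothetical helper: apply the same substitution as the vi command
# :%s_^<option value=".*">\(.*\)</option>$_\1_ to every input line.
# Lines that do not match the pattern are printed unchanged.
strip_option_tags() {
    sed 's_^<option value=".*">\(.*\)</option>$_\1_' "$@"
}
```

Usage: `strip_option_tags file > names.txt`. With GNU sed, adding `-i` to the sed command edits the file in place instead.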
awk can do this:
awk -F"<|>" '{print $3}' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
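With `-F"<|>"`, awk treats every `<` or `>` as a field separator, so each option line splits into five fields and the name lands in `$3`. This small sketch (hypothetical function name `show_fields`) prints every field of a line to make that visible:

```shell
# Hypothetical helper: print each field awk sees when splitting on '<' or '>'.
show_fields() {
    awk -F'<|>' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }' "$@"
}
```

For `echo '<option value="510533">Joe Wilson</option>' | show_fields` the output is:

$1=[]
$2=[option value="510533"]
$3=[Joe Wilson]
$4=[/option]
$5=[]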
To match your requested output exactly, the text in parentheses should be removed too:
awk -F"<|>" '{sub(/[^)]*\)/,"",$3);print $3}' file
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
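If the copied HTML also contains other lines (for example the surrounding `<select>` and `</select>` tags, an assumption beyond the sample above), the same awk approach can be limited to `<option>` lines. This sketch also anchors the parenthesized prefix to the start of the text, so a name that merely contains parentheses later on is left alone; the function name `option_names` is hypothetical:

```shell
# Hypothetical helper: print option text without a leading "(...)" prefix,
# ignoring lines that do not contain an <option> tag.
option_names() {
    awk -F'<|>' '/<option/ {
        sub(/^\([^)]*\)/, "", $3)   # drop a leading "(DL)"-style prefix
        print $3
    }' "$@"
}
```

Usage: `option_names file`.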
If you don't mind using Notepad++, you can use this regex:
.*>(.*)<.*
and replace with \1
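The Notepad++ pattern uses only features that POSIX extended regular expressions also have (`.*`, a `(...)` group, `\1` in the replacement), so the same substitution can be run with `sed -E`, which GNU and BSD sed both accept. A sketch with the hypothetical function name `npp_style_extract`:

```shell
# Hypothetical helper: the Notepad++ pattern .*>(.*)<.* applied with sed -E.
# On a single-option line the only way to match is with '>' ending the opening
# tag and '<' starting the closing tag, so (.*) captures the option text.
npp_style_extract() {
    sed -E 's/.*>(.*)<.*/\1/' "$@"
}
```

Usage: `npp_style_extract file`.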