How can I remove unique values from an HTML select list using Linux programs such as sed, awk or grep?

Question

I copy the HTML from select boxes and am trying to figure out a quick way to strip the markup so that I am left with a list of names. Generally that's not a problem, but these options have unique values. I would prefer using a program like grep, sed, awk or vi. Right now I have to go through manually and edit each line. Any help would be great, thank you!

<option value="DL_54292">(DL)finance</option>
<option value="DL_54274">(DL)sales</option>
<option value="510496">Ben Smith</option>
<option value="510507">Christopher Jones</option>
<option value="510513">Dawn James</option>
<option value="510533">Joe Wilson</option>
<option value="551825">Mark Jackson</option>
<option value="510562">Ronnie Libby</option>

Edit: Output format suggested by Fede.

I am trying to get a simple text list, with the names separated by line feeds or carriage returns.

finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby

Answer 1

Use grep to get the text between the tags:

$ grep -oP '(?<=>)[^<>]+' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
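
The -P option above depends on a grep built with Perl-compatible regex support, which not every system has. As a rough sketch of the same idea without -P, the match can include the leading '>' and cut can drop it afterwards. The function name extract_names and the temporary sample file are assumptions made for illustration, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: print the text between tags without relying on grep -P.
extract_names() {
    # grep -o prints every '>' that is followed by at least one character
    # other than '<' or '>'; cut -c2- then drops that leading '>'.
    grep -o '>[^<>][^<>]*' "$1" | cut -c2-
}

# Two sample lines from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<option value="DL_54292">(DL)finance</option>
<option value="510533">Joe Wilson</option>
EOF

extract_names "$sample"
```

The closing '>' of each line is skipped because nothing follows it, so only the option labels are printed.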

Answer 2

Since you mentioned vi, you can use this command:

:%s_^<option value=".*">\(.*\)</option>$_\1_gi


%s -> substitute across the whole file
^ -> start of line
.* -> any characters
\(.*\) -> any characters, captured for later use
$ -> end of line
\1 -> the first captured group
gi -> ignore case and replace all matches on each line
_ -> substitution separator

:s is search and replace; :s_foo_bar_ replaces foo with bar on the current line.
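
The same substitution can probably be run outside the editor, since sed accepts the same basic regular expression and the same '_' separator. The following is only a sketch: the function name strip_option_tags and the sample file are assumptions, and the gi flags are left out because the pattern is anchored to the whole line and a case-insensitive flag is not available in every sed.

```shell
#!/bin/sh
# Hypothetical helper: apply the vi substitution to a file using sed.
strip_option_tags() {
    # '_' is the separator, as in the vi command. \(.*\) captures the label
    # and \1 puts it back in place of the whole line.
    sed 's_^<option value=".*">\(.*\)</option>$_\1_' "$1"
}

# Two sample lines from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<option value="DL_54274">(DL)sales</option>
<option value="510513">Dawn James</option>
EOF

strip_option_tags "$sample"
```

Lines that do not match the pattern are printed unchanged, so any stray line in the input stays visible rather than silently disappearing.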

Answer 3

awk can do this:

awk -F"<|>" '{print $3}' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby

To match your requested output exactly, the data in parentheses should be removed too:

awk -F"<|>" '{sub(/^\([^)]*\)/,"",$3);print $3}' file
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby

Answer 4

If you don't mind using Notepad++, then you can use this regex:

.*>(.*)<.*

And replace with \1
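
For anyone who wants to stay on the command line, the same expression appears to work with sed in extended-regex mode. This is a minimal sketch, assuming a sed that supports -E (GNU and BSD sed both do); the function name strip_tags_ere is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: apply the Notepad++ regex to standard input with sed -E.
strip_tags_ere() {
    # The greedy .*> first tries the last '>' on the line, then backtracks to
    # the one that closes the opening tag; (.*) is then limited to the label
    # by the '<' that starts the closing tag.
    sed -E 's/.*>(.*)<.*/\1/'
}

printf '%s\n' '<option value="551825">Mark Jackson</option>' | strip_tags_ere
# prints: Mark Jackson
```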


Answer 1: score 1, accepted, 2014-09-02 23:42:32
Answer 2: score 1, 2014-09-02 23:44:37
Answer 3: score 1, 2014-09-03 05:17:21
Answer 4: score 0, 2014-09-02 23:45:21
