
How to remove unique values from an HTML select list with a Linux program like sed, awk, or grep?

I copy the HTML from a select box and am trying to figure out a quick way to remove the HTML so I'm left with a list of names. Generally that's not a problem, but these options have unique values. I would prefer using a program like grep, sed, awk, or vi. Right now I have to go through and edit each line manually. Any help would be great, thank you!

<option value="DL_54292">(DL)finance</option>
<option value="DL_54274">(DL)sales</option>
<option value="510496">Ben Smith</option>
<option value="510507">Christopher Jones</option>
<option value="510513">Dawn James</option>
<option value="510533">Joe Wilson</option>
<option value="551825">Mark Jackson</option>
<option value="510562">Ronnie Libby</option>

Edit: output format, as suggested by Fede.

I'm trying to get a plain-text list with one name per line:

finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
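The question mentions sed, so here is a minimal sed sketch that aims for exactly the output above. It assumes one <option> element per line, as in the sample, and a sed that accepts -E (GNU and BSD sed both do). The helper name extract_names is made up for illustration.

```shell
#!/bin/sh
# extract_names (hypothetical helper): reads <option> lines on stdin,
# one element per line, and prints the visible text of each option.
extract_names() {
    # 1) delete the opening tag: everything up to and including the first '>'
    # 2) delete the closing tag: everything from the first remaining '<' on
    # 3) delete a leading parenthesized prefix such as "(DL)"
    sed -E -e 's/^[^>]*>//' -e 's/<.*$//' -e 's/^\([^)]*\)//'
}

# Demo on lines taken from the question.
extract_names <<'EOF'
<option value="DL_54292">(DL)finance</option>
<option value="510496">Ben Smith</option>
<option value="510507">Christopher Jones</option>
EOF
# Prints:
# finance
# Ben Smith
# Christopher Jones
```

With the HTML saved to a file, run extract_names < file (the file name is a placeholder). The third expression only removes a parenthesized group at the very start of the text, so names that merely contain parentheses are left alone.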

Use grep to get the text between the tags:

$ grep -oP '(?<=>)[^<>]+' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
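The -P option (Perl-compatible regular expressions, needed here for the (?<=>) lookbehind) is not part of POSIX grep and is missing from some implementations. A sketch of a variant without it, assuming a grep that supports -o (GNU and BSD grep do); it also drops the (DL) prefix to match the requested output. names_without_pcre is a hypothetical helper name.

```shell
#!/bin/sh
# names_without_pcre (hypothetical helper): same idea as grep -oP, minus -P.
# grep -o prints only the matched part: '>' + one or more chars that are
# neither '<' nor '>' + '<'. (Basic regexes have no '+', hence "[^<>][^<>]*".)
# sed then strips the '>' and '<' and a leading "(...)" prefix; in a basic
# regex, unescaped parentheses are literal characters.
names_without_pcre() {
    grep -o '>[^<>][^<>]*<' | sed -e 's/^>//' -e 's/<$//' -e 's/^([^)]*)//'
}

names_without_pcre <<'EOF'
<option value="DL_54292">(DL)finance</option>
<option value="510513">Dawn James</option>
EOF
# Prints:
# finance
# Dawn James
```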

Since you mentioned vi, you can use this command:

:%s_^<option value=".*">\(.*\)</option>$_\1_gi


%s -> substitute on every line of the file
^ -> start of line
.* -> any sequence of characters
\(.*\) -> any sequence of characters, remembered as group 1
$ -> end of line
\1 -> the first remembered group
gi -> ignore case and replace all matches in the line
_ -> delimiter between the parts of the substitution

:s is search and replace; :s_foo_bar_ replaces foo with bar on the current line.
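The same substitution can be run non-interactively with sed, which applies s to every line, so no % range is needed. This is a sketch: the g flag is redundant because the pattern is anchored at both ends, and the i flag is dropped because sed's case-insensitive I flag is a GNU extension. vi_subst_in_sed is a hypothetical name. The second sample line is deliberately missing its final >, to show that a line the pattern does not match passes through unchanged.

```shell
#!/bin/sh
# vi_subst_in_sed (hypothetical helper): the vi pattern run through sed.
# '_' is the delimiter, so the '/' in "</option>" needs no escaping.
vi_subst_in_sed() {
    sed 's_^<option value=".*">\(.*\)</option>$_\1_'
}

vi_subst_in_sed <<'EOF'
<option value="510507">Christopher Jones</option>
<option value="510496">Ben Smith</option
EOF
# Prints:
# Christopher Jones
# <option value="510496">Ben Smith</option
```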

awk can do this:

$ awk -F"<|>" '{print $3}' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
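Why $3? With -F"<|>" every < and > is a field separator, so the empty string before the leading < counts as field 1 and the tag contents land in fields 2 and 4. A small sketch that prints each field of one sample line (dump_fields is a hypothetical helper name):

```shell
#!/bin/sh
# dump_fields (hypothetical helper): prints every field awk sees when FS is
# the regex "<|>", numbered the same way as $1, $2, ...
dump_fields() {
    awk -F'<|>' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

echo '<option value="510496">Ben Smith</option>' | dump_fields
# Prints:
# $1=[]
# $2=[option value="510496"]
# $3=[Ben Smith]
# $4=[/option]
# $5=[]
```

A line missing its final > just loses the empty last field; the name is still $3.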

To match your requested output exactly, the prefix in parentheses should be removed too:

$ awk -F"<|>" '{sub(/[^)]*)/,"",$3); print $3}' file
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
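Note that /[^)]*)/ is not anchored: it deletes everything up to the first ) wherever it appears, so a name containing parentheses would be cut as well. A sketch of a tighter variant that only removes a parenthesized group at the very start of the name; the second input line is a made-up example, not from the question, and strip_prefix_anchored is a hypothetical name.

```shell
#!/bin/sh
# strip_prefix_anchored (hypothetical helper): like the answer above, but the
# regex is anchored with '^' and the parentheses are escaped, so only a
# leading "(...)" group is removed from the text field ($3).
strip_prefix_anchored() {
    awk -F'<|>' '{ sub(/^\([^)]*\)/, "", $3); print $3 }'
}

strip_prefix_anchored <<'EOF'
<option value="DL_54292">(DL)finance</option>
<option value="999">Ben (Jr) Smith</option>
EOF
# Prints:
# finance
# Ben (Jr) Smith
```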

If you don't mind using Notepad++, you can use this regex:

.*>(.*)<.*

and replace it with \1 (with the search mode set to "Regular expression").
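The same regex also works outside Notepad++ with sed -E, since extended regexes treat (...) as a group and sed accepts \1 in the replacement. A sketch, with npp_regex_in_sed as a hypothetical helper name; like the grep answer and the first awk answer, it keeps the (DL) prefix.

```shell
#!/bin/sh
# npp_regex_in_sed (hypothetical helper): the Notepad++ find/replace, via sed -E.
# The leading ".*>" can only end at a '>' that is still followed by a '<',
# so the captured group is the text between the opening and closing tags.
npp_regex_in_sed() {
    sed -E 's/.*>(.*)<.*/\1/'
}

npp_regex_in_sed <<'EOF'
<option value="510533">Joe Wilson</option>
<option value="DL_54274">(DL)sales</option>
EOF
# Prints:
# Joe Wilson
# (DL)sales
```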


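Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.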
