Extract regex capture group matches from a file

Question

I want to perform the action named in the title from the Linux command line (a few lines of bash script will also do). The command I tried is:

sed 's/href="([^"])"/$1/g' page.html > list.lst

but obviously it failed.

To be precise, here is my input:

<link rel="stylesheet" type="text/css" href="style/css/colors.css" />
<link rel="stylesheet" type="text/css" href="style/css/global.css" />
<link rel="stylesheet" type="text/css" href="style/css/icons.css" />

The output I want is a comma-separated or space-separated list of all matches in the input file:

style/css/colors.css,style/css/global.css,style/css/icons.css

I think I got the right expression: href="([^"]*)"

but I have no clue how to apply it. sed would do a search/replace, which is not exactly what I want. (On the contrary, I only need to keep the matches and throw the rest away, not replace them.)

Answer 1

grep href page.html | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g'

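This extracts all the lines that contain href and keeps only one href value per line. Because the leading .* in the sed pattern is greedy, that is the last href on the line when a line holds several. xargs then joins the values with spaces, and the final sed turns the spaces into commas. Also, refer to this post about parsing HTML with regular expressions.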
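If a line can hold more than one href, a variant that prints each match separately may be useful. The following is only a minimal sketch, assuming a grep that supports the -o option (GNU and BSD grep both do); the function name extract_hrefs is made up for illustration and is not part of the original answer.

```shell
# Hypothetical helper: print every href value in the file given as $1,
# joined by commas on a single line.
extract_hrefs() {
  # grep -o prints each match on its own line, so several hrefs on one
  # input line are all kept.
  grep -o 'href="[^"]*"' "$1" |
    # Strip the leading href=" and the trailing quote, leaving the value.
    sed 's/^href="//; s/"$//' |
    # Join all lines into one, using a comma as the separator.
    paste -sd, -
}
```

Calling extract_hrefs page.html > list.lst would write the list to a file. For a space-separated list, change the paste delimiter from , to ' '.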

7 · accepted · 2011-07-26 14:38:58
