Regex to parse XML - RSS feed

Question

<atom:link rel="self" href="http://www.independent.co.uk/"/>
<item>
<title>
Coronavirus: Why the Covid-19 economic stimulus deal will make it to Trump&apos;s desk
</title>
<link>
https://www.independent.co.uk/news/world/americas/us-politics/coronavirus-economic-stimulus-deal-covid-19-trump-bill-senate-house-a9419976.html
</link>
<description>
<![CDATA[
News Analysis: When Senate tries to pass major bills, there's always one day of chaos. Monday appears to be that day.
]]>
</description>

For the content above, I would like to extract the title, link, and description. How can I formulate a regex to capture these?

The end goal is to dump the extracted content into a predefined SQL database that I created.

Answer 1

As suggested in the comments, you should most likely use an XML parser rather than a regex. However, since the format of the RSS feed is probably consistent and fairly simple, a regex solution might work too.

For the current example you can use:

<(.+)>\s*(?:<!\[CDATA\[)?\s*(.*)\s*(?:]]>)?\s*<\/\1>

Explanation:

<(.+)> - matches the opening tag and captures its name (group 1)
\s* - matches optional whitespace characters (the newline in your example)
(?:<!\[CDATA\[)? - non-capturing group for <![CDATA[, matched 0 or 1 times
\s* - matches optional whitespace characters
(.*) - capturing group (group 2) that matches any characters except line breaks
\s* - matches optional whitespace characters
(?:]]>)? - non-capturing group for ]]> (the CDATA close), matched 0 or 1 times
\s* - matches optional whitespace characters
<\/\1> - matches the closing tag with the same name as the opening tag (backreference to the first capture group)
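The breakdown above depends on two properties of the sample: the tags carry no attributes, and each value sits on a single line (in JavaScript, `.` does not match a line break unless the `s` flag is set). As a rough sketch only, and not part of the original answer, a more tolerant variant could restrict the tag name, allow attributes, and match the content lazily across lines. The names `tolerantRegex`, `extractFields`, and `extractItems`, as well as the sample feed, are made up for illustration.

```javascript
// Hypothetical, more tolerant variant (an assumption, not the answer's regex):
//   <([\w:-]+)              tag name, captured as group 1
//   (?:\s[^>]*)?>           optional attributes
//   \s*(?:<!\[CDATA\[)?\s*  optional CDATA opener
//   ([\s\S]*?)              content, lazy, may span lines (group 2)
//   \s*(?:\]\]>)?\s*        optional CDATA closer
//   <\/\1>                  closing tag with the same name
const tolerantRegex =
  /<([\w:-]+)(?:\s[^>]*)?>\s*(?:<!\[CDATA\[)?\s*([\s\S]*?)\s*(?:\]\]>)?\s*<\/\1>/g;

// Collect tag name -> text for every simple element in a fragment.
function extractFields(xml) {
  const fields = {};
  for (const match of xml.matchAll(tolerantRegex)) {
    fields[match[1]] = match[2];
  }
  return fields;
}

// Split the feed into <item> blocks first, so fields of different items do not mix.
function extractItems(feed) {
  return [...feed.matchAll(/<item\b[^>]*>([\s\S]*?)<\/item>/g)]
    .map((m) => extractFields(m[1]));
}

// Made-up single-line feed with an attribute and multi-line CDATA content.
const feed =
  '<item><title>Some title</title> <link rel="alternate">https://example.com/a</link> ' +
  '<description><![CDATA[ First line\nsecond line ]]></description></item>';

console.log(extractItems(feed));
```

Even this variant does not handle nested elements of the same name, comments, or self-closing tags, which is why a real XML parser remains the safer choice for feeds you do not control.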

    // Sample fields from the question; each value sits on its own line.
    let input = `<title>
    Coronavirus: Why the Covid-19 economic stimulus deal will make it to Trump&apos;s desk
    </title>
    <link>
    https://www.independent.co.uk/news/world/americas/us-politics/coronavirus-economic-stimulus-deal-covid-19-trump-bill-senate-house-a9419976.html
    </link>
    <description>
    <![CDATA[
    News Analysis: When Senate tries to pass major bills, there's always one day of chaos. Monday appears to be that day.
    ]]>
    </description>`;

    let regex = /<(.+)>\s*(?:<!\[CDATA\[)?\s*(.*)\s*(?:]]>)?\s*<\/\1>/g;

    // With the g flag, exec() resumes after the previous match and returns null when done.
    let result;
    while ((result = regex.exec(input)) !== null) {
      // result[1] is the tag name, result[2] is the captured text.
      console.log(result[1] + ": " + result[2]);
    }
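Since the stated goal is to load the extracted values into a SQL database, the matches can be collected into one row and handed to a parameterized INSERT. The following is a minimal sketch under several assumptions: the table `rss_items` with columns `title`, `link`, and `description`, the `?` placeholder style, and the helpers `decodeEntities` and `toInsert` are all hypothetical, and the actual execution call depends on your database driver. The regex leaves XML entities such as `&apos;` undecoded, so the sketch decodes the five predefined XML entities for non-CDATA values.

```javascript
// Same pattern as in the answer; it expects each value on its own line.
const fieldRegex = /<(.+)>\s*(?:<!\[CDATA\[)?\s*(.*)\s*(?:]]>)?\s*<\/\1>/g;

// Decode the five predefined XML entities; &amp; goes last to avoid double decoding.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Build a parameterized INSERT for a hypothetical rss_items table.
function toInsert(itemXml) {
  const row = {};
  for (const match of itemXml.matchAll(fieldRegex)) {
    const [whole, tag, value] = match;
    const text = value.trim();
    // CDATA content is literal in XML, so only non-CDATA values are decoded.
    row[tag] = whole.includes("<![CDATA[") ? text : decodeEntities(text);
  }
  return {
    sql: "INSERT INTO rss_items (title, link, description) VALUES (?, ?, ?)",
    params: [row.title, row.link, row.description],
  };
}

// Made-up item in the same layout as the question's sample.
const item = `<title>
Trump&apos;s desk
</title>
<link>
https://example.com/story.html
</link>
<description>
<![CDATA[
One day of chaos.
]]>
</description>`;

console.log(toInsert(item));
```

Passing the values as parameters instead of concatenating them into the SQL string avoids quoting problems, for example with the apostrophe in the title.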

Answer 1
1 accepted 2020-03-31 06:44:05
