Extracting text labels from RSS feed XML (using JavaScript/React)

Question

I've just parsed an RSS feed (Upwork's), and I have job fields such as title and link parsed out as separate data points (items.title, items.link). However, most of the data I need about each job (its category, skills, etc.) is dumped into the "content" item as one giant block of text. Generally speaking, the label for each piece of information is wrapped in <b> tags, and the information itself is just a blob of text followed by a <br /> tag.

Here is an example from the XML (items.content):

We are looking for a developer with capabilities as a Wordpress Frontend/Backend Developer&nbsp;or&nbsp;Full Stack Wordpress Developer. <br /><br /> It is important for us to have experience with hosting, SSL, and&nbsp;Pagebuilders&nbsp;(Elementor/Visual Composer).<br /><br /><b>Hourly Range</b>: $20.00-$45.00 <br /><b>Posted On</b>: December 16, 2020 23:12 UTC<br /><b>Category</b>: Full Stack Development<br /><b>Skills</b>:Website Development, API, Website Redesign, WordPress Plugin, Website Optimization, Google Analytics, Java, JavaScript, PHP, Ruby, Scala, Kotlin, Python, SQL, Very Small (1-9 employees), CSS, Website Security, HTML, Graphic Design, Web Design, jQuery, Adobe Photoshop, Adobe Illustrator <br /><b>Location Requirement</b>: Only freelancers located in the United States may apply. <br /><b>Country</b>: United States <br /><a href="https://www.upwork.com/jobs/Ongoing-Website-development-specialist_%7E018e7e903a64f4e78e?source=rss">click to apply</a>

How do I pull out, for example, the label "Hourly Range" and then the data associated with it ($20.00-$45.00)? To add complexity, I would ideally also need to separate each listed item (e.g. HTML, CSS) into its own data item.

I'm at a loss as to how to read this text and extract the data I need in an organized way. Any help appreciated!

Answer 1

Anything in the DOM is a node. The labels are the b element nodes, and their data are the text nodes that follow them as siblings.

    const snippet = (new DOMParser()).parseFromString(getHTML(), 'text/html');
    const data = {};

    for (const label of snippet.querySelectorAll('b')) {
      const name = normalizeSpace(label.textContent);
      let value = normalizeSpace(
        label.nextSibling.textContent.replace(/^:/, '')
      );
      if (name === 'Skills') {
        value = value.split(/\s*,\s*/);
      }
      data[name] = value;
    }

    console.log(data);

    function normalizeSpace(value) {
      return value.replace(/\s{2,}/g, ' ').trim();
    }

    function getHTML() {
      return `We are looking for a developer with capabilities as a Wordpress Frontend/Backend Developer&nbsp;or&nbsp;Full Stack Wordpress Developer. <br /><br /> It is important for us to have experience with hosting, SSL, and&nbsp;Pagebuilders&nbsp;(Elementor/Visual Composer).<br /><br /><b>Hourly Range</b>: $20.00-$45.00 <br /><b>Posted On</b>: December 16, 2020 23:12 UTC <br /><b>Category</b>: Full Stack Development<br /><b>Skills</b>:Website Development, API, Website Redesign, WordPress Plugin, Website Optimization, Google Analytics, Java, JavaScript, PHP, Ruby, Scala, Kotlin, Python, SQL, Very Small (1-9 employees), CSS, Website Security, HTML, Graphic Design, Web Design, jQuery, Adobe Photoshop, Adobe Illustrator <br /><b>Location Requirement</b>: Only freelancers located in the United States may apply. <br /><b>Country</b>: United States <br /> <a href="https://www.upwork.com/jobs/Ongoing-Website-development-specialist_%7E018e7e903a64f4e78e?source=rss">click to apply</a>`;
    }
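
The snippet above relies on DOMParser, which is a browser API. If the feed is processed somewhere without a DOM (for example in Node before the data reaches React), the same label/value idea can be approximated with a regular expression. This is only a rough sketch, not a general HTML parser: it assumes every label is wrapped in <b>...</b>, that the value is plain text running up to the next tag, and that only "Skills" holds a comma-separated list. The names parseJobContent and clean are hypothetical helpers introduced here for illustration.

```javascript
// Hypothetical helper: collects "<b>Label</b>: value" pairs from a content string.
function parseJobContent(html) {
  const data = {};
  // Group 1: label text inside <b>...</b>.
  // Group 2: everything after an optional colon, up to the next "<" or end of string.
  const pattern = /<b>([^<]+)<\/b>\s*:?\s*([^<]*)/g;

  for (const [, rawName, rawValue] of html.matchAll(pattern)) {
    const name = clean(rawName);
    const value = clean(rawValue);
    // Assumption: "Skills" is the only field that should become an array.
    data[name] = name === 'Skills' ? value.split(/\s*,\s*/) : value;
  }
  return data;
}

// Replaces &nbsp; entities with spaces, collapses whitespace runs, and trims.
function clean(text) {
  return text.replace(/&nbsp;/g, ' ').replace(/\s+/g, ' ').trim();
}

// Usage with a shortened sample in the same shape as items.content:
const sampleContent =
  'Job description text.<br /><br /><b>Hourly Range</b>: $20.00-$45.00 <br />' +
  '<b>Category</b>: Full Stack Development<br />' +
  '<b>Skills</b>:HTML, CSS, jQuery <br /><a href="#">click to apply</a>';

console.log(parseJobContent(sampleContent));
```

With an array of parsed feed items, something like items.map(item => ({ ...item, details: parseJobContent(item.content) })) would attach the extracted fields to each job. If the values can themselves contain markup, the DOM-based approach is the safer choice.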

Answer 1, 2020-12-18 18:42:19
