
Extract Text from RSS Feed XML in <b></b> Tags (using Javascript/React)

I've just parsed an RSS feed (Upwork's), and I have job item data points like title and link parsed out as separate fields (items.title, items.link). However, most of the data I need to extract about the job (its category, skills, etc.) is dumped into the "content" data item as one giant block of text. Generally speaking, the label of each piece of information I need is wrapped in <b></b> tags, and the information itself is just a blob of text followed by a <br /> tag.

Here is an example from the XML (items.content):

We are looking for a developer with capabilities as a Wordpress Frontend/Backend Developer&nbsp;or&nbsp;Full Stack Wordpress Developer. <br /><br /> It is important for us to have experience with hosting, SSL, and&nbsp;Pagebuilders&nbsp;(Elementor/Visual Composer).<br /><br /><b>Hourly Range</b>: $20.00-$45.00 <br /><b>Posted On</b>: December 16, 2020 23:12 UTC<br /><b>Category</b>: Full Stack Development<br /><b>Skills</b>:Website Development, API, Website Redesign, WordPress Plugin, Website Optimization, Google Analytics, Java, JavaScript, PHP, Ruby, Scala, Kotlin, Python, SQL, Very Small (1-9 employees), CSS, Website Security, HTML, Graphic Design, Web Design, jQuery, Adobe Photoshop, Adobe Illustrator <br /><b>Location Requirement</b>: Only freelancers located in the United States may apply. <br /><b>Country</b>: United States <br /><a href="https://www.upwork.com/jobs/Ongoing-Website-development-specialist_%7E018e7e903a64f4e78e?source=rss">click to apply</a>

How do I pull out, for example, the label "Hourly Range" and then the data associated with it ($20.00-$45.00)? To add complexity, I would ideally need to separate each listed item (e.g. HTML, CSS) into its own data item.
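Once the raw string for a field such as "Hourly Range" or "Skills" has been extracted, turning it into numbers or separate items is ordinary string handling. A minimal sketch, assuming the range is written as two dollar amounts joined by a hyphen, as in the sample above; `parseHourlyRange` and `splitList` are hypothetical helper names, not part of any library:

```javascript
// Hypothetical helper: turn a range like "$20.00-$45.00" into numbers.
// Assumes two dollar amounts separated by a hyphen; returns null otherwise.
function parseHourlyRange(value) {
  const match = /^\s*\$([\d,]+(?:\.\d+)?)\s*-\s*\$([\d,]+(?:\.\d+)?)\s*$/.exec(value);
  if (!match) return null;
  // Strip thousands separators before converting, e.g. "1,200.50" -> 1200.5.
  const toNumber = (s) => Number(s.replace(/,/g, ''));
  return { min: toNumber(match[1]), max: toNumber(match[2]) };
}

// Hypothetical helper: split a comma-separated field into trimmed, non-empty items.
function splitList(value) {
  return value
    .split(',')
    .map((item) => item.trim())
    .filter(Boolean);
}

console.log(parseHourlyRange('$20.00-$45.00')); // { min: 20, max: 45 }
console.log(splitList('Website Development, API, CSS , HTML'));
// [ 'Website Development', 'API', 'CSS', 'HTML' ]
```

Returning `null` for an unexpected format (for example, a fixed-price job with no hourly range) lets the caller decide how to handle it, instead of silently producing `NaN`.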

I'm at a loss as to how to read this text and extract the data I need in an organized way. Any help appreciated!

Everything in the DOM is a node. The labels are the b element nodes, and their data is in the text nodes that follow them as siblings.

    // Parse the content string into a detached HTML document (browser API).
    const snippet = (new DOMParser()).parseFromString(getHTML(), 'text/html');
    const data = {};

    // Each <b> element holds a label; its next sibling text node holds the value.
    for (const label of snippet.querySelectorAll('b')) {
      const name = normalizeSpace(label.textContent);
      // Guard against a <b> with no following node, then drop the leading ":".
      let value = normalizeSpace(
        (label.nextSibling?.textContent ?? '').replace(/^:/, '')
      );
      if (name === 'Skills') {
        // Split the comma-separated skills into separate items.
        value = value.split(/\s*,\s*/);
      }
      data[name] = value;
    }
    console.log(data);

    // Collapse runs of two or more whitespace characters into one space and trim.
    function normalizeSpace(value) {
      return value.replace(/\s{2,}/g, ' ').trim();
    }

    function getHTML() {
      return `We are looking for a developer with capabilities as a Wordpress Frontend/Backend Developer&nbsp;or&nbsp;Full Stack Wordpress Developer. <br /><br /> It is important for us to have experience with hosting, SSL, and&nbsp;Pagebuilders&nbsp;(Elementor/Visual Composer).<br /><br /><b>Hourly Range</b>: $20.00-$45.00 <br /><b>Posted On</b>: December 16, 2020 23:12 UTC <br /><b>Category</b>: Full Stack Development<br /><b>Skills</b>:Website Development, API, Website Redesign, WordPress Plugin, Website Optimization, Google Analytics, Java, JavaScript, PHP, Ruby, Scala, Kotlin, Python, SQL, Very Small (1-9 employees), CSS, Website Security, HTML, Graphic Design, Web Design, jQuery, Adobe Photoshop, Adobe Illustrator <br /><b>Location Requirement</b>: Only freelancers located in the United States may apply. <br /><b>Country</b>: United States <br /> <a href="https://www.upwork.com/jobs/Ongoing-Website-development-specialist_%7E018e7e903a64f4e78e?source=rss">click to apply</a>`;
    }
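`DOMParser` is a browser API, so the snippet above will not run as-is in Node.js (for example during server-side rendering) or in React Native. Where no DOM is available, a rougher alternative is a regular expression over the raw `items.content` string. This is a swapped-in technique, not the approach described above: it assumes every label is written exactly as `<b>Label</b>:` and that its value runs up to the next `<`, and it does not decode HTML entities such as `&nbsp;`. `extractFields` is a hypothetical name, and `String.prototype.matchAll` needs an ES2020 runtime.

```javascript
// Sketch: pull "<b>Label</b>: value" pairs out of the raw RSS content string
// without a DOM. Assumes values contain no '<' and need no entity decoding.
function extractFields(html) {
  const data = {};
  // Group 1: label text inside <b>...</b>; group 2: text up to the next tag.
  const pattern = /<b>([^<]+)<\/b>\s*:?([^<]*)/g;
  for (const [, rawName, rawValue] of html.matchAll(pattern)) {
    const name = rawName.replace(/\s+/g, ' ').trim();
    let value = rawValue.replace(/\s+/g, ' ').trim();
    if (name === 'Skills') {
      // Split the skills list into separate items, dropping empty entries.
      value = value.split(/\s*,\s*/).filter(Boolean);
    }
    data[name] = value;
  }
  return data;
}

const content =
  '<br /><b>Hourly Range</b>: $20.00-$45.00 <br />' +
  '<b>Category</b>: Full Stack Development<br />' +
  '<b>Skills</b>:Website Development, API, CSS, HTML <br />';

console.log(extractFields(content));
```

This logs an object with `Hourly Range`, `Category` and `Skills` keys, where `Skills` is an array. If the feed ever nests markup inside a value, the DOM-based version is the safer choice.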
