
Cheerio: How to get an array of text from a <td> tag

HTML source:

 <td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td> 

I want to get the text inside the td tag as an array, like this:

["2644-3/4", "QPSK", "301 - 4864"]

Which method would be the best to use?

Thanks in advance!

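Your HTML doesn't parse as it stands, so I think the only way around this is to fix it and then use a regular expression to pick out the information: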

const cheerio = require('cheerio');

// The fixed HTML. The td is wrapped in table/tr elements
// Ideally there should be a closing </font> tag too, but Cheerio seems to ignore that
const html = '<table><tr><td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td></tr></table>';
const $ = cheerio.load(html);

// Grab the cell
const $td = $('td');

// (\d{4}-\d\/\d) - matches first group
// ([A-Z]{4}) - matches the second group
// (?:.*) - non-capture group
// (\d{3} - \d{4}) - matches the final group
const re = /(\d{4}-\d\/\d)([A-Z]{4})(?:.*)(\d{3} - \d{4})/;

// Match the text against the regex and remove the full match
const arr = $td.text().match(re).slice(1);

// Outputs `["2644-3/4","QPSK","301 - 4864"]`
console.log(arr);
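
To see why a regex is needed here, it helps to look at the string it runs against. The sketch below skips Cheerio and hard-codes the text that `$td.text()` is assumed to return for this cell (the `<br>` tags vanish and each `&nbsp;` becomes a U+00A0 character). The variable `flattened` is therefore a stand-in, not captured Cheerio output, and the null guard is an addition that the answer above does not have.

```javascript
// Assumed result of $td.text(): the <br> separators are gone and the
// two &nbsp; entities are decoded to non-breaking spaces (U+00A0).
const flattened = '2644-3/4QPSK\u00a0\u00a0301 - 4864';

// Same pattern as in the answer above.
const re = /(\d{4}-\d\/\d)([A-Z]{4})(?:.*)(\d{3} - \d{4})/;

// match() returns null when the text has a different shape, so guard
// before slicing off the full match at index 0.
const m = flattened.match(re);
const parts = m ? m.slice(1) : [];

console.log(parts);
```

Because the pattern hard-codes digit counts and a four-letter code, it only fits cells shaped like this sample. A cell with a different layout would produce the empty array instead of throwing.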

Let's start with:

let td = '<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td>'

How about:

td.split('<br>').map(part => cheerio.load(part).text().trim())
// Array(3) ["2644-3/4", "QPSK", "301 - 4864"]
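
The same split-on-`<br>` idea can also be sketched without parsing each fragment, assuming the markup stays as simple as the sample. This version strips any remaining tags with a regex and decodes `&nbsp;` by hand. The helper name `cellLines` is made up for illustration, and a regex tag stripper like this is not a general HTML parser (for example, an attribute value containing `>` would break it).

```javascript
const td = '<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td>';

// Hypothetical helper: split a cell's markup on <br> and clean each piece.
function cellLines(markup) {
  return markup
    .split(/<br\s*\/?>/i)                 // also accepts <BR> and <br/>
    .map(part => part
      .replace(/<[^>]*>/g, '')            // drop any remaining tags
      .replace(/&nbsp;/g, ' ')            // the only entity handled here
      .trim())
    .filter(part => part.length > 0);     // skip empty pieces
}

console.log(cellLines(td));
```

For anything beyond this one entity and flat markup, parsing each fragment with Cheerio as shown above is the safer choice, since it decodes entities and handles nested tags for you.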
