Parsing HTML text with regex in JavaScript?
I realize HTML cannot be parsed with regex. However, I have a string with some source code from a typical Amazon web page:
<script type="text/javascript">
P.when("A", "jQuery").execute(function(A, $) {
var pageState = A.state('ftPageState');
if (typeof pageState === 'undefined') {
pageState = {};
}
if (pageState["fast-track-message"]) {
pageState["fast-track-message"].stopTimer();
}
<li> 48 pages</li>
pageState["fast-track-message"] = new fastTrackCountDown(20710,"fast-track-message");
A.state('ftPageState', pageState);
});
</script>
I want to grab the 48. Every number will be followed by pages</li>. How can I match this? Here is what I have tried:
var string_tester = String(datastuff.html());
var regex_tester = string_tester.match(/\d+ pages<\/li>/);
If you know it will always be in the list element, try this as a regex literal:
/(<li>\s*)([0-9]+)(\s*pages\s*<\/li>)/
(48 would be in $2.) However, that won't accommodate number formatting. This should be generic enough:
/(<li>\s*)([0-9,.\-()]+)(\s*pages\s*<\/li>)/
I should note that Amazon has a seller and publisher API that might provide a more stable route for you to pursue, depending on your use case.
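To make the number-formatting point concrete, here is a minimal sketch of how the captured text could be turned into an integer. The helper name extractPageCount is hypothetical, and the code assumes that "," and "." inside the number are thousands separators, which may not hold for every locale.

```javascript
// Hypothetical helper, not part of the original question or answer.
function extractPageCount(html) {
  // Capture a digit followed by more digits or separators,
  // sitting between "<li>" and "pages</li>".
  const match = html.match(/<li>\s*(\d[\d,.]*)\s*pages\s*<\/li>/i);
  if (match === null) {
    return null; // no "<li> N pages</li>" fragment in this string
  }
  // Assumption: "," and "." are thousands separators, so drop them before parsing.
  return parseInt(match[1].replace(/[,.]/g, ''), 10);
}

console.log(extractPageCount('<li> 48 pages</li>'));     // 48
console.log(extractPageCount('<li>1,024 pages</li>'));   // 1024
console.log(extractPageCount('<li>no count here</li>')); // null
```

Returning null rather than throwing keeps the caller in control on pages that carry no page count at all.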
Edit: I checked a few Amazon pages to see if there was a better approach to getting what you want, and noticed that on the pages I checked there was no number, just this:
<script type="text/javascript">
P.when("A", "jQuery").execute(function(A, $) {
var pageState = A.state('ftPageState');
if (typeof pageState === 'undefined') {
pageState = {};
}
if (pageState["fast-track-message"]) {
pageState["fast-track-message"].stopTimer();
}
pageState["fast-track-message"] = new fastTrackCountDown(57592,"fast-track-message");
A.state('ftPageState', pageState);
});
</script>
I don't know what you are doing, but I wanted to mention that in case it invalidates an assumption you have made.
Your attempt was close, but it returned "48 pages</li>" instead of "48". Put a capture group around the digits and read index 1 of the result:
string_tester.match(/(\d+) pages<\/li>/)[1];
string_tester = "testing <li> 48 pages</li> now, and also testing <li> 52 pages</li>. see?";
regex_tester = string_tester.match(/\d+ pages<\/li>/g)
  .map(function (m) {
    return m.match(/\d+/)[0];
    // or: return m.replace(/\D/g, "");
  });
document.getElementsByTagName('p')[0].innerHTML = regex_tester;
<p></p>
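If the environment supports String.prototype.matchAll (ES2020), the two-pass match-then-map approach can probably be collapsed into a single pass. This is only a sketch of an alternative, not what the answer above uses; the variable names are illustrative.

```javascript
const text = 'testing <li> 48 pages</li> now, and also testing <li> 52 pages</li>. see?';

// matchAll requires the g flag and yields one match object per occurrence,
// each carrying its capture groups, so no second regex pass is needed.
const counts = Array.from(text.matchAll(/(\d+) pages<\/li>/g), (m) => Number(m[1]));

console.log(counts); // [ 48, 52 ]
```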