Parsing HTML text with regex in JavaScript?

Question

I realize HTML cannot be parsed with regex. However, I have a string containing some source code from a typical Amazon web page:

            <script type="text/javascript">
                P.when("A", "jQuery").execute(function(A, $) {
                    var pageState = A.state('ftPageState');
                    if (typeof pageState === 'undefined') {
                        pageState = {};
                    }
                    if (pageState["fast-track-message"]) {
                        pageState["fast-track-message"].stopTimer();
                    }

        <li> 48 pages</li>

                    pageState["fast-track-message"] = new fastTrackCountDown(20710,"fast-track-message");
                    A.state('ftPageState', pageState);
                });
            </script>

I want to grab the 48. Every number will be followed by pages</li>. How can I match this?

Attempt

var string_tester = String(datastuff.html());
var regex_tester = string_tester.match(/\d+ pages<\/li>/);

Answer 1

If you know it will always be in a list element, try this: (<li>\s*)([0-9]+)(\s*pages\s*</li>) (48 would be in $2, which is index 2 of the match result in JavaScript). However, that won't accommodate number formatting. This should be generic enough: (<li>\s*)([0-9,\.\-\$]+)(\s*pages\s*</li>). In a JavaScript regex literal, the slash in </li> has to be escaped as <\/li>. I should note that Amazon has a seller and publisher API that might provide a more stable route for you to pursue, depending on your use case.
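As a rough illustration of the generic pattern above, here is a minimal sketch written as a JavaScript regex literal. The sample string and the helper name extractPageCount are made up for this example, and stripping commas before converting to a number is an assumption about how formatted counts should be handled.

```javascript
// Hypothetical sample input modeled on the fragment in the question.
const html = '<ul><li> 48 pages</li><li>1,024 pages</li></ul>';

// The generic pattern from the answer as a regex literal.
// Group 2 holds the number text; the slash in </li> is escaped.
const pagePattern = /(<li>\s*)([0-9,.\-$]+)(\s*pages\s*<\/li>)/;

// Hypothetical helper: returns the first page count, or null if none is found.
function extractPageCount(source) {
  const match = source.match(pagePattern);
  if (match === null) {
    return null; // the page has no "<li> N pages</li>" fragment
  }
  // Assumption: commas are thousands separators and can be dropped.
  return Number(match[2].replace(/,/g, ''));
}

console.log(extractPageCount(html)); // 48
```

Returning null instead of indexing the match directly keeps the code from throwing on pages that carry no page count at all.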

Edit: I checked a few Amazon pages to see if there was a better approach to getting what you want and noticed that for the pages I checked there was no number, just this:

                <script type="text/javascript">
                P.when("A", "jQuery").execute(function(A, $) {
                    var pageState = A.state('ftPageState');
                    if (typeof pageState === 'undefined') {
                        pageState = {};
                    }
                    if (pageState["fast-track-message"]) {
                        pageState["fast-track-message"].stopTimer();
                    }
                    pageState["fast-track-message"] = new fastTrackCountDown(57592,"fast-track-message");
                    A.state('ftPageState', pageState);
                });
            </script>

I don't know what you are doing, but I wanted to mention that in case it invalidates an assumption you have made.

Answer 2

Your attempt was close, but it returned "48 pages" instead of "48".

If you want to match one number per query, use
string_tester.match(/(\d+) pages<\/li>/)[1];
Note the '(' and ')', which form a capture group; index 1 of the result holds the captured number.
To match multiple numbers:

    string_tester = "testing <li> 48 pages</li> now, and also testing <li> 52 pages</li>. see?";
    regex_tester = string_tester.match(/\d+ pages<\/li>/g)
        .map(function(m){
            return m.match(/\d+/)[0];
            // or return m.replace(/\D/g, "");
        });
    document.getElementsByTagName('p')[0].innerHTML = regex_tester;

 <p></p>
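The multi-match case can also be handled in a single pass by combining the capture group with String.prototype.matchAll. This is only a sketch of an alternative, not part of the answer above: it assumes an ES2020 environment, and the function name extractAllPageCounts is hypothetical.

```javascript
// Same test string as in the answer above.
const sample = "testing <li> 48 pages</li> now, and also testing <li> 52 pages</li>. see?";

// Hypothetical helper: collects every captured number as a Number.
function extractAllPageCounts(source) {
  // matchAll requires the g flag; each result exposes the capture group at index 1.
  const pattern = /(\d+) pages<\/li>/g;
  return Array.from(source.matchAll(pattern), function (m) {
    return Number(m[1]);
  });
}

console.log(extractAllPageCounts(sample)); // logs an array containing 48 and 52
```

When nothing matches, this returns an empty array, whereas match with the g flag returns null and the chained .map call would throw.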
