
PHP: extract info from an HTML page

I have this HTML:

<input type=hidden name="code1" value="AA-T5301">
        <td align=left valign=middle class="stdtext">
        <td valign=middle align=left class="stdtext">
            <a onMouseOver="window.status='See the more info on '; return true"
                Grapeseed Oil 150ml
        <td valign=middle align=right class="stdtext">Order Now</td>
        <td valign=middle align=right class="stdtext">
            <font class="productsale">
        <td valign=middle align=right class="stdtext">
            <input type=text size=4 name="qty_AA-T5301" value="0">
    <input type=hidden name="code2" value="AA-T5302">
            <td align=left valign=middle class="stdtext">
            <td valign=middle align=left class="stdtext">
                <a onMouseOver="window.status='See the more info on '; return true"
                    Grapeseed Oil 500ml
            <td valign=middle align=right class="stdtext">Order Now</td>
            <td valign=middle align=right class="stdtext">
                <font class="productsale">
            <td valign=middle align=right class="stdtext">
                <input type=text size=4 name="qty_AA-T5302" value="0">
        <input type=hidden name="code3" value="AA-T530">
                <td align=left valign=middle class="stdtext">
                <td valign=middle align=left class="stdtext">
                    <a onMouseOver="window.status='See the more info on '; return true"
                        Grapeseed Oil 50ml
                <td valign=middle align=right class="stdtext">Out of Stock</td>
                <td valign=middle align=right class="stdtext">
                    <font class="productsale">
                <td valign=middle align=right class="stdtext">
                    <input type=text size=4 name="qty_AA-T530" value="0">

How can I extract the info into an array, so that I have something like this:




Note: There may be more than 3 items on a page at a time, or there may be only 1.

You can try the DOMDocument class, but you will have to fix your HTML first: it has closing link tags ( </a> ) without any matching opening link tags ( <a href=""> ).

 $text = '<input type=hidden name="code1" value="AA-T5301">
        <td align=left valign=middle class="stdtext">
        <td valign=middle align=left class="stdtext">
            <a onMouseOver="window.status=\'See the more info on \'; return true"
                Grapeseed Oil 150ml
        <td valign=middle align=right class="stdtext">Order Now</td>
        <td valign=middle align=right class="stdtext">
            <font class="productsale">
            <span id="now">£2.04</span>
        <td valign=middle align=right class="stdtext">
            <input type=text size=4 name="qty_AA-T5301" value="0">
    <input type=hidden name="code2" value="AA-T5302">
            <td align=left valign=middle class="stdtext">
            <td valign=middle align=left class="stdtext">
                <a onMouseOver="window.status=\'See the more info on \'; return true"
                    Grapeseed Oil 500ml
            <td valign=middle align=right class="stdtext">Order Now</td>
            <td valign=middle align=right class="stdtext">
                <font class="productsale">
                <span id="now">£4.33</span>
            <td valign=middle align=right class="stdtext">
                <input type=text size=4 name="qty_AA-T5302" value="0">
        <input type=hidden name="code3" value="AA-T530">
                <td align=left valign=middle class="stdtext">
                <td valign=middle align=left class="stdtext">
                    <a onMouseOver="window.status=\'See the more info on \'; return true"
                        Grapeseed Oil 50ml
                <td valign=middle align=right class="stdtext">Out of Stock</td>
                <td valign=middle align=right class="stdtext">
                    <font class="productsale">
                    <span id="now">£1.17</span>
                <td valign=middle align=right class="stdtext">
                    <input type=text size=4 name="qty_AA-T530" value="0">';

    // Each preg_match_all() call fills its own match array; index 1 holds the first capture group.
    $values = array();
    preg_match_all('#<input type=hidden name="code[0-9]+" value="(.*)">#isU', $text, $values[0]);
    // The sample $text above contains no <strike> element, so this list stays empty for it.
    preg_match_all('#<strike>£([0-9.]+)</strike>#isU', $text, $values[1]);
    preg_match_all('#<span id="now">£([0-9.]+)</span>#isU', $text, $values[2]);

    $product_code_array = $values[0][1];
    $RRP_array = $values[1][1];
    $price_array = $values[2][1];
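
If one entry per product is what you are after, the parallel lists above can be zipped together. This is only a minimal sketch: it assumes the codes and the "now" prices come out in the same page order, and the helper name combine_products and the keys 'code' and 'price' are made up for illustration.

```php
<?php
// Hypothetical helper: zip parallel lists of codes and prices into one row per product.
// Assumes both lists are in page order; surplus entries in the longer list are ignored.
function combine_products(array $codes, array $prices): array
{
    $products = array();
    $count = min(count($codes), count($prices));
    for ($i = 0; $i < $count; $i++) {
        $products[] = array(
            'code'  => $codes[$i],
            'price' => $prices[$i],
        );
    }
    return $products;
}

// Sample values copied from the snippet above.
$product_code_array = array('AA-T5301', 'AA-T5302', 'AA-T530');
$price_array = array('2.04', '4.33', '1.17');

$products = combine_products($product_code_array, $price_array);
print_r($products);
```

Because the loop runs up to the shorter list's length, it works the same for 1 item or for many, but a missing price on one product would shift every later price onto the wrong code, so check the counts if that can happen on your pages.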

Look at this code for a start:

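$dom = new DOMDocument;
$dom->strictErrorChecking = false;
$dom->preserveWhiteSpace = true;
libxml_use_internal_errors(true);   // keep warnings about the malformed markup quiet
$dom->loadHTML($html);              // $html is assumed to hold the page source as a string
$trs = $dom->getElementsByTagName('tr');
foreach ($trs as $tr) {
   // now go through those elements. Look up the PHP docs on the DOM parser.
   // You can iterate through sub-elements (like the 'td') just like through the 'tr'
   $tds = $tr->getElementsByTagName('td');
}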

You can also go directly for the specific elements you need. Depending on the layout of your page, you might want to start from a "higher" node, so that you know where you are based on tag attributes or tag position.
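
As a rough illustration of starting from a "higher" node, the sketch below uses DOMXPath to pick every row that contains a product code input and then reads the cells relative to that row. The sample markup is a cut-down, tidied stand-in for the real page, and it assumes each product sits in its own tr with the stock status in the second td; adjust the expressions to your actual layout.

```php
<?php
// Cut-down sample; assumption: one <tr> per product, status text in the second <td>.
$html = '<table><tr>
    <td><input type=hidden name="code1" value="AA-T5301"></td>
    <td class="stdtext">Order Now</td>
</tr><tr>
    <td><input type=hidden name="code2" value="AA-T5302"></td>
    <td class="stdtext">Out of Stock</td>
</tr></table>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = array();

// Select only the rows that hold a hidden input whose name starts with "code".
foreach ($xpath->query('//tr[.//input[starts-with(@name, "code")]]') as $tr) {
    // Both lookups are relative to the current row, so values cannot mix between products.
    $input  = $xpath->query('.//input[starts-with(@name, "code")]', $tr)->item(0);
    $status = $xpath->query('./td[2]', $tr)->item(0);

    $rows[] = array(
        'code'   => $input->getAttribute('value'),
        'status' => trim($status->textContent),
    );
}

print_r($rows);
```

Anchoring on the row keeps each code paired with its own cells, which is harder to guarantee when every field is collected in a separate pass over the whole page.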

