
Get data from product using XPATH nodes and insert them into table

I'm trying to get product data from an external website and insert it into a dedicated table. Every node found needs to be imported into the appropriate column for that product in the product table.

It works fine when finding a single product attribute and inserting it into the table:

$product_names = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' product_description ')]/div/h3/a");
if (!is_null($product_names)) {
    foreach ($product_names as $product_name) {
        $nodes = $product_name->childNodes;
        foreach ($nodes as $node) {
            $import_product = 'INSERT INTO product_table (id, product_name) values ("","' . preg_replace('~\\s+\\S+$~', "", strip_tags(trim($node->nodeValue))) . '")';
            // Run the query that was just built
            mysql_query($import_product);
        }
    }
}

But products have many attributes, so I'm also trying to get this attribute pair (it lives in one HTML element, so I need to split it into an array and use the parts for different columns):

$types = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' product_description ')]/div/a/p");
if (!is_null($types)) {
    foreach ($types as $type) {
        $nodes = $type->childNodes;
        foreach ($nodes as $node) {
            list($typee,$power_unit) = explode(' / ', $node->nodeValue);
            $import_type = 'INSERT INTO product_table (id, type, power_unit) values ("", "' . strip_tags(trim($typee)) . '", "' . strip_tags(trim($power_unit)) . '")';
            mysql_query($import_type);
        }
    }
}

In short, I need to get three product attributes from the external website (there are more, of course; I just want to figure out the best way to get this working) and insert them into my database like this:

product_name_1 product_type_1 power_unit_1
...
product_name_X product_type_X power_unit_X

So far I have tried putting the second XPath part inside the first foreach, but it does not work as needed. Should I build an array of XPath nodes (like $products = array(firstXpathNode, secondXpathNode, ...)) and work from that, or is there a better and more correct solution?

Thanks in advance for any tips.

EDITED: Here is sample HTML from the page I'm trying to get the data from. This is for one product (each product uses this HTML to display its data):

<div class="single_product">
    <div data-section="featured_image">
        <a title="Unique_String" href="#">
            <div style="" data-section="image" class="image_in_fixed_ratio_wrapper">
                <div class="inner visible">
                    <img alt="Unique_String" src="image1.jpg" class="" style="">
                </div>
            </div>
        </a>
    </div>
    <div data-section="data">
        <div class="product_description">
            <div data-field="description_detail">
                <h3><a title="Unique_String" href="#">Product Name<div class="donotwantthistoinclude">New</div></a></h3>
                <a title="Unique_String" href="#"><p>Product Type / Product Power Unit</p></a>
                <div data-field="price">
                    <a title="Unique_String" href="#">5,000</a>
                </div>
                <div data-field="description">
                    <a title="Unique_String" href="#">
                        <span>Height (mm)</span> 2344
                 |
                                <span>Other attribute 1</span> Duplex
                 |
                                <span>Other attribute 2 (kg)</span>  1400
                 |
                                <span>Other attribute 3</span> 2014

                                 | <span>Other attribute X (h)</span> 772
                        <br><span>Location</span> D - 85716
                    </a>
                </div>
            </div>
        </div>
    </div>
</div>

If you store the product name from the first foreach in a variable, you can build a second XPath query based on it. I'm assuming the product names are unique on the page. The second query then finds the product name on the page and walks from there to the neighbouring elements. There are certainly better XPath queries for this; I don't have that skill level myself, but this gives you one way to do it.

The flow will therefore be something like:

For each product: get the name, insert the name into a new query to get that particular product's type and power unit, parse the values, and insert them into the DB.

WARNING

You are using dangerous and outdated SQL: the mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0, and concatenating values into the query string invites SQL injection. Please use the newer mysqli_* or PDO libraries to access the database using prepared statements. I did NOT update your code to reflect that; it's easy to find with Google.

I did, however, add product_name to your existing SQL to illustrate how all 3 fields are gathered.

$product_names = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' product_description ')]/div/h3/a");
if (!is_null($product_names)) {
    foreach ($product_names as $product_name) {
        $nodes = $product_name->childNodes;
        foreach ($nodes as $node) {
            $productName = preg_replace('~\\s+\\S+$~', "", strip_tags(trim($node->nodeValue)));
            $xpath_relative = sprintf("//div[contains(concat(' ', normalize-space(@class), ' '), ' product_description ')]/div/h3/a[contains(text(),'%s')]/../../a/p",$productName);

            $types = $xpath->query($xpath_relative);
            if (!is_null($types)) {
                foreach ($types as $type) {
                    $types_nodes = $type->childNodes;
                    foreach ($types_nodes as $type_node) {
                        // The source text looks like "Product Type / Product Power Unit"
                        list($typee,$power_unit) = explode(' / ', $type_node->nodeValue);

                        // WARNING!!! SQL INJECTION BELOW!!!
                        // Use the cleaned string $productName here, not the DOM node $product_name
                        $import_type = 'INSERT INTO product_table (id, type, power_unit, product_name) values ("", "' . strip_tags(trim($typee)) . '", "' . strip_tags(trim($power_unit)) . '", "' . $productName . '")';
                        mysql_query($import_type);
                    }
                }
            }
        }
    }
}
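
One caveat with building the XPath string via sprintf(): a product name containing an apostrophe would break the quoted literal. The sketch below is not part of the original answer. It shows one way to quote arbitrary text as an XPath 1.0 string literal, using a hypothetical helper named xpathLiteral() and a made-up product ("Driver's Seat") in a cut-down version of the sample markup.

```php
<?php
// Hypothetical helper (not from the original answer): turn an arbitrary
// string into a valid XPath 1.0 string literal.
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";          // no apostrophe: single quotes are safe
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';          // no double quote: double quotes are safe
    }
    // Both quote types present: split on ' and glue the pieces with concat().
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Made-up sample data, reduced from the structure shown in the question.
$html = <<<HTML
<div class="product_description">
    <div>
        <h3><a href="#">Driver's Seat</a></h3>
        <a href="#"><p>Accessory / None</p></a>
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadXML($html);
$xpath = new DOMXPath($dom);

$productName = "Driver's Seat";
$query = sprintf(
    "//div[@class='product_description']/div/h3/a[contains(text(), %s)]/../../a/p",
    xpathLiteral($productName)
);

// string() returns the text of the first matching <p>, or '' if none matches.
$typeText = $xpath->evaluate("string($query)");
echo $typeText, PHP_EOL;
```

The placeholder in the sprintf() template is left unquoted because the helper supplies the quotes itself.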

EDIT #2

I have taken your code and run it in a PHP Fiddle, with the following result. I've also optimised the XPath queries based on the provided structure and added a suggestion for using PDO. Just fill in more attributes as needed. I'll leave you with the entire code, including the DOM and XPath initialisation I used, so you can fiddle with it yourself.

<pre><?php

$domDoc = <<<EOF
<div class="single_product">
    <div data-section="featured_image">
        <a title="Unique_String" href="#">
            <div style="" data-section="image" class="image_in_fixed_ratio_wrapper">
                <div class="inner visible">
                    <img alt="Unique_String" src="image1.jpg" class="" style="" />
                </div>
            </div>
        </a>
    </div>
    <div data-section="data">
        <div class="product_description">
            <div data-field="description_detail">
                <h3><a title="Unique_String" href="#">Product Name<div class="donotwantthistoinclude">New</div></a></h3>
                <a title="Unique_String" href="#"><p>Product Type / Product Power Unit</p></a>
                <div data-field="price">
                    <a title="Unique_String" href="#">5,000</a>
                </div>
                <div data-field="description">
                    <a title="Unique_String" href="#">
                        <span>Height (mm)</span> 2344
                 |
                                <span>Other attribute 1</span> Duplex
                 |
                                <span>Other attribute 2 (kg)</span>  1400
                 |
                                <span>Other attribute 3</span> 2014

                                 | <span>Other attribute X (h)</span> 772
                        <br /><span>Location</span> D - 85716
                    </a>
                </div>
            </div>
        </div>
    </div>
</div>
EOF;
$dom = new DomDocument();
$dom->loadXML($domDoc);
$xpath = new DomXPath($dom);

$products = [];

$productUniqueQuery = "//div[@data-field='description_detail']/h3/a/@title";

$productUniqueNodes = $xpath->query($productUniqueQuery);
if (!is_null($productUniqueNodes)) {
    foreach ($productUniqueNodes as $productUniqueNode) {
        $product = [];
        $product["unique"] = $productUniqueNode->nodeValue;

        $productNameQuery = sprintf("//h3/a[@title='%s']/text()",$product["unique"]);
        $productNameNodes = $xpath->query($productNameQuery);
        $product["name"] = $productNameNodes[0]->nodeValue;

        $productImageQuery = sprintf("//img[@alt='%s']/@src",$product["unique"]);
        $productImageNodes = $xpath->query($productImageQuery);
        $product["imageURL"] = $productImageNodes[0]->nodeValue;

        $productTypeQuery = sprintf("//a[@title='%s']/p/text()",$product["unique"]);
        $productTypeNodes = $xpath->query($productTypeQuery);
        list($product["type"], $product["powerUnit"]) = explode(" / ", $productTypeNodes[0]->nodeValue);

        $productDescriptionQuery = sprintf("//div[@data-field='description']/a[@title='%s']/child::node()",$product["unique"]);
        $productDescriptionNodes = $xpath->query($productDescriptionQuery);
        $description = "";
        foreach ($productDescriptionNodes as $productDescriptionNode) {
            $nodeText = preg_replace("/\s*\|/","",trim($productDescriptionNode->nodeValue));
            if($nodeText == "" || $productDescriptionNode->nodeType === 3){
                continue;
            }

            $product[$nodeText] = preg_replace("/\s*\|/","",trim($productDescriptionNode->nextSibling->nodeValue));
        }
        $products[$product["unique"]] = $product;
    }
}


try {
    $db = new PDO("mysql:host=HOST;dbname=DBNAME;port=3306","USERNAME", "PASSWORD");
}
catch(PDOException $e){
    echo "Connection failed: " . $e->getMessage();
    exit();
}

// `unique` is a reserved word in MySQL, so the column name must be quoted with backticks
$sql = 'INSERT INTO product_table (`unique`, name, type, power_unit, attr1) values (:unique, :name, :type, :power_unit, :attr1)';
$stmt = $db->prepare($sql);

foreach($products as $product){
    $params = [
        ":unique"=>$product["unique"],
        ":name"=>$product["name"],
        ":type"=>$product["type"],
        ":power_unit"=>$product["powerUnit"],
        ":attr1"=>$product["Other attribute 1"]
    ];
    var_dump($product);
    $stmt->execute($params);
}

?>
</pre>
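
To "fill in more attributes" without hand-editing the SQL each time, one option is to map the scraped labels to column names through a whitelist and build the prepared statement from it. This is only a sketch: the buildInsert() helper and the column names height_mm and attr1 are assumptions, not something the original code or schema confirms.

```php
<?php
// Hypothetical whitelist: key in the scraped $product array => column name.
// The column names are assumptions; adjust them to the real schema.
$columnMap = [
    'unique'            => 'unique',
    'name'              => 'name',
    'type'              => 'type',
    'powerUnit'         => 'power_unit',
    'Height (mm)'       => 'height_mm',
    'Other attribute 1' => 'attr1',
];

// Build a parameterised INSERT from whichever whitelisted keys are present.
// $table must come from trusted code, never from scraped or user input.
function buildInsert(string $table, array $product, array $columnMap): array
{
    $columns = [];
    $params = [];
    foreach ($columnMap as $label => $column) {
        if (!array_key_exists($label, $product)) {
            continue; // this product does not have that attribute
        }
        $columns[] = '`' . $column . '`';          // backticks: `unique` is reserved in MySQL
        $params[':' . $column] = $product[$label]; // value is bound, not concatenated
    }
    $sql = sprintf(
        'INSERT INTO `%s` (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', array_keys($params))
    );
    return [$sql, $params];
}

$product = [
    'unique'      => 'Unique_String',
    'name'        => 'Product Name',
    'Height (mm)' => '2344',
    'Unlisted'    => 'ignored', // not in the whitelist, so never reaches the SQL
];
list($sql, $params) = buildInsert('product_table', $product, $columnMap);
echo $sql, PHP_EOL;
// With a PDO connection $db: $db->prepare($sql)->execute($params);
```

Because only whitelisted keys become columns, a label the page adds later cannot inject an unexpected column name into the statement.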

One thing that makes this easier: with XPath you can use one node as the context for further searches. So once you have a list of product nodes, use each one as the starting point from which you extract the other data.

Just as an example...

$dom = new DomDocument();
$dom->loadXML($xml);
$xpath = new DomXPath($dom);

$products = [];

$data = $xpath->query("//div[@class='single_product']");
foreach ($data as $item) {
    $name = $xpath->evaluate('string(descendant::div[@data-field="description_detail"]/h3/a/@title)', $item);
    $imageName = $xpath->evaluate('string(descendant::div[@data-section="featured_image"]//img/@src)', $item);
    $typePower = $xpath->evaluate('string(descendant::div[@data-field="description_detail"]/a/p/text())', $item);
    $description = $xpath->evaluate('string(descendant::div[@data-field="description"]/a)', $item);

    $products[$name] = array(
        "image" => $imageName,
        "typePower" => $typePower,
        "description" => $description
    );
}

print_r($products);

Note the second parameter to the evaluate() method, which is the node from the first query().

I've also used evaluate(), which lets me return the value as a string straight away without any further conversion (it allows me to use string() as part of the query).
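
To make the difference concrete, here is a small self-contained sketch with made-up data (two products, wrapped in a hypothetical root element). It contrasts query(), which returns a node list, with evaluate() plus string(), and shows that an expression starting with // ignores the context node.

```php
<?php
// Made-up sample data: two products inside a hypothetical <root> wrapper.
$html = <<<HTML
<root>
    <div class="single_product">
        <h3><a title="A-1" href="#">Alpha</a></h3>
        <a title="A-1" href="#"><p>Electric / Battery</p></a>
    </div>
    <div class="single_product">
        <h3><a title="B-2" href="#">Beta</a></h3>
        <a title="B-2" href="#"><p>Diesel / Engine</p></a>
    </div>
</root>
HTML;

$dom = new DOMDocument();
$dom->loadXML($html);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query("//div[@class='single_product']") as $item) {
    // query() returns a DOMNodeList, so the value has to be pulled out by hand.
    $nameNodes = $xpath->query('h3/a', $item);
    $name = $nameNodes->length > 0 ? $nameNodes->item(0)->nodeValue : '';

    // evaluate() with string() returns a plain PHP string directly.
    $typePower = $xpath->evaluate('string(a/p)', $item);

    // ".//a" is relative to $item; "//a" always starts at the document root.
    $linksInItem = (int) $xpath->evaluate('count(.//a)', $item);
    $linksInDoc  = (int) $xpath->evaluate('count(//a)', $item);

    $rows[] = [$name, $typePower, $linksInItem, $linksInDoc];
}
print_r($rows);
```

The relative paths (h3/a, a/p, .//a) are what tie each lookup to its own product, so no unique identifier is needed to match the fields up.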

There is no post-processing, so you may have to tidy up some of the data, and there is no database access (you should follow the examples of using prepared statements), but this shows the important part: extracting the data in the first place.
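
As a rough idea of what that tidying could look like, the sketch below uses three hypothetical helpers (collapseWhitespace(), splitTypePower() and splitDescription(), none of them from the answer) on strings shaped like the ones evaluate() returns for the sample HTML. Note that the flattened description no longer separates labels from values, so it can only be split into segments on the "|" character.

```php
<?php
// Hypothetical helpers for tidying the strings returned by evaluate().

// Collapse runs of whitespace (including newlines) into single spaces.
function collapseWhitespace(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Split "Type / Power Unit" into two parts. The second part is an empty
// string when the separator is missing, instead of triggering a notice.
function splitTypePower(string $typePower): array
{
    $parts = array_map('trim', explode('/', $typePower, 2));
    return array_pad($parts, 2, '');
}

// Split the flattened description on "|" and drop empty segments.
function splitDescription(string $description): array
{
    $segments = array_map('collapseWhitespace', explode('|', $description));
    return array_values(array_filter($segments, function ($segment) {
        return $segment !== '';
    }));
}

// Strings shaped like the evaluate() results for the sample HTML.
$typePower = "Product Type / Product Power Unit";
$description = "\n   Height (mm) 2344\n  |\n   Other attribute 1 Duplex\n"
             . "  | Other attribute X (h) 772\n   Location D - 85716\n";

list($type, $powerUnit) = splitTypePower($typePower);
$segments = splitDescription($description);

print_r([$type, $powerUnit, $segments]);
```

If each attribute needs its own column, iterating over the span elements with the product node as context keeps the label and value apart, which the flattened string cannot do.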

