
PHP query XML with SQL

I am doing some web scraping and have come across several tables of data that I want to query. So far I have:

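$url = 'http://finance.yahoo.com/q/op?s=QQQQ&m=2012-04';
// Collect parser warnings from malformed HTML instead of silencing them with @
libxml_use_internal_errors(true);
// loadHTMLFile() is an instance method; calling it statically is an error in PHP 8
$html = new DOMDocument();
$html->loadHTMLFile($url);
$xml = simplexml_import_dom($html);
$results = $xml->xpath('//table[@class="yfnc_datamodoutline1"]');
var_dump($results);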

Produces results: http://pastebin.com/6p3L2Kcc

This is well-structured HTML table data, with TH and TD and everything. I'd like to use it like this:

$sql = 'SELECT Last,Open_Int FROM TABLE1 WHERE Last>25 AND Symbol LIKE "%C%"';
$results = $xmltable->sql($sql);
while($result = $results->fetch_assoc())
  echo $result['Last'] . " -- " . $result['Open_Int'] . "\n";

Without any creativity, I can write classes to parse that HTML table, take the first row, create a table in sqlite, select the other rows and turn them into insert statements. But do you know a better way to do this, or is there some powerful PHP function that I'm not seeing?

Update: Perhaps the scope here is too big. I'd be happy with a link to a library or advice on getting an HTML table into a (proper) XML table.

The answer depends on your larger needs. Here are three questions that can flesh those out:

1) How often is the data read vs. written?

2) Do you keep old versions or is only the latest required?

3) Will the data be compared to other data?

In one case, let's say the answer to #1 is "many more reads" and the answer to #3 is "yes". Here it may well be worthwhile to put the XML results into a SQL table for frequent and flexible querying.

However, in another case, let's say the answer to #2 is "no" and the answer to #3 is "no": you just keep the latest retrieval and don't compare it to anything. In this case you can just stick it into a file and retrieve it as needed for display (#1 becomes more or less irrelevant).
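A minimal sketch of that "latest copy in a file" approach, assuming JSON as the storage format. The cache path, the one-hour lifetime, and the fetchTableRows() helper are all hypothetical stand-ins, not part of the original question:

```php
<?php
// Sketch only: cache path, lifetime, and fetchTableRows() are hypothetical.
function fetchTableRows()
{
    // Stand-in for the real scrape; returns rows as associative arrays.
    return [
        ['Symbol' => 'QQQQ120421C00060000', 'Last' => 27.5, 'Open_Int' => 1210],
    ];
}

function loadLatest($cacheFile, $maxAge = 3600)
{
    // Reuse the cached copy while it is still fresh enough.
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return json_decode(file_get_contents($cacheFile), true);
    }
    $rows = fetchTableRows();
    // Overwrite the previous retrieval; only the latest copy is kept.
    file_put_contents($cacheFile, json_encode($rows), LOCK_EX);
    return $rows;
}

$cacheFile = sys_get_temp_dir() . '/options_latest.json';
$rows = loadLatest($cacheFile);
```

Because nothing is compared or versioned, overwriting one file is enough; a database only starts to pay off once you need flexible queries.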

EDIT in response to a question in the comments: Assuming you want to put it into a database, the output you link to shows a nested set of objects/arrays. You "walk the tree" to pull out the nested objects, extract their properties and issue individual inserts into the appropriate tables.
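The tree walk and inserts could be sketched roughly as follows. This is an illustration, not a definitive implementation: it assumes the pdo_sqlite extension is available, and the inline sample HTML, the table name table1, and the row values are made up. Columns are declared NUMERIC so that SQLite compares values such as Last as numbers rather than as text.

```php
<?php
// Sketch only: the sample HTML, table name, and row values are invented.
$html = <<<HTML
<table class="yfnc_datamodoutline1">
  <tr><th>Symbol</th><th>Last</th><th>Open Int</th></tr>
  <tr><td>QQQQ120421C00060000</td><td>27.50</td><td>1,210</td></tr>
  <tr><td>QQQQ120421P00060000</td><td>30.10</td><td>85</td></tr>
  <tr><td>QQQQ120421C00065000</td><td>12.00</td><td>400</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Walk the tree: each <tr> becomes one array of trimmed cell values.
$rows = [];
foreach ($xpath->query('//table[@class="yfnc_datamodoutline1"]//tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('th|td', $tr) as $cell) {
        $cells[] = trim($cell->textContent);
    }
    if ($cells) {
        $rows[] = $cells;
    }
}

// The first row supplies the column names, reduced to safe identifiers.
$header = array_shift($rows);
$columns = array_map(function ($name) {
    return '"' . preg_replace('/\W+/', '_', $name) . '" NUMERIC';
}, $header);

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE table1 (' . implode(', ', $columns) . ')');

// One prepared insert, executed once per data row.
$placeholders = implode(', ', array_fill(0, count($header), '?'));
$insert = $db->prepare("INSERT INTO table1 VALUES ($placeholders)");
foreach ($rows as $row) {
    // Drop thousands separators so "1,210" is stored as a number.
    $insert->execute(array_map(function ($value) {
        return str_replace(',', '', $value);
    }, $row));
}

$stmt = $db->query(
    'SELECT "Last", "Open_Int" FROM table1 WHERE "Last" > 25 AND "Symbol" LIKE \'%C%\''
);
$matches = $stmt->fetchAll(PDO::FETCH_ASSOC);
foreach ($matches as $match) {
    echo $match['Last'] . ' -- ' . $match['Open_Int'] . "\n";
}
```

An in-memory database keeps the sketch self-contained; pointing the DSN at a file instead would let the scraped rows persist between requests.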
