Page generation based on XML files (PHP, jQuery)

So I have XML data that I need to fetch and use in my PHP page as search results.
Problem: the XML is structured horribly (one "index" file with links that lead to other XML files), and the whole thing is huge (anywhere from 1,000 to 20,000+ XML files, 10 MB or more in total).
There are lots of different tools I researched: XMLReader, XML Parser and a bit of jQuery, but I'm not sure which one is best for this task.
What I think would be the best way of solving it is a Facebook-style "press to load more" page: it loads itself, loads the "index" XML (maybe into a hidden input field or a div so jQuery can read it), then starts reading the XML files listed in the index and generates the results dynamically on the page. And I do need all the data in some sort of memory, since I will have to do analytics on it as well.
Question: which is better to use for this, and are there any techniques I'd benefit from? Or am I looking at it from the wrong side completely?

I tried straight PHP reading using XMLReader and XML Parser, as well as SimpleXMLElement plus a for loop, but once I put a second read (from the "index") into the equation, the page just breaks from excessively long loading times, and that's with 30MB/s internet. I don't have much experience with jQuery, so that's why I'm asking for advice.


PS: I'm taking the XML from http://www.clinicaltrials.gov
Example of a smaller "index": http://www.clinicaltrials.gov/search?term=attack&count=1856&displayxml=true
If you append "?displayxml=true" to each of the "url" values in it, you get an XML file that I need to read.
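A minimal JavaScript sketch of that URL rewrite, assuming each "url" value in the index is an absolute http(s) URL; the function name `toXmlUrl` and the NCT number in the example are hypothetical. Using the standard `URL` class instead of string concatenation means a URL that already has a query string gets `&displayxml=true` rather than a second `?`.

```javascript
// Hypothetical helper: turn a study URL taken from the "index" XML into the
// URL of that study's own XML record by adding displayxml=true.
function toXmlUrl(studyUrl) {
  const url = new URL(studyUrl);               // throws on a malformed URL
  url.searchParams.set('displayxml', 'true');  // adds the parameter, or overwrites it if present
  return url.toString();
}

// Example with a made-up study ID:
console.log(toXmlUrl('http://clinicaltrials.gov/show/NCT00000000'));
// -> http://clinicaltrials.gov/show/NCT00000000?displayxml=true
```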

What I would do is:

Since the site provides some helpful query-string parameters such as &count=, take advantage of them.

This means you don't need to process and query tens of thousands of rows at once.

So normally, you just query the external site like this:

http://www.clinicaltrials.gov/search?term=heart%20attack&count=10&displayxml=true&pg=1

So just limit every request, for example to 10 results at a time, and use &pg= to choose which page of results you get.
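To see how many chunked requests a search needs, divide the total number of hits by the page size and round up. A minimal sketch, assuming the total is the `count` attribute of the search results (which the PHP handler below reads) and that `pg` is 1-based as in the URL above; `pageCount` and `pageNumbers` are hypothetical names.

```javascript
// Number of requests needed to fetch `total` results, `pageSize` at a time.
function pageCount(total, pageSize) {
  if (pageSize <= 0) throw new RangeError('pageSize must be positive');
  return Math.ceil(total / pageSize);
}

// 1-based page numbers to send as pg=1, pg=2, ...
function pageNumbers(total, pageSize) {
  return Array.from({ length: pageCount(total, pageSize) }, (_, i) => i + 1);
}

console.log(pageCount(1856, 10));  // 186 requests of 10 for 1856 results
console.log(pageNumbers(25, 10));  // [ 1, 2, 3 ]
```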

Then start building the server side.

The client side is up to you; this is just personal preference, but I would use DataTables in this example.

The code below recreates the structure of the sample URL above:

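<?php
$search_term = 'attack';
$count = 10; // results per request (the chunk size)
$page = 1;   // 1-based page number, sent as pg
$query = http_build_query(array(
    'term' => $search_term,
    'count' => $count,
    'displayxml' => 'true',
    'pg' => $page,
));
// $query is now: term=attack&count=10&displayxml=true&pg=1
$main_url = 'http://www.clinicaltrials.gov/search?' . $query;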
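If you would rather build the same request URL in JavaScript, `URLSearchParams` plays the role of `http_build_query`. A sketch under the same assumptions as the PHP snippet (endpoint and parameter names taken from the example URL); `buildSearchUrl` is a hypothetical name. Like `http_build_query`, it encodes a space as `+`, which form-style query parsers treat the same as `%20`.

```javascript
const SEARCH_ENDPOINT = 'http://www.clinicaltrials.gov/search';

// Build one chunked search request: `count` results from 1-based page `page`.
function buildSearchUrl(term, count, page) {
  const params = new URLSearchParams({
    term: term,
    count: String(count),
    displayxml: 'true',
    pg: String(page),
  });
  return SEARCH_ENDPOINT + '?' + params.toString();
}

console.log(buildSearchUrl('heart attack', 10, 1));
// -> http://www.clinicaltrials.gov/search?term=heart+attack&count=10&displayxml=true&pg=1
```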

After building the correct URL, just request the XML you need. Then, once you have gathered all the data you need (the chunked data), present it on the client side.
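Gathering the chunked data can be sketched as a loop that requests one page at a time until it has as many rows as the reported total, or a page comes back empty. This is an illustration, not the answer's own code: `fetchPage` is a hypothetical function you would supply (for example one that calls `fetch` on a URL like the one above and parses the XML), passed in so the loop itself does not depend on any network or XML library.

```javascript
// Collect all rows by requesting pages 1, 2, 3, ... one after another.
// fetchPage(page) must resolve to { total, rows } for that 1-based page.
async function collectAllRows(fetchPage, maxPages = 1000) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {  // maxPages is a safety cap
    const { total, rows } = await fetchPage(page);
    if (rows.length === 0) break;                 // past the last page
    all.push(...rows);
    if (all.length >= total) break;               // got everything the server reported
  }
  return all;
}

// Usage with a stub that serves 25 fake rows, 10 per page:
const fakeRows = Array.from({ length: 25 }, (_, i) => ({ order: i + 1 }));
const stubFetchPage = async page => ({
  total: fakeRows.length,
  rows: fakeRows.slice((page - 1) * 10, page * 10),
});

collectAllRows(stubFetchPage).then(rows => console.log(rows.length)); // 25
```

Requesting pages sequentially keeps the load on the remote site low; firing several pages at once with `Promise.all` would be faster but opens more simultaneous connections.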

Full example (also available as a sample fiddle):

index.php

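<?php
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    // DataTables (server-side mode) posts draw, start and length.
    // "draw" is only a request counter that must be echoed back;
    // the page to fetch is derived from start/length.
    $draw   = isset($_POST['draw'])   ? (int) $_POST['draw']   : 1;
    $start  = isset($_POST['start'])  ? (int) $_POST['start']  : 0;
    $length = isset($_POST['length']) ? (int) $_POST['length'] : 10;
    if ($length < 1) {
        $length = 10; // DataTables uses -1 to mean "all rows"; keep chunks small
    }
    $page = (int) floor($start / $length) + 1; // pg is 1-based

    $search_term = 'attack';
    $query = http_build_query(array(
        'term'       => $search_term,
        'count'      => $length,
        'displayxml' => 'true',
        'pg'         => $page,
    ));
    $main_url = 'http://www.clinicaltrials.gov/search?' . $query;

    $data = array(
        'draw'            => $draw,
        'recordsTotal'    => 0,
        'recordsFiltered' => 0,
        'data'            => array(), // DataTables expects an array even when empty
    );

    $contents = file_get_contents($main_url);
    $xml = ($contents !== false) ? simplexml_load_string($contents) : false;
    if ($xml !== false) {
        // Total number of hits: the count attribute of the root element
        $total_results = (int) $xml['count'];
        $data['recordsTotal'] = $total_results;
        $data['recordsFiltered'] = $total_results;
        foreach ($xml->clinical_study as $entry) {
            $row = array();
            // Cast each child to string so elements with attributes keep their text
            foreach ($entry->children() as $name => $value) {
                $row[$name] = (string) $value;
            }
            $data['data'][] = $row;
        }
    }

    header('Content-Type: application/json');
    echo json_encode($data);
    exit;
}
?>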

<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/1.10.2/css/jquery.dataTables.css" />
<table border="1" class="display dataTable" cellspacing="0" width="100%">
    <thead>
        <tr>
            <th>Order</th>
            <th>Score</th>
            <th>Nct ID</th>
            <th>URL</th>
            <th>Title</th>
            <th>Status</th>
            <th>Condition Summary</th>
            <th>Last Changed</th>
        </tr>
    </thead>
    <tfoot>
        <tr>
            <th>Order</th>
            <th>Score</th>
            <th>Nct ID</th>
            <th>URL</th>
            <th>Title</th>
            <th>Status</th>
            <th>Condition Summary</th>
            <th>Last Changed</th>
        </tr>
    </tfoot>
</table>

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<script src="https://cdn.datatables.net/1.10.2/js/jquery.dataTables.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){

    $('.display').dataTable({
        'processing': true,
        'serverSide': true,
        'ajax': {
            'url': document.URL,
            'type': 'POST',
        },
        "columns": [
            { "data": "order" },
            { "data": "score" },
            { "data": "nct_id" },
            { "data": "url" },
            { "data": "title" },
            { "data": "status" },
            { "data": "condition_summary" },
            { "data": "last_changed" },
        ],
        searching: false, info: false, ordering: false,
    });

});
</script>
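In DataTables' server-side processing mode, each Ajax request carries `draw` (a counter that only has to be echoed back), `start` (zero-based offset of the first row) and `length` (rows per page), and the reply must contain `draw`, `recordsTotal`, `recordsFiltered` and `data`. Note that `draw` is not a page number: it increases on every redraw. A small sketch of turning `start`/`length` into the 1-based `pg` parameter and shaping the reply; the function names are hypothetical, and the same arithmetic applies in PHP.

```javascript
// Map DataTables' zero-based row offset to a 1-based page number (pg).
function toPageNumber(start, length) {
  const s = Number(start) || 0;
  const len = Number(length) > 0 ? Number(length) : 10; // -1 ("all") or junk -> 10
  return Math.floor(s / len) + 1;
}

// Shape a reply the way DataTables' server-side mode expects it.
function toDataTablesReply(draw, total, rows) {
  return {
    draw: parseInt(draw, 10) || 0,  // cast to an integer, as the DataTables docs advise
    recordsTotal: total,
    recordsFiltered: total,         // no server-side filtering in this example
    data: rows,
  };
}

console.log(toPageNumber(0, 10));   // 1
console.log(toPageNumber(20, 10));  // 3
console.log(JSON.stringify(toDataTablesReply('4', 1856, [])));
// {"draw":4,"recordsTotal":1856,"recordsFiltered":1856,"data":[]}
```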

So the basic idea here is that you don't need to request thousands of rows immediately; you can fetch them in chunks instead.

You can use XSL to take the clinicaltrials.gov XML and convert it into a saner XML format, or into HTML. XSL (XSLT) is a language for transforming XML.

PHP even has a built-in XSL processor: http://php.net/manual/en/book.xsl.php

On a side note, I use XSL to convert DocBook XML files (a semantic markup language) into Twitter Bootstrap HTML.

For example, using the example you've provided ( http://www.clinicaltrials.gov/search?term=attack&count=1856&displayxml=true ), if you wanted to display the titles of all of the clinical studies as a list, the following XSL stylesheet would do the job:

<?xml version="1.0" encoding="utf-8"?>
<!-- version 1.0: PHP's XSL extension (libxslt) implements XSLT 1.0 -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <ul>
      <xsl:apply-templates/>
    </ul>
  </xsl:template>
  <xsl:template match="search_results">
    <!-- Process only the studies; otherwise the built-in rules copy the text of any other child element -->
    <xsl:apply-templates select="clinical_study"/>
  </xsl:template>
  <xsl:template match="clinical_study">
    <li><xsl:value-of select="title"/></li>
  </xsl:template>
</xsl:stylesheet>

The XSL processor enters the source XML document at its root and then traverses the tree. Any time it finds a node that matches a defined template, it executes that template. Pretty cool stuff! It takes a while to orient yourself in the XSL programming paradigm, but it is quite powerful once you get the hang of it.

Note that I just wrote that as a toy example off the top of my head. I'm not sure whether it will actually execute properly.
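To make that processing model concrete without an XSLT engine, here is a small JavaScript imitation of it. It is only an analogy, not XSLT and not how PHP's XSLTProcessor works internally: documents are plain objects in a hypothetical shape (`{ name, children }` for elements, `{ text }` for text nodes), and `applyTemplates` runs the template registered for a node's name or, when none matches, falls back to the built-in behaviour of copying text and recursing into children. The `other` element in the sample document is also hypothetical.

```javascript
// Built-in rules: text nodes are copied, unmatched elements are recursed into.
function applyTemplates(nodes, templates) {
  return nodes.map(node => {
    if (node.text !== undefined) return node.text;
    const template = templates[node.name];
    const apply = kids => applyTemplates(kids, templates);
    return template ? template(node, apply) : apply(node.children || []);
  }).join('');
}

// Concatenated text of a node, similar to XPath's string() on an element.
function textOf(node) {
  if (!node) return '';
  if (node.text !== undefined) return node.text;
  return (node.children || []).map(textOf).join('');
}

function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Rough counterparts of the stylesheet's "/", search_results and clinical_study templates.
const templates = {
  '/': (node, apply) => '<ul>' + apply(node.children) + '</ul>',
  // Only recurse into clinical_study children, so other elements' text is not copied.
  search_results: (node, apply) =>
    apply(node.children.filter(c => c.name === 'clinical_study')),
  clinical_study: node =>
    '<li>' + escapeHtml(textOf(node.children.find(c => c.name === 'title'))) + '</li>',
};

const doc = { name: '/', children: [
  { name: 'search_results', children: [
    { name: 'other', children: [{ text: 'ignored' }] },  // hypothetical non-study element
    { name: 'clinical_study', children: [{ name: 'title', children: [{ text: 'Study A' }] }] },
    { name: 'clinical_study', children: [{ name: 'title', children: [{ text: 'Heart & Lung' }] }] },
  ] },
] };

console.log(applyTemplates([doc], templates));
// <ul><li>Study A</li><li>Heart &amp; Lung</li></ul>
```

Deleting the `search_results` entry shows the built-in rules at work: the text "ignored" then leaks into the output, which is the usual reason to give `xsl:apply-templates` an explicit `select`.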

Edit 1:

(OP asks about performing analysis, e.g. counting all elements of a specific type)

Looking at your example XML results, it looks like the only way to determine whether a trial is in Phase 3 is to check the text of the <title> element. This is still easily within the capabilities of XSL (with some help from XPath).

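<!-- "or" combines the two tests; "|" is XPath's node-set union operator, not a boolean or. -->
<!-- contains() matches the phrase anywhere in the title instead of requiring an exact title. -->
<xsl:variable name="countPhase3"
    select="count(//title[contains(., 'Phase 3') or contains(., 'Phase III')])"/>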

I'll warn you again that this is just an example off the top of my head.
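Since you also want the data in memory for analytics, the same count can be done on rows that are already loaded, for example the `data` array that the PHP handler returns as JSON. A sketch assuming each row has a `title` string; the regular expression mirrors the title check described above and assumes the phase is written as "Phase 3" or "Phase III" (any letter case).

```javascript
// Matches "Phase 3" or "Phase III" as whole words, case-insensitively,
// but not "Phase II" or "Phase 30".
const PHASE_3 = /\bPhase (?:3|III)\b/i;

function countPhase3(rows) {
  return rows.filter(row => PHASE_3.test(row.title || '')).length;
}

const rows = [
  { title: 'A Phase 3 Study of Drug X' },
  { title: 'Phase III Trial in Heart Attack' },
  { title: 'Phase II Pilot' },
  { title: 'Observational Registry' },
];
console.log(countPhase3(rows)); // 2
```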

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.