Import the content of a form in Wikipedia

I want to import the full names from this page ( http://nl.wikipedia.org/w/index.php?title=Samenstelling_Tweede_Kamer_2012-heden&action=edit&section=1 ) (from the form), then compare them with the names on this page ( http://nl.wikipedia.org/wiki/Samenstelling_Tweede_Kamer_2012-heden ) and print out the related links with PHP.

You have to write some code to parse the HTML from the Wikipedia pages. The PHP Simple HTML DOM Parser is the way to go for parsing the HTML and extracting the information you need. Once you have the names from both Wikipedia pages, you can compare them in your code.
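For the edit-form page in the question, the names would sit as wikitext inside the form's textarea rather than in rendered HTML lists. The following is only a minimal sketch of that case, using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. It assumes the textarea carries the id wpTextbox1 (the id MediaWiki gives its edit box) and that each name appears as a wikilink such as [[Name]] or [[Name|label]]; the function name and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: pull wikilink targets out of the edit form's textarea.
// Assumption: names are written as [[Name]] or [[Name|label]] in the wikitext.
function extract_names_from_edit_form($html) {
    libxml_use_internal_errors(true);           // silence warnings on sloppy HTML
    $doc = new DOMDocument();
    // The XML prolog is a common workaround that makes loadHTML treat input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//textarea[@id="wpTextbox1"]');
    if ($nodes === false || $nodes->length === 0) {
        return array();                         // no edit box found
    }

    $wikitext = $nodes->item(0)->textContent;
    // Capture the link target, ignoring an optional "|label" part.
    preg_match_all('/\[\[([^\]\|#]+)(?:\|[^\]]*)?\]\]/u', $wikitext, $matches);

    return array_values(array_unique(array_map('trim', $matches[1])));
}

// Made-up sample standing in for the downloaded edit page.
$sample = '<html><body><textarea id="wpTextbox1">'
        . "* [[André Bosman]]\n* [[Example Person|E. Person]]"
        . '</textarea></body></html>';

print_r(extract_names_from_edit_form($sample));
```

In practice the HTML string would come from downloading the edit URL, for example with file_get_contents, and the regular expression would likely need adjusting to the actual wikitext on that page.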

Example to get the names (not tested; you will probably need more specific selectors to get exactly what you want):

<?php
// Raise the memory limit, since parsing a large page can need more memory.
ini_set('memory_limit', '160M');
require 'simple_html_dom.php';

$url = 'http://nl.wikipedia.org/wiki/Samenstelling_Tweede_Kamer_2012-heden';

// Create the DOM from a URL or file (object-oriented style).
$html = new simple_html_dom();
$html->load_file($url);

// Procedural style alternative:
// $html = file_get_html($url);

$items = array();

// Find each div with class "editmode" and loop through it.
foreach ($html->find('div.editmode') as $article) {
    // Collect every anchor inside a list item of an unordered list.
    foreach ($article->find('ul li a') as $a) {
        $items[] = "<a href='" . $a->href . "'>" . $a->plaintext . "</a>";
    }
}

print_r($items);
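Once both name lists are available, the comparison step itself could be sketched as follows. This is only an illustration: the function names are hypothetical, the sample names other than André Bosman are made-up placeholders, and the normalization assumes that stray whitespace is the only difference to smooth over.

```php
<?php
// Hypothetical helper: collapse repeated whitespace and trim the ends,
// so that spacing differences do not cause false mismatches.
function normalize_name($name) {
    return trim(preg_replace('/\s+/u', ' ', $name));
}

// Hypothetical helper: split two name lists into shared and unmatched names.
function compare_names(array $formNames, array $pageNames) {
    $form = array_unique(array_map('normalize_name', $formNames));
    $page = array_unique(array_map('normalize_name', $pageNames));

    return array(
        'both'      => array_values(array_intersect($form, $page)),
        'only_form' => array_values(array_diff($form, $page)),
        'only_page' => array_values(array_diff($page, $form)),
    );
}

// Placeholder data; in practice these arrays come from the two Wikipedia pages.
$formNames = array('André Bosman', '  Example  Person One ', 'Example Person Two');
$pageNames = array('André Bosman', 'Example Person One', 'Example Person Three');

$result = compare_names($formNames, $pageNames);
print_r($result);
```

The names under 'both' are the ones whose links would be printed; the other two keys help spot entries that exist on only one of the pages.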

If names with accented characters (André Bosman, for example) show up garbled, declare the charset as UTF-8 in your HTML like this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
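The same charset can also be sent from PHP in the HTTP header, and the names can be escaped as UTF-8 when the links are built. A minimal sketch, assuming the script outputs an HTML page; render_link is a hypothetical helper and the URL in the usage line is only illustrative.

```php
<?php
// Send the charset in the HTTP header too; this must happen before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: build a link, escaping URL and name as UTF-8 text.
function render_link($href, $name) {
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
         . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</a>';
}

// Illustrative usage with a made-up relative URL.
echo render_link('/wiki/Example_Page', 'André Bosman'), "\n";
```

Escaping with an explicit 'UTF-8' argument keeps accented characters intact while still neutralizing characters such as & and quotes in the link.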
