
Getting a specific “page” from the Wikipedia XML dump

Original post 2014-01-20 02:13:54 3 1 php / wikipedia

OK, so this is what I need:

I have downloaded and extracted the full Wikipedia XML dump (>40 GB, a single XML file).
I need to retrieve one particular <page> element (e.g. the page for the entry "Italy").

How can I do this? (Preferably with PHP code or an existing tool.)

1 answer

There is no guarantee that the full content of a page is stored contiguously: its revisions might be anywhere in the same file, or even in different XML files.

Please use the web API's action=export or, at worst, Special:Export. I am not adding a link here because the output is huge.
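As a rough illustration of the API route, here is a minimal Python sketch. It assumes the English Wikipedia endpoint and that the export is requested through the `export` and `exportnowrap` parameters of `action=query`, which is how MediaWiki exposes page export through the API as far as I know. The helper names (`build_export_url`, `find_page`, `fetch_page`) and the User-Agent string are made up for this example, and the title must already be in its normalized form (e.g. "Italy", not "italy").

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: English Wikipedia; other wikis use a different host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_export_url(title):
    """Build an API URL that asks for the export XML of a single page."""
    params = {
        "action": "query",
        "titles": title,
        "export": "1",
        "exportnowrap": "1",  # bare export XML, same format as Special:Export
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def local_name(tag):
    """Drop the '{namespace}' prefix, since the export schema version varies."""
    return tag.rsplit("}", 1)[-1]


def find_page(stream, title):
    """Stream through export XML and return the first <page> with this <title>."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "title" and child.text == title:
                return elem
        elem.clear()  # discard the contents of pages we do not need
    return None


def fetch_page(title):
    """Download the export XML for `title` and return its <page> element."""
    request = urllib.request.Request(
        build_export_url(title),
        headers={"User-Agent": "page-export-sketch/0.1 (example)"},  # hypothetical
    )
    with urllib.request.urlopen(request) as response:
        return find_page(response, title)
```

Calling `fetch_page("Italy")` should return an `xml.etree.ElementTree.Element` for the `<page>` node, or `None` if no page with that exact title comes back; `ET.tostring(page, encoding="unicode")` would then give the XML text.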




solution1 0 2015-04-27 23:22:47
