
How to get Wikipedia “clean” content?

I'm using the MediaWiki API to get content from Wikipedia pages. I've written code that generates the following query (for example):

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&titles=hawaii

This retrieves only the lead section of the Wikipedia page about Hawaii.

The problem is that, as you might notice, the result contains a lot of irrelevant wiki markup, such as:

"[[Molokai|Moloka{{okina}}i]], [[Lanai|Lāna{{okina}}i]], [[Kahoolawe|Kaho{{okina}}olawe]], [[Maui]] and the [[Hawaii (island)|" .

All those brackets [[ ]] are not relevant, and I wonder whether there is an elegant method to pull only 'clean' content from such pages?

Thanks in advance.

You can get clean HTML text from Wikipedia with this query:

https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=hawaii

If you want plain text, without HTML, try this:

https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=hawaii&explaintext
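To sketch how you might call this from code, here is a minimal Python example that builds the `extracts` query and pulls the plain text out of the JSON response. The parameter names (`explaintext`, `exintro`, `format=json`) come from the TextExtracts extension; the sample response below is abridged and hand-written to match the API's documented shape, so treat the parsing helper as a sketch, not a full client:

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def build_extract_url(title: str) -> str:
    """Build a TextExtracts query for the plain-text lead of a page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": title,
        "explaintext": 1,  # plain text instead of HTML
        "exintro": 1,      # lead section only
        "format": "json",
    }
    return API + "?" + urlencode(params)

def extract_text(response: dict) -> str:
    """Pull the extract out of the JSON response.

    The page ID key is not known in advance, so take the first page."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("extract", "")

# Example response shaped like the API's output (abridged, hand-written):
sample = {"query": {"pages": {"13270": {
    "pageid": 13270, "title": "Hawaii",
    "extract": "Hawaii is a state..."}}}}

print(build_extract_url("hawaii"))
print(extract_text(sample))  # Hawaii is a state...
```

Fetching the URL (with `urllib.request` or any HTTP library) and feeding the decoded JSON to `extract_text` gives you the clean text directly.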

Please try stripping the link markup with a regular expression (note that the brackets must be escaped):

$relevant = preg_replace('/\[\[.*?\]\]/', '', $string);

