
Wikipedia API returning extract without all characters in article?

Original post: 2017-06-24 03:36:10 · Tags: java, mediawiki, wikipedia, wikipedia-api, mediawiki-api

Not sure if I should ask this here, but I can't figure it out.

I saw the issue first on Wikipedia's "Meme" article ( https://en.wikipedia.org/wiki/Meme ). There are several special characters for pronunciation that don't appear in the extract queried with the MediaWiki API ( https://en.wikipedia.org/w/api.php?format=jsonfm&action=query&prop=revisions|extracts&redirects=true&titles=meme ).

I couldn't find a solution in the MediaWiki API documentation. As an alternative, I tried using jsoup to parse the entire page, but I couldn't reliably get the article content I need that way, whereas the extract query does return it.

1 answer

The extracts API tries to sanitize the text in various ways to make it more readable (you might have noticed that the italic sentences preceding the pronunciation do not show up either). Part of that is removing everything with the noexcerpt class, which includes the spelling. (In the future, text in parentheses might be removed completely to handle metadata creep.)
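If the stripped pronunciation is needed, one option is to ask for the full rendered HTML through `action=parse` instead of the sanitized extract, and then strip the markup yourself (for example with jsoup). The sketch below only builds the two request URLs so the difference is visible; the class and method names (`WikiApiUrls`, `extractUrl`, `parseUrl`) are made up for illustration, and it assumes Java 10+ for `URLEncoder.encode(String, Charset)`. Fetching and parsing the response is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of any library.
class WikiApiUrls {
    // Endpoint taken from the question.
    static final String ENDPOINT = "https://en.wikipedia.org/w/api.php";

    // Joins the parameters into a URL-encoded query string, preserving insertion order.
    static String buildUrl(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return ENDPOINT + "?" + query;
    }

    // Sanitized extract: elements with the noexcerpt class are removed by the API.
    static String extractUrl(String title) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "query");
        p.put("format", "json");
        p.put("prop", "extracts");
        p.put("redirects", "1");
        p.put("titles", title);
        return buildUrl(p);
    }

    // Full rendered HTML of the page; the caller has to strip the tags.
    static String parseUrl(String title) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "parse");
        p.put("format", "json");
        p.put("prop", "text");
        p.put("redirects", "1");
        p.put("page", title);
        return buildUrl(p);
    }

    public static void main(String[] args) {
        System.out.println(extractUrl("Meme"));
        System.out.println(parseUrl("Meme"));
    }
}
```

The `action=parse` response is considerably larger and contains infoboxes, references and other markup, so it trades the convenience of the extract for completeness.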


solution1 2 2017-06-25 14:58:06