
How to filter DBpedia results in SPARQL

I have a small problem. If I run this simple SPARQL query:

SELECT ?abstract 
WHERE {
<http://dbpedia.org/resource/Mitsubishi> <http://dbpedia.org/ontology/abstract> ?abstract.
FILTER langMatches( lang(?abstract), 'en')}

I get this result (SPARQL Result), and it contains non-English characters. Is there a way to remove them and retrieve only the English words?

You'll need to define exactly which characters you do and don't want in your result, but you can use replace to replace characters outside of a range with, e.g., empty strings. If you wanted to exclude everything except the Basic Latin, Latin-1 Supplement, Latin Extended-A, and Latin Extended-B blocks (which together cover U+0000 to U+024F), you could do the following:

SELECT ?abstract ?cleanAbstract
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract 
  FILTER langMatches( lang(?abstract), 'en')
  bind(replace(?abstract,"[^\\x{0000}-\\x{024f}]","") as ?cleanAbstract)
}


Or even simpler:

SELECT (replace(?abstract_,"[^\\x{0000}-\\x{024f}]","") as ?abstract)
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract_
  FILTER langMatches(lang(?abstract_), 'en')
}

SPARQL results

The Mitsubishi Group (, Mitsubishi Gurūpu) (also known as the Mitsubishi Group of Companies or Mitsubishi Companies) is a group of autonomous Japanese multinational companies covering a range of businesses which share the Mitsubishi brand, trademark, and legacy.The Mitsubishi group of companies form a loose entity, the Mitsubishi Keiretsu, which is often referenced in Japanese and US media and official reports; in general these companies all descend from the zaibatsu of the same name. The top 25 companies are also members of the Mitsubishi Kin'yōkai, or "Friday Club", and meet monthly. In addition the Mitsubishi.com Committee exists to facilitate communication and access of the Mitsubishi brand through a portal web site.
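To see what the character class keeps and drops without querying the endpoint, the same range can be applied locally. This is only a minimal sketch in Python: the helper name strip_non_latin and the sample string are assumptions made for illustration, and the sample is a stand-in for the start of the abstract, not text copied from DBpedia.

```python
import re

# Same character class as in the SPARQL replace() call:
# match anything outside U+0000 to U+024F.
NON_LATIN = re.compile(r"[^\u0000-\u024f]")


def strip_non_latin(text: str) -> str:
    """Remove every character outside the four Latin blocks."""
    return NON_LATIN.sub("", text)


# Hypothetical sample standing in for the start of the abstract.
sample = "The Mitsubishi Group (三菱グループ, Mitsubishi Gurūpu)"
print(strip_non_latin(sample))
# prints: The Mitsubishi Group (, Mitsubishi Gurūpu)
```

Note that ū (U+016B) survives because it falls in Latin Extended-A, while the leftover "(, " shows that the surrounding punctuation is not cleaned up. If you want plain ASCII only, you would need a narrower range.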

You may find the Wikipedia article "Latin script in Unicode" useful.
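As a quick check on where the single range in the queries comes from, the four blocks named in the answer are contiguous, so they collapse into one interval. The sketch below assumes the standard Unicode block boundaries. The names LATIN_BLOCKS and merged_range are made up for this example.

```python
# Unicode block boundaries for the four Latin blocks named in the answer.
LATIN_BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
}


def merged_range(blocks):
    """Merge contiguous (start, end) blocks into one (start, end) pair."""
    ranges = sorted(blocks.values())
    lo, hi = ranges[0]
    for start, end in ranges[1:]:
        if start != hi + 1:
            raise ValueError("blocks are not contiguous")
        hi = end
    return lo, hi


lo, hi = merged_range(LATIN_BLOCKS)
print(f"U+{lo:04X} to U+{hi:04X}")
# prints: U+0000 to U+024F
```

To keep a different set of characters, adjust the block list and use the resulting bounds in the replace pattern. Non-contiguous blocks would need several ranges inside the character class rather than one.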
