
How to retrieve all properties of Wikidata ordered by their usage using SPARQL

I found a query that retrieves all Wikidata properties together with their property ID, label, description, and aliases:

PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX schema: <http://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?p ?pt ?pLabel ?d ?aliases WHERE {
  {
    SELECT ?p ?pt ?d
              (GROUP_CONCAT(DISTINCT ?alias; separator="|") as ?aliases)
    WHERE {
      ?p wikibase:propertyType ?pt .
      OPTIONAL {?p skos:altLabel ?alias FILTER (LANG (?alias) = "en")}
      OPTIONAL {?p schema:description ?d FILTER (LANG (?d) = "en") .}
    } GROUP BY ?p ?pt ?d
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
  }
}

and a query that counts the properties used by items pointing to Q46 through a statement:

SELECT ?property ?count
WHERE {
  SELECT ?property (COUNT(?item) AS ?count)
  WHERE {
    ?item ?statement wd:Q46 . # items pointing to Q46 through a statement
    ?property wikibase:statementProperty ?statement . # property used for that statement
  } GROUP BY ?property # count usage for each property pointing to that entity
} ORDER BY DESC(?count) # show in descending order of uses

I would like to combine them so that the result does not depend on Q46, but I don't know exactly how.

Such a SPARQL query will take too long and end in an execution timeout. The alternatives are:

  1. Develop and use an application that
  • reads the bzip2 dump archive
  • imports the parsed JSON data into an SQL database
  • performs SQL queries on your own database to extract the data you need
  2. Another way, involving less development effort, is to:
  • extract the Wikidata JSON dump archive (~65 GiB), resulting in a ~1.4 TB JSON file
  • develop a small application that parses that type of JSON file using an event-driven parser
  • parse that JSON, extracting the data you need
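As a rough illustration of the first alternative, here is a minimal Python sketch. It assumes the dump follows the documented layout of the Wikidata JSON dumps (one JSON array, one entity per line, each line ending with a comma), so it reads the bzip2 archive line by line instead of using a true event-driven parser or extracting it first. The function names, the `property` table, and its columns are hypothetical, and the counts cover main statements only, not qualifiers or references.

```python
import bz2
import json
import sqlite3
import sys
from collections import Counter


def iter_entities(lines):
    """Yield one entity dict per dump line, skipping the enclosing '[' and ']'."""
    for raw in lines:
        line = raw.strip().rstrip(",")  # every entity line ends with a comma
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def summarize(entities):
    """Count statements per property and collect English labels of properties."""
    usage = Counter()
    labels = {}
    for entity in entities:
        # "claims" maps a property ID to the list of statements using it
        for prop, statements in entity.get("claims", {}).items():
            usage[prop] += len(statements)
        if entity.get("type") == "property":
            label = entity.get("labels", {}).get("en", {}).get("value")
            labels[entity["id"]] = label
    return usage, labels


def properties_by_usage(usage, labels):
    """Load the summary into an in-memory SQLite table and query it by usage."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE property (id TEXT PRIMARY KEY, label TEXT, uses INTEGER)"
    )
    ids = set(usage) | set(labels)
    db.executemany(
        "INSERT INTO property VALUES (?, ?, ?)",
        [(pid, labels.get(pid), usage.get(pid, 0)) for pid in ids],
    )
    rows = db.execute(
        "SELECT id, label, uses FROM property ORDER BY uses DESC, id"
    ).fetchall()
    db.close()
    return rows


def main(dump_path):
    # bz2.open decompresses on the fly, so the archive is never fully extracted
    with bz2.open(dump_path, "rt", encoding="utf-8") as dump:
        usage, labels = summarize(iter_entities(dump))
    for pid, label, uses in properties_by_usage(usage, labels):
        print(pid, label, uses, sep="\t")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it with the path to the downloaded dump archive as the only argument. A single pass over the full dump will still take many hours, but it needs no intermediate 1.4 TB file, and only the per-property summary is held in memory.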
