Finding the DBpedia resources with the most predicates

Question

I thought it would be interesting to ask DBpedia which of its resources are the most predicate-rich.

I tried running the following query:

SELECT DISTINCT ?s (count(?p) AS ?info)
WHERE {
  ?s ?p ?o .
}
GROUP BY ?s ?p
ORDER BY desc(?info)
LIMIT 50

and it timed out, so I can't verify whether it was the right query.

So, I'm left with the following two questions:

Is this the correct way to ask this question?
Is the query too computationally expensive to run, even on smaller datasets? (DBpedia has 2.46 billion triples.)

Answer 1

The right way to ask this

Suppose you've got data like this:

@prefix : <http://stackoverflow.com/q/22391927/1281433/> .

:a :p 1, 2, 3 ;
   :q 4, 5 .

:b :p 1, 2 ;
   :q 3, 4 ;
   :r 5, 6 .

:c :p 1 ;
   :q 2 ;
   :r 3 .

Then you can ask how many triples each resource is the subject of with a query like this:

prefix : <http://stackoverflow.com/q/22391927/1281433/>

select ?s (count(*) as ?n) where {
  ?s ?p ?o
}
group by ?s
order by desc(?n)

----------
| s  | n |
==========
| :b | 6 |
| :a | 5 |
| :c | 3 |
----------

Notice that you only want to group by ?s if you're interested in how many triples each resource is the subject of. In your original query, where you group by ?s ?p, you're going to be sorting (subject, predicate) pairs by how many values they have. E.g.,

prefix : <http://stackoverflow.com/q/22391927/1281433/>

select ?s ?p (count(*) as ?n) where {
  ?s ?p ?o
}
group by ?s ?p
order by desc(?n)

---------------
| s  | p  | n |
===============
| :a | :p | 3 |
| :b | :p | 2 |
| :a | :q | 2 |
| :b | :q | 2 |
| :b | :r | 2 |
| :c | :p | 1 |
| :c | :q | 1 |
| :c | :r | 1 |
---------------
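
Both queries above count triples. If "predicate-rich" is instead read as "has the most distinct predicates", a variant along these lines might be closer to the question. This is a sketch that is not part of the original answer; it assumes the same sample data, and the secondary sort on ?s is added only to make the order of ties predictable.

```sparql
prefix : <http://stackoverflow.com/q/22391927/1281433/>

# Count each predicate once per subject, however many values it has.
select ?s (count(distinct ?p) as ?n) where {
  ?s ?p ?o
}
group by ?s
order by desc(?n) ?s
```

On the sample data, :b and :c each use three predicates (:p, :q, :r) and :a uses two (:p, :q):

```
----------
| s  | n |
==========
| :b | 3 |
| :c | 3 |
| :a | 2 |
----------
```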

Doing this for DBpedia

I don't expect that you'll be able to run a query like this on DBpedia. It requires touching every triple in the data, and then ordering the resources by how many triples they're the subject of. That sounds like a lot of work. You might be able to download the data, load it into a local endpoint, and run the query there, which avoids the timeout, but I wouldn't be surprised if it still takes a while.
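
One way to cut the work down, if a partial answer is acceptable, might be to restrict the subjects to a single class so that the endpoint does not have to scan every triple. The sketch below assumes the DBpedia ontology namespace http://dbpedia.org/ontology/ and picks dbo:Volcano purely as an illustrative class; the limit of 50 is carried over from the question. Whether it finishes within the public endpoint's timeout has not been checked here.

```sparql
prefix dbo: <http://dbpedia.org/ontology/>

# Only consider subjects typed as dbo:Volcano (illustrative choice of class),
# then count the triples each one is the subject of.
select ?s (count(*) as ?n) where {
  ?s a dbo:Volcano ;
     ?p ?o .
}
group by ?s
order by desc(?n)
limit 50
```

This answers a narrower question (the most triple-rich resources within one class), so it is not a substitute for the full query over all of DBpedia.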

Answer 1 (score 3, accepted, 2014-03-14 01:38:16)

The right way to ask this 提出这个问题的正确方法

Doing this for DBpedia 为DBpedia执行此操作

使用最多谓词计算DBPedia资源

问题描述

1 个解决方案

解决方案1 3 已采纳 2014-03-14 01:38:16

The right way to ask this 提出这个问题的正确方法

Doing this for DBpedia 为DBpedia执行此操作

解决方案1
3 已采纳 2014-03-14 01:38:16