SPARQL query to return the number of neighbors

Question

I need to find just the number of neighbors (up to 4 nodes away) of a given article in DBpedia, where two articles are neighbors when there is a wikilink between them. I am currently using this query, but it takes a long time to compute:

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT (COUNT(?n4) AS ?count)
WHERE {
    SELECT DISTINCT ?n4
    WHERE {
        <http://dbpedia.org/resource/Albert_Einstein> dbo:wikiPageWikiLink/dbo:wikiPageWikiLink/dbo:wikiPageWikiLink/dbo:wikiPageWikiLink ?n4 .
    }
}

Does anyone have an idea for a better way to do this? I only need the number of neighbors. The query is only fast up to degree 2; at degree 3 it takes almost 30 seconds to complete, and at degree 4 it almost always times out.

I'm using RDFLib and Python to run the query, so any Python tricks would also be helpful!

EDIT: I have already downloaded the dataset and set up a local endpoint for the query, but performance is still poor.

Answer 1

If you are going to run many repeated queries for neighbors that are 4 steps away, you could put all the computational effort into a single, one-time computation of an equivalent property:

PREFIX ex: <http://example.com/>
PREFIX dbo: <http://dbpedia.org/ontology/>

CONSTRUCT {
  ?x ex:fourthNeighbour ?y .
}
WHERE {
  ?x dbo:wikiPageWikiLink/dbo:wikiPageWikiLink/dbo:wikiPageWikiLink/dbo:wikiPageWikiLink ?y .
}

This will still take a long time to run, but you only need to do it once, and afterwards any query for 4-step neighbours will be much faster.

Answer 2

SPARQL 1.1 property paths can have very high time and space complexity; see the paper "Counting Beyond a Yottabyte, or how SPARQL 1.1 Property Paths will Prevent Adoption of the Standard".

Your query has a worst-case complexity of O(n^4), where n is the number of articles in DBpedia, which is a lot. The actual runtime depends on the network structure of the data. Imagine John has 100 friends, and each of them has 100 friends, and so on: his friends of degree 4 can then number up to 100^4 = 10^8 = 100 million (including duplicates).
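To make the duplicate blow-up concrete, here is a minimal Python sketch that is not part of the original answer. It counts the walks of exactly `hops` links from a start node and, separately, the distinct nodes those walks end on. The `links` adjacency dictionary and the five-page toy graph are hypothetical stand-ins for the wikilink data.

```python
from collections import Counter

def count_walks_and_endpoints(links, start, hops):
    """Return (number of walks, number of distinct endpoints)
    for walks of exactly `hops` links starting at `start`."""
    # walks[node] = number of distinct walks from `start` that end at `node`
    walks = Counter({start: 1})
    for _ in range(hops):
        next_walks = Counter()
        for node, n in walks.items():
            for target in links.get(node, ()):
                next_walks[target] += n
        walks = next_walks
    return sum(walks.values()), len(walks)

# Hypothetical toy graph: each of 5 pages links to all 5 pages.
pages = ["A", "B", "C", "D", "E"]
links = {p: pages for p in pages}

total_walks, distinct = count_walks_and_endpoints(links, "A", 4)
print(total_walks, distinct)  # 625 5
```

With 5 links per page there are 5^4 = 625 walks but only 5 distinct endpoints. Keeping one entry per node at every hop, as the dictionary does here, means the work grows with the number of distinct nodes and links rather than with the number of walks.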

Additionally, in my testing RDFLib performs very poorly compared to a dedicated triple store such as Virtuoso Open Source 7.

However, if even that is not enough, you could try dedicated graph theory tools and libraries such as NetworkX, Gephi and Cytoscape. While RDF is also a graph data model, triple stores may not be optimized for that kind of query.
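As a rough idea of what the graph-library route looks like, here is a plain-Python breadth-first search with a depth cutoff. It is only a sketch and assumes the wikilinks have already been exported into an in-memory adjacency dictionary; `toy_links` is hypothetical data. It returns the nodes within at most `max_hops` links, which matches the "up to 4 nodes away" wording of the question, whereas the property path in the question matches endpoints of walks of exactly 4 links.

```python
from collections import deque

def neighbors_within(links, start, max_hops):
    """Return the set of nodes reachable from `start` in 1..max_hops links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the cutoff
        for target in links.get(node, ()):
            if target not in seen:  # visit each node at most once
                seen.add(target)
                queue.append((target, depth + 1))
    seen.discard(start)  # the start article is not its own neighbor
    return seen

# Hypothetical toy data: page -> pages it links to.
toy_links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": ["F"],
    "F": ["A"],
}

print(len(neighbors_within(toy_links, "A", 4)))  # 5
```

Because every node is visited at most once, the running time is linear in the number of nodes and links reached, regardless of how many distinct walks lead to them. Graph libraries such as NetworkX provide comparable cutoff-limited traversals.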
