
Referential Integrity with Neo4j

I am working on a project that uses a graph database to hold click data for a search engine. The nodes can be search terms or URLs, and each edge holds a weight attribute plus the percentage of times that the search led to someone clicking that URL:

percentage = (number of times the URL was clicked) / (number of times the term was searched)

My issue is that the percentage is accurate at the moment I update an edge, but if I later update the search term node and its searched count changes, the edge no longer holds the correct percentage. Is there a way in Neo4j to keep referential integrity, something like a foreign key?

Unfortunately no, Neo4j doesn't support this. You can still do it with one of two methods. I'll tell you what they both are, then make a recommendation.

Compared with a relational database, I don't think you're looking for a foreign key or "referential integrity"; I think what you're looking for is more like a trigger. A trigger is like a function or procedure that executes when data changes. In your case, it'd probably be good to have trigger functions that recalculate the weight percentages on all edges incident to the changed node.

Option 1 - The capable Max De Marzi has got you covered there with a description of how you can do triggers in Neo4j. Spoiling the surprise, there's a TransactionEventHandler in the Java API. When the right kind of transaction comes through, you can catch it and do extra work.

Option 2 - The server provides an extension/plugin mechanism, so you could write this on your own. This is a big hammer: it can do just about anything, but it's harder to wield, too.

I'd recommend you look into Max's post and the TransactionEventHandler. You might then implement public void afterCommit(TransactionData transactionData, Object o). In that method, you'd inspect the transaction data to see whether it is of interest (not all transactions will be). If the transaction updated a search term node or changed a searched count, then you'd redo the computation and fix the weights on the affected edges, and you should be good.
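The recomputation step itself is small. Below is a minimal in-memory sketch in Python of what such a handler would do once it notices that a term's searched count changed. It deliberately uses plain dictionaries rather than the Neo4j Java API, and the names `terms`, `edges`, `n_searches`, `n_clicks`, `percentage` and `on_term_updated` are all made up for illustration.

```python
# In-memory stand-in for the graph; this is NOT the Neo4j API.
# All names and values here are hypothetical.
terms = {"shoes": {"n_searches": 100}}
edges = [
    {"term": "shoes", "url": "a.example", "n_clicks": 25, "percentage": 0.25},
    {"term": "shoes", "url": "b.example", "n_clicks": 10, "percentage": 0.10},
]


def on_term_updated(term_id, new_n_searches):
    """Trigger-style hook: update the term, then refresh every incident edge."""
    terms[term_id]["n_searches"] = new_n_searches
    for edge in edges:
        if edge["term"] == term_id:
            # Guard against a zero search count (assumed to mean 0%).
            edge["percentage"] = (
                edge["n_clicks"] / new_n_searches if new_n_searches else 0.0
            )


# The searched count doubles, so every stored percentage must be refreshed.
on_term_updated("shoes", 200)
print([e["percentage"] for e in edges])
```

In a real handler the same loop would walk the relationships of the changed node instead of scanning a list.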

The following info might be helpful.

If you stored the number of clicks instead of the percentage, the data could not become inconsistent. For example:

(:Term {id: 1, nSearches: 123})-[:HAS_URL {weight: 2, nClicks: 17}]->(:Url {id: 2})

With this data model, you'd calculate the percentage whenever you needed it.
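As a rough illustration of computing on read, here is a small Python sketch that uses the property values from the pattern above. The helper name `click_percentage` and the choice to treat a never-searched term as 0% are assumptions for the example, not part of the original model.

```python
def click_percentage(n_clicks, n_searches):
    """Share of searches for a term that led to a click on a URL."""
    if n_searches == 0:
        return 0.0  # assumption: a never-searched term counts as 0%
    return n_clicks / n_searches


# Property values taken from the example pattern above.
term = {"id": 1, "nSearches": 123}
has_url = {"weight": 2, "nClicks": 17}

before = click_percentage(has_url["nClicks"], term["nSearches"])

# The term is searched once more; nothing on the relationship has to change.
term["nSearches"] += 1
after = click_percentage(has_url["nClicks"], term["nSearches"])

print(f"{before:.4f} -> {after:.4f}")
```

Because the percentage is derived from the two counts every time, there is no stored value that can go stale.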

For example, to find the 10 terms that have the highest percentage of visits to a specific URL:

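MATCH (term:Term)-[r:HAS_URL]->(url:Url {id: 2})
RETURN url, term
// toFloat() avoids integer division, which would truncate ratios below 1 to 0
// when both properties are stored as integers
ORDER BY toFloat(r.nClicks) / term.nSearches DESC
LIMIT 10;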

But notice that the inverse query (find the 10 URLs that have the highest percentage of visits from a specific term) does not even require that you calculate the percentage! This is because in this case the percentages all have the same denominator. So, you can just use nClicks for sorting:

MATCH (term:Term {id: 1})-[r:HAS_URL]->(url:Url)
RETURN term, url
ORDER BY r.nClicks DESC
LIMIT 10;
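To see why the sort order is the same, here is a small Python check. The URL names and click counts are invented for the example; only the idea of a shared denominator comes from the explanation above.

```python
# Hypothetical data: one term, several URLs. The counts are invented.
n_searches = 123
clicks_by_url = {"url-a": 17, "url-b": 40, "url-c": 5, "url-d": 40}

# sorted() is stable, so ties keep their original order under both keys.
by_clicks = sorted(clicks_by_url, key=lambda u: clicks_by_url[u], reverse=True)
by_percentage = sorted(
    clicks_by_url, key=lambda u: clicks_by_url[u] / n_searches, reverse=True
)

# Dividing every count by the same positive number preserves the ranking.
print(by_clicks == by_percentage)
```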
