How to set up Neo4j with DBpedia on top of a Ruby on Rails application?

I am trying to use DBpedia with Neo4j on top of Ruby on Rails.

Assume I have already installed Neo4j and downloaded one of the DBpedia datasets.

How do I import the DBpedia dataset into Neo4j?

The simplest way to load DBpedia into Neo4j is to use the dbpedia4neo library. It is a Java library, but you don't need to know any Java, because all you need to do is run the executable.

You could rewrite it in JRuby if you want, but regular Ruby won't work because the library relies on Blueprints, a Java library with no Ruby equivalent.

Here are the two key files that implement the loading procedure:

  1. https://github.com/oleiade/dbpedia4neo/blob/master/src/main/java/org/acaro/dbpedia4neo/inserter/DBpediaLoader.java
  2. https://github.com/oleiade/dbpedia4neo/blob/master/src/main/java/org/acaro/dbpedia4neo/inserter/TripleHandler.java

Here is a description of what's involved.

Blueprints translates the RDF data into a graph representation. To understand what's going on under the hood, see the Blueprints Sail Ouplementation.

After you download the DBpedia dump files, you should be able to build the dbpedia4neo Java library and run it without modifying the Java code.

First, clone oleiade's fork of the GitHub repository and change to the dbpedia4neo directory:

$ git clone https://github.com/oleiade/dbpedia4neo.git
$ cd dbpedia4neo

(Oleiade's fork includes a minor Blueprints update that calls sail.initialize(); see https://groups.google.com/d/msg/gremlin-users/lfpNcOwZ49Y/WI91ae-UzKQJ .)

Before you build it, you will need to update pom.xml to use a more recent Blueprints version and the current Blueprints repository (Sonatype).

To do this, open pom.xml and, at the top of the dependencies section, change all of the TinkerPop Blueprints versions from 0.6 to 0.9.

While you are in the file, add the Sonatype repository to the repositories section at the end of the file:

<repository>
  <id>sonatype-nexus-snapshots</id>
  <name>Sonatype Nexus Snapshots</name>
  <url>https://oss.sonatype.org/content/repositories/releases</url>
</repository>

Save the file and then build it with Maven:

$ mvn clean install

This will download and install all the dependencies for you and create a jar file in the target directory.

To load DBpedia, use Maven to run the executable:

$ mvn exec:java \
  -Dexec.mainClass=org.acaro.dbpedia4neo.inserter.DBpediaLoader \
  -Dexec.args="/path/to/dbpedia-dump.nt"

The DBpedia dump is large, so this will take a while to load.

Now that the data is loaded, you can access the graph in one of two ways:

  1. Use JRuby and the Blueprints-Neo4j API directly.
  2. Use regular Ruby and the Rexster REST server, which is similar to Neo4j Server except that it supports multiple graph databases.

For an example of how to create a Rexster client, see Bulbs, a Python framework I wrote that supports both Neo4j Server and Rexster.
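For option 2, a minimal sketch of a Rexster client in plain Ruby might look like the following. It assumes Rexster is listening on its usual REST port (8182) on localhost and that the Neo4j store is exposed under a graph named "dbpedia"; both are assumptions that depend on your Rexster configuration, and the helper names are made up for illustration. The graph name and vertex id are assumed to be URL-safe.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed base URL; adjust to match your Rexster configuration.
REXSTER_BASE = "http://localhost:8182"

# Builds the REST URI for one vertex of a named graph.
# `graph` and `vertex_id` are interpolated as-is, so they must be URL-safe.
def rexster_vertex_uri(graph, vertex_id, base: REXSTER_BASE)
  URI("#{base}/graphs/#{graph}/vertices/#{vertex_id}")
end

# Fetches a vertex over HTTP and returns the parsed JSON response as a Hash.
# Raises if the server does not answer with a 2xx status.
def fetch_vertex(graph, vertex_id, base: REXSTER_BASE)
  response = Net::HTTP.get_response(rexster_vertex_uri(graph, vertex_id, base: base))
  unless response.is_a?(Net::HTTPSuccess)
    raise "Rexster returned HTTP #{response.code}"
  end
  JSON.parse(response.body)
end

# Example (needs a running Rexster server):
#   vertex = fetch_vertex("dbpedia", 1)
```

In a Rails application, a wrapper like this would typically live in a small service class so that controllers and models never build Rexster URLs themselves.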

Another approach would be to process the DBpedia RDF dump file in Ruby, write the nodes and relationships out to CSV files, and use the Neo4j batch importer to load them. But this requires you to translate the RDF data into Neo4j relationships yourself.
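As a rough sketch of that Ruby approach, the function below reads an N-Triples dump and writes one CSV of nodes and one of relationships. It only keeps triples whose subject, predicate, and object are all URIs; literal values (labels, abstracts, numbers) and blank nodes are skipped, so turning those into node properties is left out. The column layout (`id,uri` and `start,end,type`) is a hypothetical choice, not the format any particular importer requires, so adapt it to the tool you use.

```ruby
require "csv"

# Matches an N-Triples line whose subject, predicate, and object are all URIs.
# Lines with literal or blank-node terms, comments, and blank lines do not match.
URI_TRIPLE = /\A\s*<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*\z/

# Reads N-Triples from `input` (anything that responds to each_line) and
# writes one CSV row per distinct node and one row per relationship.
# Returns [node_count, relationship_count].
# Note: the URI-to-id map is kept in memory, which may be too much for a
# full dump on a small machine.
def triples_to_csv(input, nodes_out, rels_out)
  node_ids = {}                       # URI => sequential integer id
  nodes = CSV.new(nodes_out)
  rels  = CSV.new(rels_out)
  nodes << %w[id uri]                 # hypothetical header layout
  rels  << %w[start end type]
  rel_count = 0

  # Returns the id for a URI, assigning one (and emitting a node row) on first use.
  id_for = lambda do |uri|
    node_ids.fetch(uri) do
      id = node_ids.size
      node_ids[uri] = id
      nodes << [id, uri]
      id
    end
  end

  input.each_line do |line|
    match = URI_TRIPLE.match(line)
    next unless match                 # skip anything that is not URI-URI-URI
    subject, predicate, object = match.captures
    rels << [id_for.call(subject), id_for.call(object), predicate]
    rel_count += 1
  end

  [node_ids.size, rel_count]
end

# Example:
#   File.open("nodes.csv", "w") do |n|
#     File.open("rels.csv", "w") do |r|
#       File.open("/path/to/dbpedia-dump.nt") { |nt| triples_to_csv(nt, n, r) }
#     end
#   end
```

Streaming the dump line by line keeps memory use bounded by the number of distinct URIs, not by the size of the file.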

The way I see it, you have two options.

  1. Either reuse an existing approach like this one as-is, or fork the repo behind it (or another like it) and extend or fix it to fit your purposes.

  2. Do it yourself, from scratch. Here's the general approach:

Parse your DBpedia dataset into a format suitable for Neo4j's insertion methods. Libraries such as openRDF exist to process this kind of data. Unless you plan to take the time to research which one would suit your needs best, note that the existing solution I linked above already uses this library.

Then insert the formatted data into your Neo4j database. One way to do this is through Neo4j's Batch Insertion component. Note that this facility, as its documentation states, is intended for initial imports: it is not thread safe and is non-transactional, in other words, not ACID-compliant. So whether it fits really depends on your use case.

My 2 cents: use something that is already out there unless this functionality is the core of what you're developing. It will be a pain to build, and even more of a pain to build so that it runs efficiently.
