
Loading data to neo4j from XML using py2neo

I'm trying to load data into a neo4j database from an XML file using py2neo.

This Python script works fine, but it is too slow, since I add the nodes first and then the relationships, with two exception handlers. On top of that, the XML file is around 200 MB.

I'm wondering if there is a faster way to perform this task?

XML file:

<Persons>
    <person>
        <id>XA123</id>
        <first_name>Adam</first_name>
        <last_name>John</last_name>
        <phone>01-12322222</phone>
    </person>
    <person>
        <id>XA7777</id>
        <first_name>Anna</first_name>
        <last_name>Watson</last_name>
        <relationship>
            <type>Friends</type>
            <to>XA123</to>
        </relationship>
    </person>
</Persons>

Python script:

#!/usr/bin/python3

from xml.dom import minidom
from py2neo import Graph, Node, Relationship, authenticate


graph = Graph("http://localhost:7474/db/data/")
authenticate("localhost:7474", "neo4j", "admin")

# minidom.parse accepts a filename directly, so no file handle is left open
xml_doc = minidom.parse("data.xml")
persons = xml_doc.getElementsByTagName('person')

# Adding Nodes
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data
    fName = person.getElementsByTagName('first_name')[0].firstChild.data
    lName = person.getElementsByTagName('last_name')[0].firstChild.data

    # not every person has phone number
    try:
        phone = person.getElementsByTagName('phone')[0].firstChild.data
    except IndexError:
        phone = "None"

    label = "Person"
    node = Node(label, ID=ID_, FirstName=fName, LastName=lName, Phone=phone)
    graph.create(node)


# Adding Relationships
for person in persons:
    ID_ = person.getElementsByTagName('id')[0].firstChild.data

    label = "Person"
    node1 = graph.find_one(label, property_key="ID", property_value=ID_)

    # relationships
    try:
        has_relations = person.getElementsByTagName('relationship')
        for relation in has_relations:
            node2 = graph.find_one(label,
                                   property_key="ID",
                                   property_value=relation.getElementsByTagName('to')[0].firstChild.data)

            relationship = Relationship(node1,
                                        relation.getElementsByTagName('type')[0].firstChild.data, node2)
            graph.create(relationship)
    except IndexError:
        continue

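Adding a uniqueness constraint on the ID property of the Person label significantly reduced the time needed to load the data into neo4j. A uniqueness constraint is backed by an index, so each find_one lookup on ID in the relationship loop can presumably use that index instead of scanning every Person node.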

graph.cypher.execute("CREATE CONSTRAINT ON (n:Person) ASSERT n.ID IS UNIQUE")
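
Beyond the constraint, another option (not part of the answer above, just a sketch) is to read the XML in a single streaming pass and collect plain rows that can be sent to neo4j in batches, rather than issuing one request per node and per lookup. The example below uses only the standard library to build those rows. The helper names parse_persons and rel_query, the row keys src and dst, and the Cypher strings are assumptions for illustration; the {rows} parameter syntax matches older neo4j servers and would be $rows on newer ones.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data copied from the question, so the sketch runs without a file.
SAMPLE = b"""<Persons>
    <person>
        <id>XA123</id>
        <first_name>Adam</first_name>
        <last_name>John</last_name>
        <phone>01-12322222</phone>
    </person>
    <person>
        <id>XA7777</id>
        <first_name>Anna</first_name>
        <last_name>Watson</last_name>
        <relationship>
            <type>Friends</type>
            <to>XA123</to>
        </relationship>
    </person>
</Persons>"""

# Hypothetical batch query for nodes: one request creates many Person nodes.
NODE_QUERY = (
    "UNWIND {rows} AS row "
    "MERGE (p:Person {ID: row.ID}) "
    "SET p.FirstName = row.FirstName, p.LastName = row.LastName, "
    "p.Phone = row.Phone"
)


def rel_query(rel_type):
    """Build a hypothetical batch query for one relationship type.

    Cypher cannot take the relationship type as a parameter, so the type
    is written into the query text, quoted with backticks.
    """
    if "`" in rel_type:
        raise ValueError("relationship type must not contain a backtick")
    return (
        "UNWIND {rows} AS row "
        "MATCH (a:Person {ID: row.src}), (b:Person {ID: row.dst}) "
        "CREATE (a)-[:`%s`]->(b)" % rel_type
    )


def parse_persons(source):
    """Stream <person> elements once; return (node_rows, rels_by_type)."""
    node_rows = []
    rels_by_type = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "person":
            continue
        person_id = elem.findtext("id")
        node_rows.append({
            "ID": person_id,
            "FirstName": elem.findtext("first_name"),
            "LastName": elem.findtext("last_name"),
            # Same fallback as the original script when <phone> is absent.
            "Phone": elem.findtext("phone", default="None"),
        })
        for rel in elem.findall("relationship"):
            rels_by_type[rel.findtext("type")].append(
                {"src": person_id, "dst": rel.findtext("to")}
            )
        elem.clear()  # drop the processed children to limit memory use
    return node_rows, dict(rels_by_type)


if __name__ == "__main__":
    nodes, rels = parse_persons(io.BytesIO(SAMPLE))
    print(len(nodes), "nodes;", {t: len(r) for t, r in rels.items()})
    # Assumed usage with the py2neo version from the question (untested):
    # graph.cypher.execute(NODE_QUERY, {"rows": nodes})
    # for rel_type, rows in rels.items():
    #     graph.cypher.execute(rel_query(rel_type), {"rows": rows})
```

For a 200 MB file the rows would probably need to be sent in chunks of a few thousand rather than all at once, and the uniqueness constraint should still be created first so that the MERGE and MATCH on ID can use its index.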
