
Neo4j: Creating relationships from a CSV file is really slow with py2neo

I tried to load a CSV file (25 MB, 150,000 rows, 22 columns) into a Neo4j graph that models flights, using py2neo.

The import runs as a single Cypher query that creates the nodes (Airport, City, Flight and Plane) and the relationships between them. But the code takes forever to run, even with USING PERIODIC COMMIT.

I am not sure whether the Cypher query I've written is optimized; it might be the source of the slowness. For 10,000 rows, it took around 10 minutes to build the graph. Can anyone help, please? Here is the code:

from py2neo import Graph


def importFromCSVtoNeo(graph):
    # A single query creates every node and relationship for each CSV row.
    query = '''
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row FIELDTERMINATOR '\t'
    WITH row

    MERGE (c_departure:City {cityName: row.cityName_departure})
    MERGE (a_departure:Airport {airportName: row.airportName_departure})
    MERGE (f_segment1:Flight {airline: row.airline1})
    ON CREATE SET f_segment1.class = row.class1,
                  f_segment1.outboundclassgroup = row.outboundclassgroup1

    MERGE (a_departure)-[:IN]->(c_departure)
    MERGE (c_departure)-[:HAS]->(a_departure)
    MERGE (f_segment1)-[:FROM {departAt: row.outbounddeparttime}]->(a_departure)

    MERGE (c_transfer:City {cityName: row.transferCityName})
    MERGE (a_transfer:Airport {airportName: row.airportName_transfer})
    MERGE (f_segment1)-[:TO_TRANSFER {transferArriveAt: row.transferArriveAt}]->(a_transfer)
    MERGE (a_transfer)-[:IN]->(c_transfer)
    MERGE (c_transfer)-[:HAS]->(a_transfer)

    MERGE (c_arrival:City {cityName: row.cityName_arrival})
    MERGE (a_arrival:Airport {airportName: row.airportName_arrival})
    MERGE (f_segment2:Flight {airline: row.airline2})
    ON CREATE SET f_segment2.class = row.class2,
                  f_segment2.outboundclassgroup = row.outboundclassgroup2
    MERGE (f_segment2)-[:TO {arrivalAt: row.outboundarrivaltime}]->(a_arrival)
    MERGE (f_segment2)-[:FROM_TRANSFER {transferDepartAt: row.transferDepartAt}]->(a_transfer)
    MERGE (a_arrival)-[:IN]->(c_arrival)
    MERGE (c_arrival)-[:HAS]->(a_arrival)

    MERGE (p:Plane {saleprice: row.saleprice})
    ON CREATE SET p.depart = row.cityName_departure,
                  p.destination = row.cityName_arrival,
                  p.salechannel = row.salechannel,
                  p.planeDuration = row.planeDuration
    MERGE (p)-[:HAS_FLIGHTS]->(f_segment1)
    MERGE (f_segment1)-[:WAIT_FOR {waitingTime: row.waitingTime}]->(f_segment2)
    '''

    graph.run(query)


if __name__ == '__main__':
    graph = Graph()
    importFromCSVtoNeo(graph)

I've also tried to do it in batch mode, but the performance doesn't get better. I'd appreciate any comments or suggestions. Thanks!
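The batch attempt itself is not shown. Purely as an assumption about what such an attempt could look like, the sketch below reads the tab-separated file on the Python side, groups the rows into chunks, and sends each chunk as a `rows` parameter to a query that starts with UNWIND. The helper names (`read_rows`, `chunks`, `import_in_batches`) and the shortened query are hypothetical, and `graph` is assumed to be any object with a py2neo-style `run(cypher, parameters)` method. It only illustrates the batching pattern and is not presented as a fix for the slowness.

```python
import csv
from itertools import islice


def read_rows(path, delimiter="\t"):
    """Yield each CSV row as a dict keyed by the header names."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)


def chunks(rows, size):
    """Group any iterable into lists of at most `size` items."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Shortened, hypothetical query: only the departure city and airport.
BATCH_QUERY = """
UNWIND $rows AS row
MERGE (c:City {cityName: row.cityName_departure})
MERGE (a:Airport {airportName: row.airportName_departure})
MERGE (a)-[:IN]->(c)
"""


def import_in_batches(graph, rows, size=1000):
    """Send the rows to the database one chunk per run() call."""
    sent = 0
    for batch in chunks(rows, size):
        graph.run(BATCH_QUERY, {"rows": batch})
        sent += len(batch)
    return sent


# Usage (needs py2neo and a running Neo4j instance):
#   from py2neo import Graph
#   import_in_batches(Graph(), read_rows("flights.csv"))
```

Each chunk still executes the same MERGE clauses, so batching alone changes how the rows reach the database, not how each node is looked up.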

I would create indexes on the node properties before launching the script, so that Neo4j can use them for fast lookups when running MERGE (since it has to match nodes row by row). For instance, for the first node property I would use:

CREATE INDEX ON :City(cityName)

and so on. You can create them directly from py2neo, as individual run statements.
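As a minimal sketch of that idea, the snippet below builds one CREATE INDEX statement for each label and property that the import query uses as a MERGE key (City.cityName, Airport.airportName, Flight.airline, Plane.saleprice) and runs each one separately. The names `MERGE_KEYS`, `index_statements` and `create_indexes` are hypothetical, and `graph` is assumed to be any object with a `run()` method, such as a py2neo `Graph`. The statement form follows the `CREATE INDEX ON :Label(property)` syntax used above; newer Neo4j releases use a different CREATE INDEX syntax, so check the one your server version accepts.

```python
# Label/property pairs that the import query uses as MERGE keys.
MERGE_KEYS = [
    ("City", "cityName"),
    ("Airport", "airportName"),
    ("Flight", "airline"),
    ("Plane", "saleprice"),
]


def index_statements(keys=MERGE_KEYS):
    """Build one CREATE INDEX statement per (label, property) pair."""
    return ["CREATE INDEX ON :{}({})".format(label, prop) for label, prop in keys]


def create_indexes(graph, keys=MERGE_KEYS):
    """Run every statement in its own run() call and return the statements."""
    statements = index_statements(keys)
    for statement in statements:
        graph.run(statement)
    return statements


# Usage (needs py2neo and a running Neo4j instance):
#   from py2neo import Graph
#   graph = Graph()
#   create_indexes(graph)
#   importFromCSVtoNeo(graph)
```

Property names are case-sensitive, so the names in the index statements have to match the ones in the MERGE clauses exactly (`cityName`, not `cityname`).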
