
Neo4j: Creating relationships from a CSV file is really slow with py2neo

I'm trying to load a CSV file (25 MB, 150,000 rows, 22 columns) of flight data into a Neo4j graph using py2neo.

The import is a single Cypher query that creates the nodes (Airport, City, Flight and Plane) and the relationships between them. But when I run the code, it takes forever, even with USING PERIODIC COMMIT.

I am not sure whether the Cypher query I've written is optimized; it might be the source of the slowness. For 10,000 rows, it took around 10 minutes to build the graph. Can anyone help, please? Here is the code:

from py2neo import Graph


def importFromCSVtoNeo(graph):
    """Load flights.csv and build the City/Airport/Flight/Plane graph in one query."""
    query = '''
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row FIELDTERMINATOR '\t'
    WITH row

    MERGE (c_departure:City {cityName: row.cityName_departure})
    MERGE (a_departure:Airport {airportName: row.airportName_departure})
    MERGE (f_segment1:Flight {airline: row.airline1})
    ON CREATE SET f_segment1.class = row.class1,
                  f_segment1.outboundclassgroup = row.outboundclassgroup1

    MERGE (a_departure)-[:IN]->(c_departure)
    MERGE (c_departure)-[:HAS]->(a_departure)
    MERGE (f_segment1)-[:FROM {departAt: row.outbounddeparttime}]->(a_departure)

    MERGE (c_transfer:City {cityName: row.transferCityName})
    MERGE (a_transfer:Airport {airportName: row.airportName_transfer})
    MERGE (f_segment1)-[:TO_TRANSFER {transferArriveAt: row.transferArriveAt}]->(a_transfer)
    MERGE (a_transfer)-[:IN]->(c_transfer)
    MERGE (c_transfer)-[:HAS]->(a_transfer)

    MERGE (c_arrival:City {cityName: row.cityName_arrival})
    MERGE (a_arrival:Airport {airportName: row.airportName_arrival})
    MERGE (f_segment2:Flight {airline: row.airline2})
    ON CREATE SET f_segment2.class = row.class2,
                  f_segment2.outboundclassgroup = row.outboundclassgroup2
    MERGE (f_segment2)-[:TO {arrivalAt: row.outboundarrivaltime}]->(a_arrival)
    MERGE (f_segment2)-[:FROM_TRANSFER {transferDepartAt: row.transferDepartAt}]->(a_transfer)
    MERGE (a_arrival)-[:IN]->(c_arrival)
    MERGE (c_arrival)-[:HAS]->(a_arrival)

    MERGE (p:Plane {saleprice: row.saleprice})
    ON CREATE SET p.depart = row.cityName_departure,
                  p.destination = row.cityName_arrival,
                  p.salechannel = row.salechannel,
                  p.planeDuration = row.planeDuration
    MERGE (p)-[:HAS_FLIGHTS]->(f_segment1)
    MERGE (f_segment1)-[:WAIT_FOR {waitingTime: row.waitingTime}]->(f_segment2)
    '''

    # Send the whole import to Neo4j as a single statement.
    graph.run(query)


if __name__ == '__main__':
    graph = Graph()
    importFromCSVtoNeo(graph)

I've also tried running it in batch mode, but the performance doesn't get any better. I'd appreciate any comments or suggestions. Thanks!

I would create indexes on the node properties before launching the script, so that Neo4j can use them for fast lookups during MERGE (which has to MATCH nodes row by row). For instance, for the first node property I would use:

CREATE INDEX ON :City(cityName)

and so on for the other labels. You can create them directly from py2neo, with one run statement per index.
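
As a minimal sketch of that suggestion, the snippet below builds one index statement for each label/property pair that the question's query uses as a MERGE key, and runs each statement separately. Property names are case-sensitive, so they are spelled exactly as in the MERGE clauses. The helper names `index_statements` and `create_indexes` are hypothetical, `graph` is assumed to be a py2neo `Graph`, and the `CREATE INDEX ON` form assumes a Neo4j 3.x server (the same syntax as the statement above).

```python
# Label/property pairs taken from the MERGE keys in the question's query.
INDEXED_PROPERTIES = [
    ("City", "cityName"),
    ("Airport", "airportName"),
    ("Flight", "airline"),
    ("Plane", "saleprice"),
]


def index_statements(pairs=INDEXED_PROPERTIES):
    """Build one CREATE INDEX statement per (label, property) pair."""
    # Assumption: Neo4j 3.x index syntax, as used in the answer.
    return ["CREATE INDEX ON :%s(%s)" % (label, prop) for label, prop in pairs]


def create_indexes(graph, pairs=INDEXED_PROPERTIES):
    """Run each statement on its own; `graph` is assumed to be a py2neo Graph."""
    for statement in index_statements(pairs):
        graph.run(statement)
```

Calling `create_indexes(graph)` once before `importFromCSVtoNeo(graph)` should be enough, since the indexes persist in the database.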

