I have a PySpark dataframe with two columns (col1 and col2), where col2 is a list of rows (the dataframe is grouped on col1). I now want to write this dataframe to Neo4j using py2neo. How do I write and format the Cypher query string? If I were writing the dataframe with the Spark connector, my query would look like this -
query_sparkneo4j_connector = "MERGE (d:Node1 {Node1: event.col1}) \
FOREACH (i in event.col2 | \
CREATE (c:Node2 {Prop1: i.xx, Prop2: i.yy}) \
CREATE (c)-[:Rel1]->(d));"
I tried two approaches, but neither works -
Approach1:
query1_py2neo = '''MERGE (d:Node1 {{Node1: '{col1val}'}})
FOREACH (i in {col2val} |
CREATE (c:Node2 {{Prop1: i.xx, Prop2: i.yy}})
CREATE (c)-[:Rel1]->(d));'''
for row in df.collect():
    col1_val = row["col1_name"]
    col2_val = row["col2_name"]  # this is a list of Row objects
    graph.run(query1_py2neo.format(col1val=col1_val, col2val=col2_val))
This gives the error below:
py2neo.errors.ClientError: [Statement.SyntaxError] Variable `xx` not defined (line 2, column 31 (offset: 71))
" FOREACH (i in [Row(xx='somevalue', yy='someothervalue')] |"
Approach2:
query2_py2neo = '''UNWIND $batch AS row
MERGE (d:Node1 {Node1: row.col1_name})
FOREACH (i in row.col2_name |
CREATE (c:Node2 {Prop1: i.xx, Prop2: i.yy})
CREATE (c)-[:Rel1]->(d));'''
graph.run(query2_py2neo, batch=df)
This gives the error below:
TypeError: Values of type <class 'pyspark.sql.dataframe.DataFrame'> are not supported
The issue here is that the parameters you are passing as col2val are of an
unexpected type: py2neo can only send Cypher-serializable values (primitives,
lists, and dicts), not pyspark Row objects or DataFrames. If you convert your
DataFrame rows so that col2val is a list of dicts, you can use the following query
MERGE (d:Node1 {Node1: $col1val})
WITH d
UNWIND $col2val AS property_dict
CREATE (c:Node2)
SET c = property_dict
CREATE (c)-[:Rel1]->(d)
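A minimal sketch of that conversion, assuming the column names col1_name and col2_name from your question. pyspark's Row exposes asDict(); a small stand-in class is used below so the snippet runs without Spark, and the actual graph.run call is shown in a comment.

```python
# Cypher query from the answer above, using $-parameters instead of .format()
query = """MERGE (d:Node1 {Node1: $col1val})
WITH d
UNWIND $col2val AS property_dict
CREATE (c:Node2)
SET c = property_dict
CREATE (c)-[:Rel1]->(d)"""

class FakeRow:
    """Stand-in for pyspark.sql.Row, which provides asDict()."""
    def __init__(self, **kwargs):
        self._data = kwargs
    def asDict(self):
        return dict(self._data)

def to_params(row):
    # row["col2_name"] is a list of Row objects; convert each one to a plain
    # dict so py2neo can serialize it as a query parameter.
    return {
        "col1val": row["col1_name"],
        "col2val": [r.asDict() for r in row["col2_name"]],
    }

# In the real code you would loop over the collected dataframe:
#   for row in df.collect():
#       graph.run(query, **to_params(row))
params = to_params({"col1_name": "a",
                    "col2_name": [FakeRow(xx="somevalue", yy="someothervalue")]})
print(params["col2val"])  # [{'xx': 'somevalue', 'yy': 'someothervalue'}]
```

Passing the values as parameters also avoids the quoting and escaping problems of building the query string with .format().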