
Save DOM tree into a graph database: Connect related nodes

I'm inserting hierarchical data from a DOM tree into a graph database, but I can't obtain the parent's ID, which I need in order to create a relationship between each child and its parent.

The code below traverses the DOM nodes, inserts each tag, and reads back the ID of the node that was just inserted. I need the IDs of both the child and its parent in order to create the relationship between them.

from lxml import html
import age  # from AgensGraph
from age.gen.ageParser import *

GRAPH_NAME = "demo_graph"
DSN = "host=localhost port=5432 dbname=demodb user=userdemo password=demo234"

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
for element in tree.getiterator():
    if parent := element.getparent():        
        parent = None
        cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
        b = [x[0].id for x in cursor]  # get last inserted ID
        print(b[0])
        # Match child node c and parent node p, then create c -[connects]-> p (the ID of p is unknown here)
        ag.execCypher("MATCH (c:node), (p:node) WHERE c.id = %s AND p.id = %s CREATE (c)-[r:connects]->(p)")

Here is the demo file: demo.html

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <title>Document</title>
  </head>
  <body>
    <ul class="menu">
      <div class="itm">home</div>
      <div class="itm">About us</div>
      <div class="itm">Contact us</div>
    </ul>
    <div id="idone" class="classone">
      <li class="item1">First</li>
      <li class="item2">Second</li>
      <li class="item3">Third</li>
      <div id="innerone"><h1>This Title</h1></div>
      <div id="innertwo"><h2>Subheads</h2></div>      
    </div>
    <div id="second" class="below">
      <div class="inner">
        <h1>welcome</h1>
        <h1>another</h1>
        <h2>third</h2>
      </div>
    </div>
  </body>
</html>

Here is the extracted DOM Tree:

tag: head attrib: None parent: html
tag: meta attrib: ('charset', 'UTF-8') parent: head
tag: title attrib: None parent: head
tag: body attrib: None parent: html
tag: h1 attrib: None parent: div
tag: h1 attrib: None parent: div
tag: h2 attrib: None parent: div
/tmp/ipykernel_27254/2858024143.py:4: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
  if parent := element.getparent():

A CREATE statement only takes effect once the session is committed, so call commit() after execCypher(...):

cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
b = [x[0].id for x in cursor]
ag.commit()

Try the following code:

ag = age.connect(graph=GRAPH_NAME, dsn=DSN)
tree = html.parse("demo.html")
ids = {}  # element -> ID of the node created for it
for element in tree.iter():
    cursor = ag.execCypher("CREATE (t:node {name: %s} ) RETURN t", params=(element.tag,))
    ids[element] = [x[0].id for x in cursor][0]  # ID of the node just inserted
    ag.commit()
    print(ids[element])
    parent = element.getparent()
    if parent is not None:  # the root element has no parent
        # Match child node c and parent node p by ID, then create c -[connects]-> p
        ag.execCypher("MATCH (c:node), (p:node) WHERE id(c) = %s AND id(p) = %s CREATE (c)-[r:connects]->(p)", params=(ids[element], ids[parent]))
        ag.commit()
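The parent's ID can be recovered by remembering, for every element, the ID returned when its node was created. Below is a minimal sketch of that bookkeeping that runs without a database. The names insert_tree, create_node and create_edge are hypothetical and not part of any library; the two callbacks stand in for the CREATE and MATCH ... CREATE calls, and the standard-library xml.etree.ElementTree is used here instead of lxml only so the example is self-contained.

```python
import xml.etree.ElementTree as ET


def insert_tree(root, create_node, create_edge):
    """Create one node per element and link every child to its parent.

    create_node(tag) -> node ID and create_edge(child_id, parent_id) are
    hypothetical callbacks standing in for the database calls.
    """
    ids = {root: create_node(root.tag)}  # element -> ID of its node
    # iter() walks in document order, so each parent is reached (and already
    # has an ID) before its own children are processed.
    for parent in root.iter():
        for child in parent:
            ids[child] = create_node(child.tag)
            create_edge(ids[child], ids[parent])
    return ids


# In-memory stand-ins for the graph database (assumption, for illustration).
nodes = []  # position in the list acts as the node ID
edges = []  # (child tag, parent tag) pairs


def create_node(tag):
    nodes.append(tag)
    return len(nodes) - 1


def create_edge(child_id, parent_id):
    edges.append((nodes[child_id], nodes[parent_id]))


root = ET.fromstring(
    "<html><head><title/></head><body><div><h1/></div></body></html>"
)
ids = insert_tree(root, create_node, create_edge)
for child_tag, parent_tag in edges:
    print(f"{child_tag} -[connects]-> {parent_tag}")
```

Because the loop goes from each parent to its children, it never needs getparent() and so avoids the truthiness test that triggers the FutureWarning. With lxml the same function should accept tree.getroot(), and the callbacks would wrap execCypher and commit; that part is untested here. Documents containing comments may need those nodes filtered out first.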
