Inserting data in html file using TCL

Here is my html file:

<head>
<title>Reading from text files</title>
</head>
<body>
<h3>Starting space</h3>
<ul>
    <li></li>
</ul>
<h3>ending space</h3>
<ul>
</body>
</html>

I want to edit this HTML file using Tcl and regular expressions, but only at a specific location: between Starting space and ending space. Between these two points I want to add various list items, such as

<li> First </li>

etc. I wrote a Tcl script to open this file and tried to print out the data between these two locations so that I can edit it later, but I am not able to do that. Can you please point out where I am going wrong?

Tcl script

proc edit_html {release} {
set fp [open $release r]
set para [read -nonewline $fp]
close $fp
set line_read [regexp -nocase -lineanchor -inline -all -- {^\s*?Starting space\s*?.*?ending space} $para]
foreach line_read $line_read {
    regexp -nocase -- {^\s*?Starting space\s*?.*?ending space} $line_read - tag value
    puts $value

}
}
edit_html [lindex $argv 0]

I am not sure where I am going wrong in this regexp. And once I find the location, how should I edit it? Any pointers? For example, should I move the file pointer there?

The first problem with your current code is that it doesn't modify anything: regexp only reads (matches and extracts) data, it doesn't make changes. Use regsub for that instead. Since you want to change the contents of the original file, and you may have several Starting space / ending space blocks, it helps to route each match through a helper proc.

Second, your regex doesn't match. Your line doesn't start with ^\s*?Starting space; it starts with ^<h3>Starting space, and other parts of that regex need adjusting too.
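
To see the difference, here is a minimal sketch that runs both patterns against an inline copy of the relevant part of the file. The variable names para, old, new and block are only illustrative, and the second pattern is just one possible correction.

```tcl
# Inline stand-in for the file contents (illustrative, not read from disk)
set para "<body>\n<h3>Starting space</h3>\n<ul>\n    <li></li>\n</ul>\n<h3>ending space</h3>\n"

# Original pattern: after optional whitespace the line must begin with
# "Starting space", but that text is always preceded by "<h3>".
set old [regexp -nocase -lineanchor -- {^\s*?Starting space\s*?.*?ending space} $para]

# Accounting for the <h3> tags lets the pattern match. "." also matches
# newlines here because -linestop is not given.
set new [regexp -nocase -lineanchor -- {^<h3>Starting space</h3>.*?<h3>ending space</h3>} $para block]

puts "old pattern matched: $old"
puts "new pattern matched: $new"
```

regexp returns 1 on a match and 0 otherwise, so the two puts lines show directly which pattern finds the block.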

I've written up the below proc:

proc edit_html {release} {
  proc re_sub {block} {
    # Get the items to be inserted
    global items
    # Get the indentation and put in $spaces
    regexp -lineanchor -- {^(\s*)<li>} $block - spaces
    set html_items [list]
    foreach item $items {
      lappend html_items "<li>$item</li>"
    }
    # Create the list of items in html form with indentation
    set html_items [join $html_items "\n$spaces"]
    regsub -lineanchor -- {<li>\s*</li>} $block $html_items result
    return $result
  }
  set fp [open $release r]
  set para [read -nonewline $fp]
  close $fp
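  # The command to be executed for each match
  set cmd {[re_sub "\0"]}
  # The substitution; -lineanchor lets ^ match at the start of any line.
  # subst evaluates the whole text, so this assumes the file contains no
  # other brackets, dollar signs or backslashes.
  set result [subst [regsub -all -lineanchor -- {^<h3>Starting space</h3>\s*?.*?\s*?<h3>ending space} $para $cmd]]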
  return $result
}
# The items to insert
set items [list First Second Third]

puts [edit_html [lindex $argv 0]]

With a list named items containing First Second Third, you get this as output:

<head>
<title>Reading from text files</title>
</head>
<body>
<h3>Starting space</h3>
<ul>
    <li>First</li>
    <li>Second</li>
    <li>Third</li>
</ul>
<h3>ending space</h3>
<ul>
</body>
</html>

To edit a text file, you need to load it into memory and then write it out again afterwards; you can't stream through a file while writing back to that same file. When there is an easy way to match exactly the text to be replaced, you can use regsub for the core of it, but that is awkward here because you are matching on the text on either side of the area to replace. Thus, for the sort of edit you are looking at, what you need are two indices into the string (i.e., the content of the file): the index of the first character to be replaced and the index of the last.

Fortunately, getting the indices is easy. Either use regexp -indices or use string first / string last.
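
As a small illustration of the two options, the following sketch locates the same marker both ways. The data string and the variable names marker, range, first, last and pos are made up for the example.

```tcl
set data "<h3>Starting space</h3>\n<ul>\n    <li></li>\n</ul>\n"
set marker "<h3>Starting space</h3>"

# regexp -indices stores a {first last} index pair for the match
regexp -indices -- {<h3>Starting space</h3>} $data range
lassign $range first last

# string first returns the index of the first character, or -1 if absent
set pos [string first $marker $data]

puts "regexp -indices: $first..$last"
puts "string first:    $pos"
```

string first treats the marker as literal text, which avoids regex escaping, but you have to compute the end index yourself from the marker length.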

# Read the file; standard stanza
set f [open $theFilename]
set data [read $f]
close $f

# Find the markers
regexp -indices {<h3>Starting space</h3>\n<ul>\n} $data start
regexp -indices {\n</ul>\n<h3>ending space</h3>} $data end

# We now need to offset the ends by one in each direction (we want stuff between)
set start [expr {[lindex $start 1] + 1}]
set end [expr {[lindex $end 0] - 1}]

# Now we can generate the replacement...
set replacement ""
foreach item ... {
    append replacement "<li>...</li>\n"
}

# ... and insert it
set data [string replace $data $start $end $replacement]

# ... and write it out (without the extra newline; we've enough already)
set f [open $theFilename "w"]
puts -nonewline $f $data
close $f

Alternatively, you could instead do the replacement as you write things back to the file.

# Read the file; standard stanza
set f [open $theFilename]
set data [read $f]
close $f

# Find the markers
regexp -indices {<h3>Starting space</h3>\n<ul>\n} $data start
regexp -indices {\n</ul>\n<h3>ending space</h3>} $data end

# Generate the replacement text
set replacement ""
foreach item ... {
    append replacement "<li>...</li>\n"
}

# Write everything out
set f [open $theFilename "w"]
puts -nonewline $f [string range $data 0 [lindex $start 1]]
puts -nonewline $f $replacement
puts -nonewline $f [string range $data [lindex $end 0] end]
close $f
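
Both variants above assume that the markers are present. regexp returns 0 when the pattern does not match, and in that case start or end will not hold usable indices. A minimal sketch of a guard follows; find_insert_range is a hypothetical helper name, not something from the code above.

```tcl
# Hypothetical helper: returns {from to} for the text between the markers,
# or raises an error when either marker is missing.
proc find_insert_range {data} {
    if {![regexp -indices {<h3>Starting space</h3>\n<ul>\n} $data start]} {
        error "start marker not found"
    }
    if {![regexp -indices {\n</ul>\n<h3>ending space</h3>} $data end]} {
        error "end marker not found"
    }
    # Same one-character offsets as in the first variant above
    return [list [expr {[lindex $start 1] + 1}] [expr {[lindex $end 0] - 1}]]
}

set data "<h3>Starting space</h3>\n<ul>\n    <li></li>\n</ul>\n<h3>ending space</h3>\n"
lassign [find_insert_range $data] from to
puts [string range $data $from $to]
```

Failing early like this means the file is never reopened for writing when the markers are missing, so a malformed input is left untouched.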

You've received many good answers already; I'd just like to point out that parsing HTML with regular expressions can be tricky and error-prone. The tDOM package makes this a breeze, however.

You do need well-formed HTML (it doesn't have to be XHTML-level well-formed, though), so I'll add a starting tag for the html element. I'll also remove the empty li element inside the relevant ul , not because tDOM needs that but because it makes my solution a bit simpler:

<html>
<head>
<title>Reading from text files</title>
</head>
<body>
<h3>Starting space</h3>
<ul>
</ul>
<h3>ending space</h3>
<ul>
</ul>
</body>
</html>

Put this in a variable any way you prefer, e.g. by reading it from a file:

set f [open foo.html] ; set html [read -nonewline $f] ; close $f

Create a document object and find the root node:

package require tdom

set doc [dom parse -html $html]
set root [$doc documentElement]

Find the node that you want to insert into: it's the first ul element that follows an h3 element whose text contains "Starting space".

set xpath {//h3[contains(text(), 'Starting space')]/following-sibling::ul[1]}
lassign [$root selectNodes $xpath] node

Insert the items into this node. It's probably best to have a command for that:

proc addItem {doc node txt} {
    set li [$doc createElement li]
    $li appendChild [$doc createTextNode $txt]
    $node appendChild $li
}

Now do it:

addItem $doc $node "First item"

Write back the changed document to the original file or to another file:

set f [open bar.html w] ; $root asHTML -channel $f ; close $f

(Note that asHTML does not prettify or preserve the formatting of the original HTML.)

Finally, delete the document object to clean up the data structures and commands that were created:

$doc delete

Aside:

If you are allowed to change the structure of the original HTML, you can make this a bit easier and safer by adding an id attribute to the element you want to insert into. If your HTML has this:

<ul id="insertitemshere">

the xpath becomes something like

set xpath {//ul[@id='insertitemshere']}

The tDOM package is documented here: http://tdom.github.io/ . It's included in the ActiveState Tcl distribution and is documented there as well.

Documentation: lassign, proc, set

Here is my solution:

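proc edit_html {release} {
    set f [open $release]
    while {[gets $f line] != -1} {
        # Every line read here is kept, including the "Starting space" heading
        puts $line
        if {[string match "*Starting space*" $line]} {
            puts "FANCY LIST"; # Replace with your fancy list

            # Skip to the ending space, stopping if the file ends first
            while {[gets $f line] != -1} {
                if {[string match "*ending space*" $line]} {
                    puts $line; # Keep the "ending space" heading
                    break
                }
            }
        }
    }
    close $f
}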

I am writing the output to the console, but you can choose to write it out to a file.

If you want to add a list of items into a Tcl template, a more idiomatic way to do it would be to create the template as a string and use Tcl's substitution mechanism to populate it.

set template {
    <head>
    <title>Reading from text files</title>
    </head>
    <body>
    <h3>Starting space</h3>
    <ul>
        [get_listitems]
    </ul>
    <h3>ending space</h3>
    <ul>
    </body>
    </html>
}

set items {First Second Third}

proc get_listitems {} {
    global items
    set s ""
    foreach i $items {
        append s "<li>$i</li>"
    }
    return $s
}

puts [subst $template]
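
One caveat: by default subst also performs variable and backslash substitution, so a template containing a literal $ or backslash can fail or change. The sketch below, using a made-up template, restricts subst to command substitution with its -novariables and -nobackslashes flags.

```tcl
set items {First Second Third}

proc get_listitems {} {
    global items
    set s ""
    foreach i $items {
        append s "<li>$i</li>\n"
    }
    return $s
}

# Illustrative template containing text that looks like a variable
# reference ($5) and a backslash sequence (\n)
set template {<p>Price: $5 \n stays literal</p>
<ul>
[get_listitems]</ul>}

# Only [command] substitution is performed; "$5" and "\n" are left as-is
set html [subst -novariables -nobackslashes $template]
puts $html
```

With plain subst the same template raises an error, because $5 is read as a reference to a variable named 5.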
