
How to delete 1st line in text file using tcl/tk

I can do this, but my method is clumsy: it requires opening the file, reading it, splitting it into lines, recombining them without the 1st line, and finally saving the file.

Since these text files can be large data files, I'd like to avoid (or shorten) all these processing steps. Perhaps someone knows a shorter or slicker way to do this?

Many sincere thanks!

If your files are very very very large, you actually have to work with them line by line, but if they're just a GB or so, you can simplify the processing by working with the contents as a single chunk of data.

package require fileutil

fileutil::updateInPlace file.ext {apply {data {
    regsub {.*?\n} $data {}
}}}

The updateInPlace command takes a file name and a command prefix. It opens the file, reads the contents, and invokes the command prefix with the contents as its argument; finally, it replaces the file contents with the result of the invocation. In this case, the command prefix is the apply command plus an anonymous function (lambda) that does the work.

Another, mostly equivalent, way to write the same thing is with a named command procedure:

proc cmd data {
    regsub {.*?\n} $data {}
}

fileutil::updateInPlace file.ext cmd
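
If tcllib isn't available, the read/transform/write cycle described above can be approximated in plain Tcl. This is only a sketch: updateFile is a hypothetical name, it is not how fileutil itself is implemented, and it makes no attempt to protect the file if something fails between the read and the write.

```tcl
# Hypothetical stand-in for fileutil::updateInPlace (not tcllib's implementation).
# Reads the whole file, passes the contents to the command prefix, and writes
# the result back to the same file.
proc updateFile {filename cmdPrefix} {
    set f [open $filename r]
    set data [read $f]
    close $f

    # Expand the prefix and append the contents as the last argument.
    set result [{*}$cmdPrefix $data]

    set f [open $filename w]
    puts -nonewline $f $result
    close $f
}

# Usage, with the same lambda as above:
#   updateFile file.ext {apply {data {regsub {.*?\n} $data {}}}}
```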

The body of the command / lambda can be anything that removes all text up to and including the first newline character in the text, e.g.

    regsub {[^\n]*\n} $data {}

same as above (replace the matching text up to the first newline), but with a greedy match

    string range $data [string first \n $data]+1 end

find the index of the first newline and take everything that follows

    join [lrange [split [string trimright $data] \n] 1 end] \n

get a list of lines and build a new text consisting of all lines except the first.

The different variants aren't exactly the same. If there are no newlines in the file, the regsub and string range variants make no changes, but the lrange variant sets the content to the empty string.
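
To see the difference for yourself, something like the following can be pasted into tclsh. The proc names are made up for this illustration; each one just wraps one of the bodies shown above.

```tcl
# Hypothetical helper names; each wraps one of the three bodies from above.
proc viaRegsub {data} { regsub {[^\n]*\n} $data {} }
proc viaRange  {data} { string range $data [string first \n $data]+1 end }
proc viaLrange {data} { join [lrange [split [string trimright $data] \n] 1 end] \n }

# Try each variant on text with newlines and on text without any.
# [list ...] is used only to make newlines and empty results visible.
foreach input [list "one\ntwo\nthree\n" "no newline here"] {
    foreach variant {viaRegsub viaRange viaLrange} {
        puts "$variant [list $input] -> [list [$variant $input]]"
    }
}
```

Note that because of the string trimright, the lrange variant also drops the trailing newline of the file, which the other two keep.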

Documentation: apply, fileutil package, join, lrange, package, proc, Syntax of Tcl regular expressions, regsub, split, string

For a very large file (on a modern machine, it's got to be at least 500MB to be into this category), you've not really got that much you can do to shorten things since you're dealing with moving a lot of data. You have to move the data to remove the first line. (You can remove lines from the end by truncating.)

But you can do some tricks that will speed things up. In particular, moving the data around as binary in megabyte-sized chunks is quite a lot quicker. This makes plentiful use of seek and tell, and finishes with chan truncate.

# Open in read-write mode
set f [open $filename r+]
# Read in the stuff we want to delete; reading is easiest way to find end of line
gets $f

##### HOW TO COPY REMAINDER OF FILE TO EARLIER IN FILE #####

set target 0; # Start of file
fconfigure $f -translation binary
set source [tell $f]
while true {
    # Read a megabyte (1024*1024 bytes) from the source position in the file
    seek $f $source
    set data [read $f 1048576]
    set source [tell $f]; # Remember for next iteration
    # If we didn't read anything, we're done.
    if {[string length $data] == 0} {
        break
    }
    # Write the data to the target location in the file. May overlap with where we
    # read from, but won't go past end. (IMPORTANT!)
    seek $f $target
    puts -nonewline $f $data
    set target [tell $f]; # Remember for next iteration
}
# Ensure there's nothing left over at the end
chan truncate $f $target
close $f

As you can see, just reading everything into memory, manipulating that, and then writing it out again is actually simpler to code, and simpler to code in such a way that a failure won't destroy the file. (You can also do stream processing a line at a time, writing out to a new temporary file, which scales up to very large files, but it requires having the extra disk space in the first place.) Remember, the only truly easy thing to do to a large file is to append to it.
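
The temporary-file approach might look roughly like this. It's a sketch under a few assumptions: dropFirstLineViaTemp and the .tmp suffix are names invented here, the temporary name is assumed to be free, and Tcl 8.6 is assumed (for try). Instead of looping over lines, it skips the first line with gets and lets chan copy stream the rest, so the whole file never sits in memory.

```tcl
# Hypothetical helper: drop the first line by streaming into a sibling temp file.
proc dropFirstLineViaTemp {filename} {
    # Assumption: this name is not already in use in the same directory.
    set tmpname $filename.tmp

    # Binary mode on both sides so the bytes are copied through untouched.
    set in  [open $filename rb]
    set out [open $tmpname wb]
    try {
        # Consume and discard everything up to and including the first newline.
        gets $in
        # Stream the remainder of the file to the temp file.
        chan copy $in $out
    } finally {
        close $in
        close $out
    }

    # Replace the original only after the copy has completed.
    file rename -force -- $tmpname $filename
}
```

If anything fails during the copy, the original file is left as it was; the price is the extra disk space for the copy.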


Very large datasets are best put in a database if possible. That's a much more extensive change to your code.
