TCL-REGEX: How to use Tcl regexp to filter out lines that occur multiple times in a text file

Question

Input file (resultnew.txt):

www.maannews.net.

www.maannews.net.

 ################################################# 

attach2.mobile01.com.

www.google-analytics.

attach2.mobile01.com.

attach2.mobile01.com.

www.google-analytics.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

www.google.com.

attach2.mobile01.com.

attach2.mobile01.com.

attach2.mobile01.com.

 ################################################# 

cdn-img.mocospace.com

cdn-img.mocospace.com

www.mocospace.com.

cdn-img.mocospace.com

cdn-img.mocospace.com

cdn-img.mocospace.com

www.mocospace.com.

cdn-img.mocospace.com

www.mocospace.com.

www.google-analytics.

www.google-analytics.

fonts.gstatic.com.

cdn-img.mocospace.com

cdn-img.mocospace.com

fonts.gstatic.com.

fonts.gstatic.com.

 #################################################

My Tcl script:

set a [open resultnew.txt r]
set b [open balu_output.txt w]


while {[gets $a a1] >=0} {
    if {[regexp {[a-zA-Z\.]} $a1]} {
    puts $b $a1
    }
}

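My requirement:

From the text file above, I want to remove the lines that occur multiple times and print each of them only once to a new file.
This should apply separately to each block between two "#################" divider lines. The "#################" lines themselves should still appear in the output file.

Please share your ideas. Thanks in advance.

Thanks,

Balu P.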

Answer 1

You need a different way to decide whether to ignore a line, and arrays are well suited to uniqueness checks. Here is a commented version:

# For each line in the input
while {[gets $a a1] >= 0} {
    # Get rid of extra spaces
    set a1 [string trim $a1]
    # Ignore empty and comment lines; [string match] is great for this!
    if {$a1 eq "" || [string match "#*" $a1]} {
        continue
    }
    # See if this is the first time we've seen a line
    if {[incr occurrences($a1)] == 1} {
        # It is! Print it now
        puts $b $a1
    }
}

If you have a very large file, you may eventually run into memory usage problems. But for files of up to a few million lines, you should be fine.
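The uniqueness check above can also be tried in isolation. The following is a minimal sketch, not part of the original answer: the procedure name `uniqueLines` is hypothetical, and it assumes Tcl 8.5 or later, where `incr` treats an unset variable as 0. It applies the same filtering as the loop above to a list of lines instead of a file.

```tcl
# Hypothetical helper (not from the original answer): returns the first
# occurrence of each line, skipping blank lines and lines starting with "#".
# Assumes Tcl 8.5+, where [incr] on an unset array element starts from 0.
proc uniqueLines {lines} {
    set result {}
    foreach line $lines {
        # Get rid of extra spaces
        set line [string trim $line]
        if {$line eq "" || [string match "#*" $line]} {
            continue
        }
        # The counter is 1 only the first time a given line is seen
        if {[incr occurrences($line)] == 1} {
            lappend result $line
        }
    }
    return $result
}

puts [uniqueLines {a.com. b.com. a.com. "" "####" b.com. c.com.}]
```

Note that, like the loop above, this treats the "#" lines as comments to skip, so duplicates are removed across the whole input rather than per block.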

Answer 2

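From what I understand of your question, you need the distinct values between the comment lines (i.e. the lines of hashes). Below is the solution you are looking for. Basically, the script uses array keys to keep the unique values, and re-initializes the array whenever the next divider line (i.e. one of your hash comment lines) is seen.

I print the values to STDOUT; you can redirect them to another file.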

#!/usr/bin/tclsh
set a [open resultnew.txt r]

# set an array to keep the unique records
array set myarray {}

# For each line in the input
while {[gets $a a1] >= 0} {

    # Get rid of extra spaces
    set a1 [string trim $a1]

    # if divider line found then print it (i.e ####)
    if { [string match "#*" $a1] } {
      puts $a1
      # unset the array for next set of entries
      array unset myarray
    } else {
      # Ignore empty lines
      if {$a1 ne ""} {
        # Print only if the line does not exist in the array yet
        if {![info exists myarray($a1)]} {
          puts $a1
          set myarray($a1) 1
        }
      }
    }
}
close $a

Output of the script with the input file:

$ tclsh main.tcl
www.maannews.net.
#################################################
attach2.mobile01.com.
www.google-analytics.
www.google.com.
#################################################
cdn-img.mocospace.com
www.mocospace.com.
www.google-analytics.
fonts.gstatic.com.
#################################################
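The question asks for the result in a new file rather than on STDOUT. The per-block logic can be wrapped in a procedure that writes to a file, sketched below under a few assumptions: the procedure name `filterSections` is hypothetical, a dict (Tcl 8.5+) stands in for the array, and a divider is taken to be any line that consists only of "#" characters after trimming, which is checked with `regexp`.

```tcl
# Hypothetical helper (not from the original answers): copies inFile to
# outFile, keeping only the first occurrence of each line within every
# block delimited by lines made up of "#" characters.
proc filterSections {inFile outFile} {
    set in  [open $inFile r]
    set out [open $outFile w]
    set seen [dict create]
    while {[gets $in line] >= 0} {
        set line [string trim $line]
        # Ignore empty lines
        if {$line eq ""} {
            continue
        }
        if {[regexp {^#+$} $line]} {
            # Divider: keep it and start a fresh set for the next block
            puts $out $line
            set seen [dict create]
        } elseif {![dict exists $seen $line]} {
            # First time this line appears in the current block
            dict set seen $line 1
            puts $out $line
        }
    }
    close $in
    close $out
}

# Usage with the file names from the question (runs only if the input exists)
if {[file exists resultnew.txt]} {
    filterSections resultnew.txt balu_output.txt
}
```

Resetting the dict at each divider is what allows a line such as www.google-analytics. to be printed once in every block it occurs in.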

2 answers

Answer 1
1 2014-08-05 08:29:47

Answer 2
1 Accepted 2014-08-05 10:14:24
