
Pattern matching in Tcl

I have a file, somefile.txt, that contains lines like:

{ abc1 } 1
{ cde1 } 101
{ fgh1 } 1
{ ijk1 } 2 

It's a huge file. I want to find only lines like the 1st and 3rd and count them.

I have tried regexp and lsearch (after converting the content to a list) with the pattern {\s\}\s1\n}, but it's not working. What should I do?

I have also tried {\s\}\s1}, but it prints all 4 lines.

You seem to need to capture the digits at the end of the first and third lines.

Here is a way to achieve that:

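set s {{ abc1 } 1
{ cde1 } 101
{ fgh1 } 1
{ ijk1 } 2}
# Capture the number after the 1st and 3rd {...} groups; skip the 2nd.
set re {^{[^{}]*}\s*(\d+)\s+{[^{}]*}\s*\d+\s+{[^{}]*}\s*(\d+)}
# regexp returns 1 on a match; g1 and g2 are only set in that case.
if {[regexp $re $s m g1 g2]} {
    set res [expr {$g1 + $g2}]
    puts $res
}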

See the IDEONE demo.

The pattern matches:

  • ^ - start of the string
  • {[^{}]*} - a {...}-like substring with no braces inside
  • \s* - 0+ whitespace characters
  • (\d+) - Group 1 (g1), capturing 1+ digits
  • \s+ - 1+ whitespace characters (can be replaced with [\r\n]+ if there is no trailing or leading whitespace around the line break)
  • {[^{}]*}\s*\d+\s+{[^{}]*}\s*(\d+) - same as above, except that only the final (\d+) is a capturing group, which fills the second variable, g2.

See the regex demo.
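Since the question asks for a count of such lines rather than the captured digits, here is a minimal sketch of another way to use regexp for that, assuming the data is already held in a string. The variable names and the exact pattern are illustrative and not taken from the answer above. It relies on two standard regexp switches: -line, which lets $ match at the end of each line, and -all, which makes regexp return the number of matches when no match variables are given.

```tcl
# Sample data from the question, held in a string for the demo.
set s {{ abc1 } 1
{ cde1 } 101
{ fgh1 } 1
{ ijk1 } 2}

# Pattern (an assumption about the line format): a closing brace,
# blanks, a lone 1, optional trailing blanks, then end of line.
set count [regexp -all -line {\}[ \t]+1[ \t]*$} $s]
puts "Lines ending in a lone 1: $count"
```

With the four sample lines this should report 2, because 101 does not end the line right after the first 1.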

A problem like this gets around an order of magnitude easier to solve if you don't use regular expressions.

package require fileutil

::fileutil::foreachLine line somefile.txt {
    if {[lindex $line end] == 1} {
        puts $line
    }
}

This solution looks at each line in the file and checks if the last item is equal to 1. If so, the line is printed.

You could also count them / sum them:

set count 0
set sum 0
::fileutil::foreachLine line somefile.txt {
    if {[lindex $line end] == 1} {
        puts $line
        incr count
        incr sum [lindex $line end] ;# yeah, I know, always 1
    }
}
puts "Number of lines: $count"
puts "Sum of items: $sum"

If fileutil isn't available in your Tcl installation and you can't or don't want to install it, you can use the lower-level core equivalent:

set f [open somefile.txt]
while {[gets $f line] >= 0} {
    if {[lindex $line end] == 1} {
        puts $line
    }
}
close $f
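The core-only loop can be combined with the counting variant in the same way. The following is a minimal sketch, assuming Tcl 8.6 or later (for try ... finally and file tempfile); the proc name countLinesEndingInOne and the temporary file are hypothetical and exist only to make the example self-contained.

```tcl
# Hypothetical helper: count the lines whose last list element equals 1.
proc countLinesEndingInOne {path} {
    set count 0
    set f [open $path r]
    try {
        while {[gets $f line] >= 0} {
            if {[lindex $line end] == 1} {
                incr count
            }
        }
    } finally {
        # Close the channel even if lindex fails on a malformed line.
        close $f
    }
    return $count
}

# Demo: write the sample data to a temporary file, then count.
set ch [file tempfile tmpPath]
puts $ch "{ abc1 } 1\n{ cde1 } 101\n{ fgh1 } 1\n{ ijk1 } 2"
close $ch
set count [countLinesEndingInOne $tmpPath]
puts "Number of lines: $count"
file delete $tmpPath
```

Note that lindex treats each line as a Tcl list, so a line with unbalanced braces would raise an error; the finally clause only guarantees that the file is closed in that case.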

If you absolutely must use a regular expression, in this case you could do this:

::fileutil::foreachLine line somefile.txt {
    if {[regexp {\m1$} $line]} {
        puts $line
    }
}

This regular expression finds lines that end with the digit 1 as a word by itself (i.e. there are no digits or other word characters immediately before it).
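To see the effect of the \m (start-of-word) constraint without touching a file, here is a small sketch that applies the same pattern to the four sample lines from the question. The list and variable names are made up for the illustration.

```tcl
# The sample lines from the question, as a Tcl list of strings.
set lines [list "{ abc1 } 1" "{ cde1 } 101" "{ fgh1 } 1" "{ ijk1 } 2"]

set matching {}
foreach line $lines {
    # \m requires the 1 to start a word, $ requires it to end the line.
    if {[regexp {\m1$} $line]} {
        lappend matching $line
    }
}
puts [join $matching \n]
```

Only the first and third lines should be collected: in 101 the final 1 is preceded by another word character, so \m cannot match there.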

Documentation: close, fileutil package, gets, if, lindex, open, package, puts, Syntax of Tcl regular expressions, regexp, while

Solution 1: If you don't want to use regexp and your input lines all have the same format, {string} number:

set fd [open "somefile.txt" r]
while {[gets $fd line] >= 0} {
    if {[lindex $line 1] == 1} {
        puts [lindex $line 1] ;# Prints only 1
        puts $line            ;# Prints the whole line that has 1 at the end
    }
}
close $fd

Solution 2: If you want to use regexp, use a capturing group, which is (.*):

set fd [open "somefile.txt" r]
while {[gets $fd line] >= 0} {
    # match1 receives whatever follows the closing brace and the space
    if {[regexp "\{.*\} (.*)" $line match match1]} {
        if {$match1 == 1} {
            puts $line
        }
    }
}
close $fd

Solution 3: Based on @Peter's suggestion about regexp:

set fd [open "somefile.txt" r]
while {[gets $fd line] >= 0} {
    # match receives the run of digits at the very end of the line
    if {[regexp {\d+$} $line match]} {
        if {$match == 1} {
            puts $match ;# Prints only 1
            puts $line  ;# Prints the whole line that has 1 at the end
        }
    }
}
close $fd
