How do I use the Unix comm command in Tcl?

Question

I am trying to use the Unix comm command to compare two files from Tcl.

I tried the following, to no avail:

exec bash -c {comm -2 -3 <(sort file1) <(sort file2) > only_in_file1}
exec {comm -2 -3 <(sort file1) <(sort file2) > only_in_file1}

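This is a quick method that I know, but if there is a way to do it in Tcl itself, I would like to hear about it. In general, I need to compare two text files of roughly 10K to 100K lines each and find the lines that appear in only one of them.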

Answer 1

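Since your files are small compared with the memory of a modern computer (and you are only looking for lines in the first file that are not in the second), the simplest way to do the filtering in pure Tcl is to hold the files in memory.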

# Standard read-all-the-lines-of-a-file stanza
proc readLines {filename} {
    set f [open $filename]
    # -nonewline drops the final newline, so split yields no empty last element
    set data [read -nonewline $f]
    close $f
    return [split $data "\n"]
}

# Read and sort (but the sort is unnecessary for this algorithm)
set lines1 [lsort [readLines file1]]

# Read and load into a dict's keys (i.e., an associative map); sort not needed
set d {}
foreach line [readLines file2] {
    dict set d $line "dummy"
}

# Write out the lines from file1 that aren't in the dictionary
set f [open only_in_file1 "w"]
foreach line $lines1 {
    if {![dict exists $d $line]} {
        puts $f $line
    }
}
close $f

This is not exactly the method comm uses, but the logic comm uses is harder to get right and requires both inputs to be sorted.
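For comparison, here is a rough sketch of the sorted-merge idea that comm is built around, written in pure Tcl. The proc name onlyInFirst is made up for this illustration, and it assumes both lists were sorted with plain lsort, whose default ordering should agree with string compare for ordinary text. It is a sketch of the technique, not a reimplementation of comm (which, for one thing, collates according to the locale).

```tcl
# Hypothetical helper: return the lines of sortedA that have no partner in sortedB.
# Assumption: both arguments are lists already sorted with plain [lsort].
proc onlyInFirst {sortedA sortedB} {
    set result {}
    set i 0
    set j 0
    set nA [llength $sortedA]
    set nB [llength $sortedB]
    while {$i < $nA} {
        set a [lindex $sortedA $i]
        if {$j >= $nB} {
            # Second list exhausted: everything left in the first is unique
            lappend result $a
            incr i
            continue
        }
        set c [string compare $a [lindex $sortedB $j]]
        if {$c < 0} {
            # $a sorts before the current line of B, so B cannot contain it
            lappend result $a
            incr i
        } elseif {$c > 0} {
            # Current line of B is only in B: skip it
            incr j
        } else {
            # Same line in both: consume one from each side
            incr i
            incr j
        }
    }
    return $result
}
```

Unlike the dict version, this pairs duplicate lines one to one: if a line occurs twice in the first list and once in the second, one copy is still reported.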

Answer 2

awk is commonly used for this kind of task. Called from Tcl, it looks like this:

exec awk {NR == FNR {f2[$0]; next} !($0 in f2)} file2 file1 > only_in_file1

This is a one-line, external-tool version of what Donal suggested.

But your bash solution should work:

$ cat file1
1
2
3
4
5

$ cat file2
6
5
4
3

$ tclsh
% exec bash -c {comm -2 -3 <(sort file1) <(sort file2)}
1
2
% exec awk {NR == FNR {f2[$0]; next} !($0 in f2)} file2 file1
1
2
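
The second attempt in the question is a different story. The braces turn the whole string into a single word, so exec looks for a program with that entire name, and <( ... ) is bash process substitution, which exec does not understand. If you want to call comm without going through bash, one option is to sort into temporary files first. The sketch below assumes Tcl 8.6 or later (for file tempfile and try) and that sort and comm are on the PATH; the proc name onlyInFirstViaComm is made up for this example.

```tcl
# Hypothetical helper: lines found only in fileA, using sort(1) and comm(1).
# Assumptions: Tcl 8.6+, and sort and comm are available on the PATH.
proc onlyInFirstViaComm {fileA fileB} {
    # file tempfile creates the file and returns an open channel;
    # only the file names are needed here, so close the channels right away
    close [file tempfile sortedA]
    close [file tempfile sortedB]
    try {
        exec sort $fileA > $sortedA
        exec sort $fileB > $sortedB
        # -23 suppresses columns 2 and 3, leaving the lines unique to the first file
        return [exec comm -23 $sortedA $sortedB]
    } finally {
        file delete $sortedA $sortedB
    }
}
```

With the sample files above, onlyInFirstViaComm file1 file2 should return the same two lines as the bash version. Note that exec strips the trailing newline from the command's output.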
