Combine columns from multiple TXT files to a table using sed and awk
I have the files FileA.txt, FileB.txt, FileC.txt, etc., with the following column headers:
ID Value1 Value2 Value3
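I want to combine selected columns from these files on the ID column, keeping the file name as the new column header, so that I get the following table: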
ID Value1fromFileA Value1fromFileB Value1fromFileC
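I can do this successfully, though not optimally, using the ldply() and cast() functions in R. However, I would like to be able to do it with some shell scripting.
Any suggestions?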
I'm sure this can be done faster or better, but the script below is simple, if long-winded, and should work. The only command worth mentioning is
cat file1.txt file2.txt file3.txt | awk '{print $1}' | sort | uniq -c | grep "^[ \t]*3" | awk '{print $2}'
which concatenates the files, takes the first column, counts how many times each value occurs, and keeps the values that occur 3 times (the IDs present in all three files, assuming each ID appears at most once per file).
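#!/bin/bash
shopt -s extglob    # enables the *( ) patterns used in trim()

# Strip leading and trailing spaces from the first argument.
trim() {
    t="${1##*( )}"
    t="${t%%*( )}"
    echo "$t"
}

# IDs that occur 3 times across the files, i.e. once in each file.
ids=$(cat file1.txt file2.txt file3.txt | awk '{print $1}' | sort | uniq -c | grep "^[ \t]*3" | awk '{print $2}')
for i in $ids; do
    line1=''
    line2=''
    line3=''
    for file in file1.txt file2.txt file3.txt; do
        while read -r line; do
            index=$(echo "$line" | awk '{print $1}')
            #printf "$index\n"
            if [[ $(trim "$i") == $(trim "$index") ]]; then
                if [[ $line1 == '' ]]; then
                    line1="$line"
                elif [[ $line2 == '' ]]; then
                    line2="$line"
                else
                    line3="$line"
                fi
            fi
        done < "$file"
    done
    # Choose which fields of the three joined lines to print.
    # With four columns per file, Value1 sits in fields 2, 6 and 10.
    echo "$line1 $line2 $line3" | awk '{print $1 " " $5 " " $9}'
done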
For example:
$ cat file1.txt
12 F2Value1 F3Value2 F4
35 F2Value1 F3Value2 F42
2 F2Value1 F3Value2 F43
523 F2Value1 F3Value2 F44
123 F2Value1 F3Value2 F45
$ cat file2.txt
1 F2Value1 F3Value2
12 F2Value1 F3Value2
123 F2Value1 F3Value2
523 F2Value1 F3Value2
99 F2Value1 F3Value2
$ cat file3.txt
72 F2Value1 F3Value2
12 F2Value1 F3Value2
100 F2Value1 F3Value2
111 F2Value1 F3Value2
123 F2Value1 F3Value2
$ ./script.sh
12 F2Value1 F3Value2 F4 F2Value1 F3Value2 F2Value1 F3Value2
123 F2Value1 F3Value2 F45 F2Value1 F3Value2 F2Value1 F3Value2
The output above was produced with echo "$line1 $line2 $line3" | awk '{print $1 " " $2 " " $3 " " $4 " " $6 " " $7 " " $9 " " $10 " " $11}'
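The ID-selection pipeline at the top of this answer can be tried in isolation. The following is a minimal sketch, not part of the original script: the sample data and the function name common_ids are hypothetical, and it uses awk '$1 == 3' in place of the grep so that only a count of exactly 3 matches. It assumes each ID appears at most once per file.

```shell
# Hypothetical sample data, written to a temporary directory.
tmp=$(mktemp -d)
printf '12 a\n35 b\n123 c\n' > "$tmp/file1.txt"
printf '1 d\n12 e\n123 f\n'  > "$tmp/file2.txt"
printf '72 g\n12 h\n123 i\n' > "$tmp/file3.txt"

# Print the IDs (first column) that occur exactly 3 times across the
# given files. common_ids is an illustrative name, not from the answer.
common_ids() {
    cat "$@" | awk '{print $1}' | sort | uniq -c | awk '$1 == 3 {print $2}'
}

common_ids "$tmp/file1.txt" "$tmp/file2.txt" "$tmp/file3.txt"
# prints:
# 12
# 123
```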
awk '{
    # Lines are paired by position, so all files must list the same IDs in the same order.
    getline val2 < "file2"   # read the next line of "file2" into val2
    split(val2, a2, FS)      # split val2 into array a2
    getline val3 < "file3"   # read the next line of "file3" into val3
    split(val3, b3, FS)      # split val3 into array b3
    print $1, $2, a2[2], b3[2]
}' file1
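The getline approach pairs the n-th line of each file, so it only gives the right table when every file lists the same IDs in the same order. A minimal sketch of the same positional pairing with paste (a swapped-in tool, not what the answer uses) is shown here; it assumes four whitespace-separated columns per file, and the sample data and the function name combine_by_position are hypothetical.

```shell
# Hypothetical sample data: same IDs, same order, four columns each.
tmp=$(mktemp -d)
printf '3 A31 A32 A33\n5 A51 A52 A53\n' > "$tmp/file1"
printf '3 B31 B32 B33\n5 B51 B52 B53\n' > "$tmp/file2"
printf '3 C31 C32 C33\n5 C51 C52 C53\n' > "$tmp/file3"

# paste glues the n-th lines together; with four columns per file,
# column 2 of each file ends up in fields 2, 6 and 10.
combine_by_position() {
    paste "$@" | awk '{print $1, $2, $6, $10}'
}

combine_by_position "$tmp/file1" "$tmp/file2" "$tmp/file3"
# prints:
# 3 A31 B31 C31
# 5 A51 B51 C51
```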
You can try:
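awk '
{
    q[$1]++                  # how many times each ID has been seen
    a[$1, ARGIND] = $2       # ARGIND (GNU awk) is the index of the current file
}
END {
    for (i in q) {
        if (q[i] == 3) {     # keep IDs present in all three files
            print i, a[i,1], a[i,2], a[i,3]
        }
    }
}' FileA.txt FileB.txt FileC.txt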
Given the files: FileA.txt
3 A31 A32 A33
5 A51 A52 A53
9 A91 A92 A93
FileB.txt
2 B21 B22 B23
9 B91 B92 B93
4 B41 B42 B43
5 B51 B52 B53
and FileC.txt
7 C71 C72 C73
9 C91 C92 C93
5 C51 C52 C53
The output is:
5 A51 B51 C51
9 A91 B91 C91
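ARGIND is specific to GNU awk, and the order of a for (i in q) loop is unspecified. The following is a sketch of a variant that may work in other awks as well: it counts files with FNR == 1, works for any number of files, prints the file names as a header row (as the question asks), and sorts the rows by ID. The function name merge_on_id is hypothetical. It assumes the data files have no header line, each ID appears at most once per file, and no file is empty.

```shell
# Sample data from the answer above, written to a temporary directory.
tmp=$(mktemp -d)
printf '3 A31 A32 A33\n5 A51 A52 A53\n9 A91 A92 A93\n' > "$tmp/FileA.txt"
printf '2 B21 B22 B23\n9 B91 B92 B93\n4 B41 B42 B43\n5 B51 B52 B53\n' > "$tmp/FileB.txt"
printf '7 C71 C72 C73\n9 C91 C92 C93\n5 C51 C52 C53\n' > "$tmp/FileC.txt"

# merge_on_id is an illustrative name. It prints column 2 of every file
# for the IDs found in all files, under a header built from the file names.
merge_on_id() {
    awk '
        # First line of each file: advance the file counter, store the base name.
        FNR == 1 { nfiles++; name[nfiles] = FILENAME; sub(/.*\//, "", name[nfiles]) }
        {
            if (!(($1, nfiles) in val)) count[$1]++   # count each ID once per file
            val[$1, nfiles] = $2
        }
        END {
            header = "ID"
            for (f = 1; f <= nfiles; f++) header = header " " name[f]
            print header
            for (id in count) {
                if (count[id] == nfiles) {            # ID present in every file
                    row = id
                    for (f = 1; f <= nfiles; f++) row = row " " val[id, f]
                    print row
                }
            }
        }
    ' "$@" | {
        # Pass the header through unchanged, then sort the rows numerically by ID.
        IFS= read -r header
        printf '%s\n' "$header"
        sort -n
    }
}

merge_on_id "$tmp/FileA.txt" "$tmp/FileB.txt" "$tmp/FileC.txt"
# prints:
# ID FileA.txt FileB.txt FileC.txt
# 5 A51 B51 C51
# 9 A91 B91 C91
```

To pick a different column, change the $2 in the val[...] assignment.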