Extracting data from a dictionary
I have two tab-delimited files. File 1 contains identifiers, and file 2 has values related to these identifiers (think of it as a very big dictionary).
file 1
Ronny
Rubby
Suzie
Paul
File 1 has only one column.
file 2
Alistar  Barm  Cathy  Paul  Ronny  Rubby  Suzie  Tom  Uma  Vai   Zai
12       13    14     12    11     11     12     23   30   0.34  0.65
1        4     56     23    12     8.9    5.1    1    4    25    3
File 2 has n rows.
What I want: if the identifiers in file 1 are present in file 2, all the values related to them should be written to another tab-delimited file.
Something like this:
Paul  Ronny  Rubby  Suzie
12    11     11     12
23    12     8.9    5.1
Thank you in advance.
NOTE
Your example output is NOT correct: it has "Ruby", but your file1 example has "Rubby", and Ruby != Rubby.
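kent$ awk 'NR==FNR{t[$0]++;next}
{if(FNR==1){
    for(i=1;i<=NF;i++)
        if($i in t){
            v[++n]=i;    # remember the matching columns in file order
            printf "%s\t", $i;
        }
    print "";
}else{
    for(k=1;k<=n;k++)    # a plain "for(x in v)" has no guaranteed order
        printf "%s\t", $(v[k])
    print "";
}
}' file1 file2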
output
Paul Ronny Suzie
12 11 12
23 12 5.1
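To see quickly which identifiers would be dropped because of a spelling mismatch like the one noted above, a small check can help. This is only a sketch in Python: the function name `missing_identifiers` is made up for illustration, the sample data is hypothetical, and it assumes the first line of the data file is the tab-delimited header.

```python
def missing_identifiers(identifiers, header_line):
    """Return the identifiers that do not appear as a column name in the header."""
    columns = set(header_line.rstrip("\n").split("\t"))
    return [name for name in identifiers if name not in columns]


if __name__ == "__main__":
    # Hypothetical sample: the header has "Ruby" while the identifier list has "Rubby".
    header = "Alistar\tBarm\tCathy\tPaul\tRonny\tRuby\tSuzie\tTom\n"
    wanted = ["Ronny", "Rubby", "Suzie", "Paul"]
    print(missing_identifiers(wanted, header))
```

Running a check like this before extracting makes a silently missing column visible, instead of leaving it to be noticed in the output.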
$ awk 'FILENAME~1{a[$0];next};FNR==1{for(i=1;i<=NF;i++)if($i in a)b[++n]=i};{for(j=1;j<=n;j++)printf("%s\t",$(b[j]));print ""}' file{1,2}.txt
Paul Ronny Suzie
12 11 12
23 12 5.1
Broken into multiple lines, with whitespace added:
$ awk '
> FILENAME~1 { a[$0]; next }
> FNR==1 { for(i=1; i<=NF; i++) if($i in a) b[++n]=i }
> { for(j=1; j<=n; j++) printf("%s\t",$(b[j])); print ""}
> ' file{1,2}.txt
Paul Ronny Suzie
12 11 12
23 12 5.1
You can do it using only bash:
FIELDS=`head -1 f2.txt | tr "\t" "\n" | nl -ba | grep -f f1.txt | cut -f1 | tr -d " " | tr "\n" ","`; FIELDS=${FIELDS/%,/}
cut -f$FIELDS f2.txt
Paul Ronny Ruby Suzie
12 11 11 12
23 12 8.9 5.1
An example in Python that works as a stream (i.e., it doesn't need to load the full file before starting the output):
# read keys
with open('file1', 'r') as fd:
    keys = fd.read().splitlines()
# read data file, with header line and content
with open('file2', 'r') as fd:
    headers = fd.readline().split()
    # keep only the keys present in the header; look up each column index once
    keys = [k for k in keys if k in headers]
    cols = [headers.index(k) for k in keys]
    # output keys
    print('\t'.join(keys))
    # output the selected values, one line at a time
    for line in fd:
        fields = line.split()
        if fields:
            print('\t'.join(fields[c] for c in cols))
Output:
$ python test.py
Ronny Ruby Suzie Paul
11 11 12 12
12 8.9 5.1 23
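If the fields themselves can contain spaces, splitting on whitespace as above would misalign the columns. A variant using Python's standard `csv` module with a tab delimiter avoids that while still streaming row by row. This is a sketch under assumptions: the function name `select_columns` and the use of file-like objects are choices made here for illustration, not part of the answer above.

```python
import csv
import io


def select_columns(keys, data_in, data_out):
    """Copy only the columns named in `keys` from data_in to data_out, row by row."""
    reader = csv.reader(data_in, delimiter="\t")
    writer = csv.writer(data_out, delimiter="\t", lineterminator="\n")
    headers = next(reader)
    # Keep the order of `keys`, skipping names that are not in the header.
    cols = [headers.index(k) for k in keys if k in headers]
    writer.writerow([headers[c] for c in cols])
    for row in reader:
        if row:  # skip blank lines
            writer.writerow([row[c] for c in cols])


if __name__ == "__main__":
    # Small in-memory sample standing in for file2.
    sample = io.StringIO("Paul\tRonny\tTom\n12\t11\t23\n23\t12\t1\n")
    out = io.StringIO()
    select_columns(["Ronny", "Paul", "Nobody"], sample, out)
    print(out.getvalue(), end="")
```

With real files, the same function could be called with two objects returned by `open(..., newline='')`, which is the mode the `csv` documentation recommends.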
Perl solution:
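#!/usr/bin/perl
use warnings;
use strict;

# Read the wanted column names.
open my $KEYS, '<', 'file1' or die $!;
my @keys = <$KEYS>;
close $KEYS;
chomp @keys;
my %is_key;
undef @is_key{@keys};

# Find the indices of the wanted columns in the header line.
open my $TAB, '<', 'file2' or die $!;
$_ = <$TAB>;
my $i = 0;    # start at 0 so the first column gets a defined index
my @columns;
for (split) {
    push @columns, $i if exists $is_key{$_};
    $i++;
}

# Print the header line first, then every data line, restricted to those columns.
do {{
    my @values = split;
    print join("\t", @values[@columns]), "\n";
}} while <$TAB>;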
Something like this could probably work, depending on what you want.
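use strict;
use warnings;
# Assumption: the two file paths are passed on the command line.
my ( $name_file_path, $data_file_path ) = @ARGV;
my %names;
open ( my $nh, '<', $name_file_path ) or die "Could not open '$name_file_path'!";
while ( <$nh> ) {
    # store each name with surrounding whitespace trimmed
    m/^\s*(.*?\S)\s*$/ and $names{ $1 } = 1;
}
close $nh;
open ( my $dh, '<', $data_file_path ) or die "Could not open '$data_file_path'!";
my ( @name_list, @col_list );
chomp( my $header = <$dh> );
my @names = split /\t/, $header;
foreach my $coln ( 0..$#names ) {
    next unless exists $names{ $names[ $coln ] };
    push @name_list, $names[ $coln ];    # the column name, for the header row
    push @col_list, $coln;               # its index, for slicing the data rows
}
local $" = "\t";
print "@name_list\n";
while ( <$dh> ) {
    chomp;
    print "@{[ ( split /\t/ )[ @col_list ] ]}\n";
}
close $dh;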
This might work for you:
sed '1{s/\t/\n/gp};d' file2 |
nl |
grep -f file1 |
cut -f1 |
paste -sd, |
sed 's/ //g;s,.*,cut -f& file2,' |
sh
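Explanation:
Pivot the column names in the header line into rows (sed).
Number the rows (nl).
Match the column names against the identifiers in file1 (grep -f).
Drop the column names, leaving only the column numbers (cut -f1).
Pivot the column numbers into a list separated by ,'s (paste -sd,).
Build a cut command from the comma separated column number list (sed).
Run the cut command against the data file (sh).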
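The pipeline above can also be traced step by step in Python, which may make it easier to follow. This is a sketch only: the function name `cut_field_list` is invented here, and unlike `grep -f`, which treats each line of file1 as a pattern and matches substrings, it compares names exactly.

```python
def cut_field_list(header_line, wanted):
    """Build the comma-separated field list for `cut -f` from a tab-delimited header."""
    # Pivot the header into one column name per row.
    names = header_line.rstrip("\n").split("\t")
    # Number the rows starting at 1, as nl does.
    numbered = enumerate(names, start=1)
    # Keep only the numbers whose column name is wanted.
    keep = [str(n) for n, name in numbered if name in wanted]
    # Join the column numbers with commas.
    return ",".join(keep)


if __name__ == "__main__":
    # Header line as shown in the question's file 2.
    header = "Alistar\tBarm\tCathy\tPaul\tRonny\tRubby\tSuzie\tTom\tUma\tVai\tZai\n"
    fields = cut_field_list(header, {"Ronny", "Rubby", "Suzie", "Paul"})
    print("cut -f" + fields + " file2")
```

For the question's sample header this prints `cut -f4,5,6,7 file2`, the kind of command the last two pipeline stages generate and run.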