I have two files:
file1.txt :
0000001435 XYZ 与 ABC
0000001438warlaugh 世界
file2.txt :
0000001435 XYZ with abc
0000001436 DFC whatever
0000001437 FBFBBBF
0000001438 world of warlaugh
The lines in the two files are linked by the number in the first 10 characters. The desired output is a tab-separated file containing the lines that exist in file1.txt together with the corresponding lines from file2.txt:
file3.txt :
XYZ 与 ABC	XYZ with abc
warlaugh 世界	world of warlaugh
How do I get the corresponding lines from file2.txt and write a tab-separated file3.txt containing only the lines that exist in file1.txt?
Note that only the first 10 characters constitute the ID. There are cases like 0000001438warlaugh 世界 or even 0000001432231hahaha lol, where only 0000001438 and 0000001432 are the IDs.
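To make the rule concrete, here is a minimal sketch of the split, assuming the ID is always exactly the first 10 characters and the rest of the line is free text. The helper name split_id and its width parameter are made up for this illustration.

```python
def split_id(line, width=10):
    """Split a line into its fixed-width ID and the remaining text.

    split_id and width are hypothetical names used only for illustration.
    """
    line = line.rstrip("\n")
    # Everything up to `width` is the ID, even if no space follows it.
    return line[:width], line[width:].strip()


print(split_id("0000001438warlaugh 世界"))
print(split_id("0000001432231hahaha lol"))
```

In the second case the text after the ID starts with digits (231hahaha lol), which is why splitting on whitespace or on "leading digits" would give the wrong ID.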
I tried with Python, getfile3.py :
import io
# Map each 10-character ID to the rest of its line.
f1 = {line[:10]: line[10:].strip() for line in io.open('file1.txt', 'r', encoding='utf8')}
f2 = {line[:10]: line[10:].strip() for line in io.open('file2.txt', 'r', encoding='utf8')}
with io.open('file3.txt', 'w', encoding='utf8') as f3:
    for i in f1:
        f3.write(u"{}\t{}\n".format(f1[i], f2[i]))
But is there a bash/awk/grep/perl command-line way to get file3.txt?
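awk '
    { key = substr($0,1,10); data = substr($0,11) }
    { gsub(/^[ \t]+|[ \t]+$/, "", data) }    # trim whitespace around the text
    NR==FNR { file1[key] = data; next }      # first file: remember text by ID
    key in file1 { print file1[key] "\t" data }
' file1.txt file2.txt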
You could use FIELDWIDTHS with GNU awk rather than substr() if you prefer.
Super long Perl answer:
use warnings;
use strict;

# add files here as needed
my @input_files = qw(file1.txt file2.txt);
my $output_file = 'output.txt';

# don't touch anything below this line
my @output_lines = parse_files(@input_files);

open (my $output_fh, ">", $output_file) or die;
foreach (@output_lines) {
    print $output_fh "$_\n"; #print to output file
    print "$_\n"; #print to console
}
close $output_fh;

sub parse_files {
    my @input_files = @_; #list of text files to read.
    my %data; #will store $data{$index} = datum1 datum2 datum3

    foreach my $file (@input_files) {
        open (my $fh, "<", $file) or die;
        while (<$fh>) {
            chomp;
            if (/^(\d{10})\s?(.*)$/) {
                my $index = $1;
                my $datum = $2;
                if (exists $data{$index}) {
                    $data{$index} .= "\t$datum";
                } else {
                    $data{$index} = $datum;
                } #/else
            } #/if regex found
        } #/while reading current file
        close $fh;
    } #/foreach file

    # Create output array
    my @output_lines;
    foreach my $key (sort keys %data) {
        push (@output_lines, "$data{$key}");
    } #/foreach
    return @output_lines;
} #/sub parse_files
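Note that the Perl script above collects every ID from every input file, so IDs that appear only in file2.txt (such as 0000001436) also end up in the output. For comparison, here is a rough Python sketch of a join that keeps only the IDs present in the first file and skips IDs with no partner in the second. The function names join_by_id and write_joined are invented for this example, and it assumes UTF-8 input and a fixed 10-character ID.

```python
import io


def join_by_id(lines1, lines2, width=10):
    """Return tab-joined rows for IDs in lines1 that also occur in lines2.

    Rows come back in the order of lines1. Names here are illustrative only.
    """
    # Index the second input by its fixed-width ID, ignoring blank lines.
    second = {
        line[:width]: line[width:].strip()
        for line in lines2
        if line.strip()
    }
    rows = []
    for line in lines1:
        if not line.strip():
            continue  # ignore blank lines
        key, text = line[:width], line[width:].strip()
        if key in second:  # skip IDs that have no partner in lines2
            rows.append(u"{}\t{}".format(text, second[key]))
    return rows


def write_joined(path1, path2, out_path):
    """Read both files as UTF-8 and write the joined rows to out_path."""
    with io.open(path1, encoding="utf8") as f1, \
            io.open(path2, encoding="utf8") as f2:
        rows = join_by_id(f1, f2)
    with io.open(out_path, "w", encoding="utf8") as out:
        for row in rows:
            out.write(row + u"\n")
```

Calling write_joined('file1.txt', 'file2.txt', 'file3.txt') would then produce the two-line file3.txt from the question, assuming the sample files shown there.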