Complex Data Manipulation in Python
I have three files containing real data, pseudo data, and values of the real data.
File_one has two columns: the first column holds the real data and the second holds the translated (pseudo) data, i.e. each real value is given a pseudo value.
col[0] col[1]
123 0
234 1
345 2
456 3
567 4
678 5
File_two has pairs of pseudo values, i.e. in place of 123 the value used is 0. In the same way, the pseudo value pair [0, 1] means [123, 234] in the real data.
col[0] col[1]
0 2
0 3
0 5
2 4
5 1
So we can say that col[0] and col[1] of file_two are keys (they match col[1] of file_one), and the values they stand for are in col[0] of file_one.
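A minimal Python 3 sketch of that lookup direction follows. It assumes file_one is whitespace-separated and may start with a `col[0] col[1]` header line (an assumption; the question does not say whether the files contain headers), and it keeps values as strings, as they come out of the files. `build_lookup` and `FILE_ONE` are hypothetical names for illustration.

```python
# Sample contents of file_one (rows copied from the question; the header
# line is included only to show how it can be skipped).
FILE_ONE = """col[0] col[1]
123 0
234 1
345 2
456 3
567 4
678 5
"""


def build_lookup(text):
    """Map each pseudo value (col[1]) to its real value (col[0])."""
    lookup = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the assumed "col[...]" header row.
        if len(fields) < 2 or fields[0].startswith("col"):
            continue
        real, pseudo = fields[0], fields[1]
        lookup[pseudo] = real
    return lookup


lookup = build_lookup(FILE_ONE)
print(lookup["0"], lookup["1"])  # 123 234
```

Keying the dictionary by the pseudo value is what lets a file_two pair such as [0, 1] be translated with two direct lookups.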
Now I have to match the pseudo value pairs from file_two with the real data in col[0] of file_one and save the output to a new file. We name it file_four. Each pair occurs only ONCE.
col[0] col[1]
123 345
123 456
123 678
345 567
678 234
Now file_three comes into the picture. File_three has three columns. Its col[0] and col[1] contain the same pairs as file_four, but also many other pairs that are not present in file_four.
File_three
col[0] col[1] col[2]
123 345 54
345 262 65
123 456 54
2456 2467 98
123 678 46
7845 2458 631
345 567 153
3456 3673 94
678 234 5
Finally, I need to match the pairs of file_four (i.e. col[0] and col[1]) in file_three, pull the value from col[2] of file_three, and generate a new output_file with the pairs of file_four as keys and the values from col[2] of file_three.
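One way to do this last step, sketched in Python 3 under the assumption that file_three is whitespace-separated with no header, is to index file_three by its `(col[0], col[1])` tuple and then look up each file_four pair. `FILE_FOUR`, `FILE_THREE`, `values` and `output` are hypothetical names; the rows are the sample data above.

```python
# Pairs from file_four (real values), as listed in the question.
FILE_FOUR = [("123", "345"), ("123", "456"), ("123", "678"),
             ("345", "567"), ("678", "234")]

# Rows from file_three: col[0], col[1], col[2].
FILE_THREE = """123 345 54
345 262 65
123 456 54
2456 2467 98
123 678 46
7845 2458 631
345 567 153
3456 3673 94
678 234 5
"""

# Index file_three by its (col[0], col[1]) pair so each lookup is one dict access.
values = {}
for line in FILE_THREE.splitlines():
    fields = line.split()
    if len(fields) >= 3:
        values[(fields[0], fields[1])] = fields[2]

# Keep only the pairs that appear in file_four, in file_four's order.
output = [(a, b, values[(a, b)]) for a, b in FILE_FOUR if (a, b) in values]
for row in output:
    print(" ".join(row))
```

The `if (a, b) in values` filter silently drops file_four pairs that are missing from file_three; if a missing pair should be treated as an error, index `values[(a, b)]` directly and let the KeyError surface.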
In the following code I am trying to consider only the first two files:
from collections import defaultdict
d1 = dict()
d2 = dict()
with open('input1.txt', 'r') as file1:
    for row in file1:
        c0, c1 = row.split()[:2]
        d1[c1] = c0
with open('input2.txt', 'r') as file2:
    for row in file2:
        c0, c1 = row.split()[:2]
        d2[(c0, c1)] = [d1[c1], d1[c1]]
#for k, v in sorted(d2.items()):
#    print '\t'.join(v)
print d2
Error:
KeyError: 'key'
It's the same error even if the for loop is uncommented and the last print is commented out.
You don't have matching keys because d1 contains pairs as keys, while d2 contains single values.
This line looks like it is wrong:
key = col[0], col[1]
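To illustrate the mismatch (Python 3, with hypothetical dictionaries): a tuple key only matches an identical tuple, so looking up a single value in a pair-keyed dict raises KeyError, while `dict.get` returns None instead of raising.

```python
# Hypothetical dictionaries illustrating pair keys vs single keys.
pair_keyed = {("0", "2"): "value for the pair (0, 2)"}  # keys are tuples
single_keyed = {"0": "123", "2": "345"}                  # keys are single values

print(("0", "2") in pair_keyed)      # True: same tuple, same key
print("0" in pair_keyed)             # False: "0" is not the tuple ("0", "2")
print(single_keyed.get(("0", "2")))  # None: .get() avoids the KeyError

try:
    pair_keyed["0"]
except KeyError as exc:
    print("KeyError:", exc)          # KeyError: '0'
```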
For d1, use column 1 of file1 for the keys and column 0 for the values, creating a lookup table:
f1 = [(123,0),(234,1),(345,2),(456,3),(567,4),(678,5)]
f2 = [(0,2),(0,3),(0,5),(2,4),(5,1)]
d1 = {c1:c0 for c0,c1 in f1}
That allows you to use the column values of file2 to look up the values in d1:
d2 = {(c0, c1):[d1[c0], d1[c1]] for c0, c1 in f2}
print d2
>>>
{(5, 1): [678, 234], (0, 3): [123, 456], (0, 5): [123, 678], (0, 2): [123, 345], (2, 4): [345, 567]}
>>>
Your code for file 1 and file 2, refactored:
d1, d2 = dict(), dict()
with open('inputfile1.txt', 'r') as file1:
    for row in file1:
        c0, c1 = row.strip().split()[:2]
        d1[c1] = c0
with open('inputfile2.txt', 'r') as file2:
    for row in file2:
        c0, c1 = row.strip().split()[:2]
        d2[(c0, c1)] = [d1[c0], d1[c1]]
>>> for k, v in sorted(d2.items()):
        print '\t'.join(v)
123 345
123 456
123 678
345 567
678 234
>>>
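The refactored code stops at file_four. A minimal Python 3 sketch chaining all three files is below. It assumes the input files are whitespace-separated with no header rows, reuses the input file names from the code above, and writes tab-separated output; `inputfile3.txt`, `file_four.txt`, `output_file.txt`, `read_rows` and `run_pipeline` are hypothetical names. The sample files are written to a temporary directory so the example runs on its own.

```python
import os
import tempfile


def read_rows(path, min_fields):
    """Yield the whitespace-split fields of every line with >= min_fields columns."""
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= min_fields:
                yield fields


def run_pipeline(file1, file2, file3, file_four_path, output_path):
    # file_one: pseudo value (col[1]) -> real value (col[0]).
    d1 = {row[1]: row[0] for row in read_rows(file1, 2)}
    # file_two: translate each pseudo pair into a real pair (the file_four rows).
    pairs = [(d1[row[0]], d1[row[1]]) for row in read_rows(file2, 2)]
    with open(file_four_path, "w") as out:
        for a, b in pairs:
            out.write(f"{a}\t{b}\n")
    # file_three: (col[0], col[1]) -> col[2].
    d3 = {(row[0], row[1]): row[2] for row in read_rows(file3, 3)}
    # output_file: each file_four pair with its col[2] value from file_three.
    with open(output_path, "w") as out:
        for a, b in pairs:
            if (a, b) in d3:
                out.write(f"{a}\t{b}\t{d3[(a, b)]}\n")


# Sample rows from the question, without header lines.
SAMPLES = {
    "inputfile1.txt": "123 0\n234 1\n345 2\n456 3\n567 4\n678 5\n",
    "inputfile2.txt": "0 2\n0 3\n0 5\n2 4\n5 1\n",
    "inputfile3.txt": (
        "123 345 54\n345 262 65\n123 456 54\n2456 2467 98\n123 678 46\n"
        "7845 2458 631\n345 567 153\n3456 3673 94\n678 234 5\n"
    ),
}
NAMES = ["inputfile1.txt", "inputfile2.txt", "inputfile3.txt",
         "file_four.txt", "output_file.txt"]

with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, name) for name in NAMES]
    # Write the three sample input files.
    for name, path in zip(NAMES[:3], paths[:3]):
        with open(path, "w") as fh:
            fh.write(SAMPLES[name])
    run_pipeline(*paths)
    with open(paths[4]) as fh:
        result = fh.read()

print(result, end="")
```

Keying d3 by the `(col[0], col[1])` tuple lets each file_four pair be found with one dict lookup instead of rescanning file_three for every pair.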
Unpacking values/items during assignment:
>>>
>>> x, y, z = [1, 2, 3]
>>> print x, y, z
1 2 3
>>> x, y = [1, 2, 3]
Traceback (most recent call last):
File "<pyshell#259>", line 1, in <module>
x, y = [1, 2, 3]
ValueError: too many values to unpack
>>>
>>> a, b, _, _, _, _ = '1 2 3 4 5 6'.split()
>>> print a, b, _
1 2 6
>>>
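In Python 3 the `_` placeholders can be replaced by a starred target (extended iterable unpacking, PEP 3132), which collects the leftover items into a list; slicing before unpacking, as the code above does with `row.split()[:2]`, is the other way to avoid the error. A short sketch (Python 3 only):

```python
# A starred target collects every item not bound to a, b.
a, b, *rest = "1 2 3 4 5 6".split()
print(a, b, rest)  # 1 2 ['3', '4', '5', '6']

# Slicing first avoids "too many values to unpack".
x, y = [1, 2, 3][:2]
print(x, y)        # 1 2
```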
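Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.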