Compare two files: if the first column matches in both, replace the values of columns 2 and 3 (Python)
There are two files. If an ID number appears in both files, I want that row to use only Value 1 and Value 2 from File2.txt. Please let me know if my question is unclear.
File1.txt
ID Number Value 1 Value 2 Country
0001 23 55 Spain
0231 15 23 USA
4213 10 11 Canada
7541 32 29 Italy
File2.txt
0001 5 6
0231 7 18
4213 54 87
5554 12 10
1111 31 13
6422 66 51
The output should look like this:
ID Number Value 1 Value 2 Country
0001 5 6 Spain
0231 7 18 USA
4213 54 87 Canada
7541 32 29 Italy
New example:
File3.txt
#ID CAT CHN LC SC LATITUDE LONGITUDE
20022 CX 21 -- 4 32.739000 -114.635700
01711 CX 21 -- 3 32.779700 -115.567500
08433 CX 21 -- 2 31.919930 -123.321000
File4.txt
20022,32.45,-114.88
01192,32.839,-115.487
01711,32.88,-115.45
01218,32.717,-115.637
output
#ID CAT CHN LC SC LATITUDE LONGITUDE
20022 CX 21 -- 4 32.45 -114.88
01711 CX 21 -- 3 32.88 -115.45
08433 CX 21 -- 2 31.919930 -123.321000
Code I have so far:
f = open("File3.txt", "r")
x= open("File4.txt","r")
df1 = pd.read_csv(f, sep=' ', engine='python')
df2 = pd.read_csv(x, sep=' ', header=None, engine='python')
df2 = df2.set_index(0).rename_axis("#ID")
df2 = df2.rename(columns={5:'LATITUDE', 6: 'LONGITUDE'})
df1 = df1.set_index('#ID')
df1.update(df2)
print(df1)
Something like this, possibly:
file1_data = []
with open("File1.txt") as file1:
    for line in file1:
        file1_data.append(line.strip().split("\t"))
# The first row of File1.txt holds the column headers.
file1_headers = file1_data[0]
del file1_data[0]

file2_data = []
with open("File2.txt") as file2:
    for line in file2:
        file2_data.append(line.strip().split("\t"))
file2_ids = [x[0] for x in file2_data]

final_data = [file1_headers] + file1_data
for i in range(1, len(final_data)):
    if final_data[i][0] in file2_ids:
        match = [x for x in file2_data if x[0] == final_data[i][0]]
        # ID and both values come from File2.txt; the country is kept from File1.txt.
        final_data[i] = match[0] + [final_data[i][3]]

with open("output.txt", "w") as output:
    # writelines() does not add line breaks, so append them explicitly.
    output.writelines("\t".join(x) + "\n" for x in final_data)
final_data starts out as a new list holding the header row followed by the rows of file1_data. Each row whose ID also appears in file2_data is then replaced by the matching File2.txt row, with the country from File1.txt appended.
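The row replacement described above can also be sketched with a dictionary lookup, which avoids rescanning the second file for every row. This is only a sketch under assumptions: the helper name `merge_rows` is hypothetical, the in-memory `StringIO` objects stand in for File1.txt and File2.txt, and the data is assumed to be tab-separated, as in the code above.

```python
import csv
from io import StringIO

# Stand-ins for File1.txt and File2.txt (assumed tab-separated).
file1 = StringIO(
    "ID Number\tValue 1\tValue 2\tCountry\n"
    "0001\t23\t55\tSpain\n"
    "0231\t15\t23\tUSA\n"
    "4213\t10\t11\tCanada\n"
    "7541\t32\t29\tItaly\n"
)
file2 = StringIO(
    "0001\t5\t6\n"
    "0231\t7\t18\n"
    "4213\t54\t87\n"
    "5554\t12\t10\n"
)


def merge_rows(primary, updates):
    """Hypothetical helper: replace columns 2 and 3 of primary rows by ID."""
    # Map each ID in the second file to its two replacement values.
    lookup = {
        row[0]: row[1:3]
        for row in csv.reader(updates, delimiter="\t")
        if row
    }
    reader = csv.reader(primary, delimiter="\t")
    merged = [next(reader)]  # the header row is copied unchanged
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Use the values from the second file if the ID is there,
        # otherwise keep the row's own values.
        values = lookup.get(row[0], row[1:3])
        merged.append([row[0], *values, *row[3:]])
    return merged


merged = merge_rows(file1, file2)
for row in merged:
    print("\t".join(row))
```

Because the IDs are handled as strings throughout, leading zeros such as in 0001 are preserved.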
Okay, what you need to do here is get the indexes to match in both dataframes after importing. This is important because pandas aligns data based on indexes.
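To illustrate index alignment in isolation, here is a minimal sketch. The frames `left` and `right` and their labels are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Two frames that share a column name but list their rows in a different order.
left = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"value": [30, 10]}, index=["c", "a"])

# update() matches rows by index label, not by position, and modifies left in place.
left.update(right)
print(left)
```

Row "b" has no counterpart in `right`, so it keeps its original value, while "a" and "c" take the values from `right` regardless of row order.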
Here is a complete example using your data:
from io import StringIO
import pandas as pd
File1txt = StringIO("""ID Number  Value 1  Value 2  Country
0001       23       55       Spain
0231       15       23       USA
4213       10       11       Canada
7541       32       29       Italy""")
File2txt = StringIO("""0001  5   6
0231  7   18
4213  54  87
5554  12  10
1111  31  13
6422  66  51""")
# Columns are separated by two or more spaces, because the header names
# themselves contain single spaces.
df1 = pd.read_csv(File1txt, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(File2txt, sep=r'\s\s+', header=None, engine='python')
print(df1)
#    ID Number  Value 1  Value 2 Country
# 0          1       23       55   Spain
# 1        231       15       23     USA
# 2       4213       10       11  Canada
# 3       7541       32       29   Italy
print(df2)
#       0   1   2
# 0     1   5   6
# 1   231   7  18
# 2  4213  54  87
# 3  5554  12  10
# 4  1111  31  13
# 5  6422  66  51
df2 = df2.set_index(0).rename_axis('ID Number')
df2 = df2.rename(columns={1:'Value 1', 2: 'Value 2'})
df1 = df1.set_index('ID Number')
df1.update(df2)
print(df1.reset_index())
Output:
   ID Number  Value 1  Value 2 Country
0          1      5.0      6.0   Spain
1        231      7.0     18.0     USA
2       4213     54.0     87.0  Canada
3       7541     32.0     29.0   Italy
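The same approach might be applied to the second example (File3.txt and File4.txt) roughly as follows. This is a sketch, not a definitive solution, and it rests on three assumptions: File3.txt is whitespace-separated, File4.txt is comma-separated with no header (so its columns are numbered 0, 1 and 2), and everything is read as text with `dtype=str` so that leading zeros in IDs such as 01711 survive. The `StringIO` objects stand in for the real files.

```python
from io import StringIO

import pandas as pd

# Stand-ins for File3.txt (whitespace-separated) and File4.txt (comma-separated).
file3 = StringIO("""#ID CAT CHN LC SC LATITUDE LONGITUDE
20022 CX 21 -- 4 32.739000 -114.635700
01711 CX 21 -- 3 32.779700 -115.567500
08433 CX 21 -- 2 31.919930 -123.321000""")
file4 = StringIO("""20022,32.45,-114.88
01192,32.839,-115.487
01711,32.88,-115.45
01218,32.717,-115.637""")

# Read every column as text so the IDs keep their leading zeros.
df1 = pd.read_csv(file3, sep=r"\s+", dtype=str).set_index("#ID")

# File4 has no header row, so name its columns to match the ones being updated.
df2 = pd.read_csv(
    file4,
    header=None,
    names=["#ID", "LATITUDE", "LONGITUDE"],
    dtype=str,
).set_index("#ID")

# Overwrite LATITUDE/LONGITUDE in df1 wherever the ID also exists in df2.
df1.update(df2)

result = df1.reset_index()
print(result.to_string(index=False))
```

Rows whose ID is missing from File4.txt, such as 08433, keep their original coordinates. Since the values stay as text, convert the coordinate columns with `pd.to_numeric` afterwards if numeric types are needed.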