Compare two files and, where the first column matches in both, replace the values in columns 2 and 3 (Python)

Question

There are two files. If an ID number appears in both files, I want to use only Value 1 and Value 2 from File2.txt for that row. Please let me know if my question is unclear.

File1.txt


ID Number   Value 1     Value 2     Country 
0001        23            55        Spain
0231        15            23        USA     
4213        10            11        Canada
7541        32            29        Italy

File2.txt

0001        5       6
0231        7       18
4213        54      87
5554        12      10
1111        31      13
6422        66      51

The output should look like this:

ID Number   Value 1     Value 2     Country 
0001          5           6         Spain
0231          7          18         USA     
4213          54         87         Canada
7541          32         29         Italy

New example:

File3.txt

#ID CAT CHN LC SC LATITUDE LONGITUDE 
20022 CX 21 --   4  32.739000  -114.635700 
01711 CX 21 --   3  32.779700  -115.567500
08433 CX 21 --   2  31.919930  -123.321000


File4.txt

20022,32.45,-114.88
01192,32.839,-115.487
01711,32.88,-115.45
01218,32.717,-115.637

Output:
#ID CAT CHN LC SC LATITUDE LONGITUDE 
20022 CX 21 --   4  32.45  -114.88 
01711 CX 21 --   3  32.88  -115.45
08433 CX 21 --   2  31.919930  -123.321000

The code I have so far:

f = open("File3.txt", "r") 
x= open("File4.txt","r")

df1 = pd.read_csv(f, sep=' ', engine='python')
df2 = pd.read_csv(x, sep=' ', header=None, engine='python')

df2 = df2.set_index(0).rename_axis("#ID")
df2 = df2.rename(columns={5:'LATITUDE', 6: 'LONGITUDE'})
df1 = df1.set_index('#ID')
df1.update(df2)
print(df1)

Answer 1

Something like this, possibly:

file1_data = []

# File1: keep the header line as-is and split each data row on whitespace
# (this assumes no field, such as the country, contains a space).
with open("File1.txt") as file1:
    file1_header = file1.readline().rstrip()
    for line in file1:
        if line.strip():
            file1_data.append(line.split())

file2_data = []

with open("File2.txt") as file2:
    for line in file2:
        if line.strip():
            file2_data.append(line.split())

# Look up the rows of File2 by ID (first column).
file2_by_id = {row[0]: row for row in file2_data}

final_data = list(file1_data)

for i, row in enumerate(final_data):
    if row[0] in file2_by_id:
        # Take ID, value 1 and value 2 from File2; keep the country from File1.
        final_data[i] = file2_by_id[row[0]][:3] + [row[3]]

with open("output.txt", "w") as output:
    output.write(file1_header + "\n")
    output.writelines("\t".join(row) + "\n" for row in final_data)

final_data holds the rows of file1_data, where each row whose ID also appears in file2_data takes its two values from File2 but keeps its country.
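The same plain-Python idea can probably be applied to the second example (File3.txt and File4.txt). The sketch below is only an illustration: the function name replace_coordinates is made up, it works on lists of lines rather than on files, and it assumes File4 is comma-separated and that LATITUDE and LONGITUDE are always the last two fields of a File3 row.

```python
def replace_coordinates(file3_lines, file4_lines):
    """Return File3-style lines with the coordinates taken from File4 where the ID matches."""
    # Map ID -> (latitude, longitude) from the comma-separated lines.
    coords = {}
    for line in file4_lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3:
            coords[fields[0]] = (fields[1], fields[2])

    result = []
    for line in file3_lines:
        fields = line.split()
        # Leave the header (starts with '#'), blank lines and unmatched IDs unchanged.
        if not fields or fields[0].startswith("#") or fields[0] not in coords:
            result.append(line.rstrip())
            continue
        # Assumption: the last two fields are LATITUDE and LONGITUDE.
        fields[-2:] = coords[fields[0]]
        result.append(" ".join(fields))
    return result


file3 = [
    "#ID CAT CHN LC SC LATITUDE LONGITUDE",
    "20022 CX 21 --   4  32.739000  -114.635700",
    "01711 CX 21 --   3  32.779700  -115.567500",
    "08433 CX 21 --   2  31.919930  -123.321000",
]
file4 = [
    "20022,32.45,-114.88",
    "01192,32.839,-115.487",
    "01711,32.88,-115.45",
    "01218,32.717,-115.637",
]

for out_line in replace_coordinates(file3, file4):
    print(out_line)
```

Note that rows that get replaced are re-joined with single spaces, so the original column alignment is not preserved for those rows. The IDs are handled as strings throughout, so leading zeros such as in 01711 are kept.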

Answer 2

Okay, what you need to do here is get the indexes of both DataFrames to match after importing. This is important because pandas aligns data based on the index.

Here is a complete example using your data:

from io import StringIO
import pandas as pd

File1txt=StringIO("""ID Number   Value 1     Value 2     Country 
0001        23            55        Spain
0231        15            23        USA     
4213        10            11        Canada
7541        32            29        Italy""")


File2txt = StringIO("""0001        5       6
0231        7       18
4213        54      87
5554        12      10
1111        31      13
6422        66      51""")

# Raw strings keep the regex escape \s from being treated as a string escape.
df1 = pd.read_csv(File1txt, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(File2txt, sep=r'\s\s+', header=None, engine='python')

print(df1)
#    ID Number  Value 1  Value 2 Country
# 0          1       23       55   Spain
# 1        231       15       23     USA
# 2       4213       10       11  Canada
# 3       7541       32       29   Italy

print(df2)
#       0   1   2
# 0     1   5   6
# 1   231   7  18
# 2  4213  54  87
# 3  5554  12  10
# 4  1111  31  13
# 5  6422  66  51

df2 = df2.set_index(0).rename_axis('ID Number')
df2 = df2.rename(columns={1:'Value 1', 2: 'Value 2'})
df1 = df1.set_index('ID Number')
df1.update(df2)
print(df1.reset_index())

Output:

   ID Number  Value 1  Value 2 Country
0          1      5.0      6.0   Spain
1        231      7.0     18.0     USA
2       4213     54.0     87.0  Canada
3       7541     32.0     29.0   Italy
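Note that in the output above the IDs have lost their leading zeros (0001 became 1) and the values are shown as floats. If that matters, one possible variant is to read the ID column as text and cast the value columns back to integers after the update. This is a sketch under the assumption that the columns are separated by two or more spaces and the lines carry no trailing whitespace; the names file1_text and file2_text are only for illustration, and File2 is shortened here.

```python
from io import StringIO
import pandas as pd

file1_text = StringIO("""ID Number   Value 1     Value 2     Country
0001        23            55        Spain
0231        15            23        USA
4213        10            11        Canada
7541        32            29        Italy""")

file2_text = StringIO("""0001        5       6
0231        7       18
4213        54      87
5554        12      10""")

# Read the ID column as text so "0001" is not turned into the integer 1.
df1 = pd.read_csv(file1_text, sep=r"\s\s+", engine="python",
                  dtype={"ID Number": str})
df2 = pd.read_csv(file2_text, sep=r"\s\s+", engine="python", header=None,
                  names=["ID Number", "Value 1", "Value 2"],
                  dtype={"ID Number": str})

df1 = df1.set_index("ID Number")
df1.update(df2.set_index("ID Number"))

# update() may leave the value columns as floats; cast them back to integers.
df1 = df1.astype({"Value 1": int, "Value 2": int})

result = df1.reset_index()
print(result)
```

Passing names= when reading File2 gives its columns the same labels as File1 up front, which replaces the separate rename step.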
