Compare two files if they both match first column then replace the values of column 2 and 3 (Python)

There are two files. If an ID number appears in both files, I want Value 1 and Value 2 for that ID to come from File2.txt. Please let me know if my question is unclear.

File1.txt


ID Number   Value 1     Value 2     Country 
0001        23            55        Spain
0231        15            23        USA     
4213        10            11        Canada
7541        32            29        Italy

File2.txt

0001        5       6
0231        7       18
4213        54      87
5554        12      10
1111        31      13
6422        66      51

The output should look like this:

ID Number   Value 1     Value 2     Country 
0001          5           6         Spain
0231          7          18         USA     
4213          54         87         Canada
7541          32         29         Italy

New example:

File3.txt

#ID CAT CHN LC SC LATITUDE LONGITUDE 
20022 CX 21 --   4  32.739000  -114.635700 
01711 CX 21 --   3  32.779700  -115.567500
08433 CX 21 --   2  31.919930  -123.321000


File4.txt

20022,32.45,-114.88
01192,32.839,-115.487
01711,32.88,-115.45
01218,32.717,-115.637

Output:
#ID CAT CHN LC SC LATITUDE LONGITUDE 
20022 CX 21 --   4  32.45  -114.88 
01711 CX 21 --   3  32.88  -115.45
08433 CX 21 --   2  31.919930  -123.321000

The code I have so far:

import pandas as pd

f = open("File3.txt", "r")
x = open("File4.txt", "r")

df1 = pd.read_csv(f, sep=' ', engine='python')
df2 = pd.read_csv(x, sep=' ', header=None, engine='python')

df2 = df2.set_index(0).rename_axis("#ID")
df2 = df2.rename(columns={5:'LATITUDE', 6: 'LONGITUDE'})
df1 = df1.set_index('#ID')
df1.update(df2)
print(df1)

Something like this, possibly:

import re

def split_columns(line):
    # Columns are separated by two or more spaces (or a tab), so header
    # names such as "Value 1" stay in one piece.
    return re.split(r"\s{2,}|\t", line.strip())

with open("File1.txt") as file1:
    file1_data = [split_columns(line) for line in file1 if line.strip()]
file1_headers = file1_data[0]
del file1_data[0]

with open("File2.txt") as file2:
    file2_data = [split_columns(line) for line in file2 if line.strip()]

file2_ids = [x[0] for x in file2_data]

final_data = [file1_headers] + file1_data

for i in range(1, len(final_data)):
    if final_data[i][0] in file2_ids:
        match = [x for x in file2_data if x[0] == final_data[i][0]]
        # ID and both values come from File2; the country stays from File1.
        final_data[i] = match[0] + [final_data[i][3]]

with open("output.txt", "w") as output:
    # Each row needs its own newline, since the input lines were stripped.
    output.writelines("\t".join(x) + "\n" for x in final_data)

final_data starts as a new list holding the header followed by the rows of file1_data. Each row whose ID also appears in file2_data is then replaced by the matching File2 row, with the country from File1 appended.
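The same replace-on-matching-ID idea can also be sketched with a dictionary keyed by ID, which avoids rescanning the File2 rows for every File1 row. This is only a sketch: the helper names `split_columns` and `merge_rows` are made up for illustration, the file contents are inlined as strings so it runs without any files on disk, and it assumes columns are separated by two or more spaces or a tab.

```python
import re

file1_text = """ID Number   Value 1     Value 2     Country
0001        23            55        Spain
0231        15            23        USA
4213        10            11        Canada
7541        32            29        Italy
"""

file2_text = """0001        5       6
0231        7       18
4213        54      87
5554        12      10
1111        31      13
6422        66      51
"""

def split_columns(line):
    # Hypothetical helper: split on two or more spaces (or a tab) so that
    # header names containing a single space, like "Value 1", stay whole.
    return re.split(r"\s{2,}|\t", line.strip())

def merge_rows(text1, text2):
    # Hypothetical helper: returns the File1 rows with Value 1 and Value 2
    # swapped for the File2 values wherever the ID matches.
    header, *rows = [split_columns(l) for l in text1.splitlines() if l.strip()]
    replacements = {}
    for line in text2.splitlines():
        if line.strip():
            row_id, value1, value2 = split_columns(line)
            replacements[row_id] = [value1, value2]
    merged = [header]
    for row in rows:
        # Fall back to the row's own values when the ID is not in File2.
        values = replacements.get(row[0], row[1:3])
        merged.append([row[0], *values, row[3]])
    return merged

merged = merge_rows(file1_text, file2_text)
for row in merged:
    print("\t".join(row))
```

Because every field stays a string, IDs such as 0001 keep their leading zeros.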

Okay, what you need to do here is make the indexes match in both DataFrames after importing. This matters because pandas aligns data on index labels.
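A minimal sketch of that alignment, using made-up labels and values rather than the question's data: `DataFrame.update` matches rows by index label, not by position, and leaves rows without a match untouched.

```python
import pandas as pd

# Illustrative data only; the labels "a", "b", "c" are not from the question.
left = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"value": [99.0, 77.0]}, index=["c", "a"])

# Rows are matched on the index label, so the different row order in
# `right` does not matter, and "b" (absent from `right`) keeps its value.
left.update(right)
print(left)
```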

Here is a complete example using your data:

from io import StringIO
import pandas as pd

File1txt=StringIO("""ID Number   Value 1     Value 2     Country 
0001        23            55        Spain
0231        15            23        USA     
4213        10            11        Canada
7541        32            29        Italy""")


File2txt = StringIO("""0001        5       6
0231        7       18
4213        54      87
5554        12      10
1111        31      13
6422        66      51""")

# Raw strings keep the backslashes in the regex separator intact.
df1 = pd.read_csv(File1txt, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(File2txt, sep=r'\s\s+', header=None, engine='python')

print(df1)
#    ID Number  Value 1  Value 2 Country
# 0          1       23       55   Spain
# 1        231       15       23     USA
# 2       4213       10       11  Canada
# 3       7541       32       29   Italy

print(df2)
#       0   1   2
# 0     1   5   6
# 1   231   7  18
# 2  4213  54  87
# 3  5554  12  10
# 4  1111  31  13
# 5  6422  66  51

df2 = df2.set_index(0).rename_axis('ID Number')
df2 = df2.rename(columns={1:'Value 1', 2: 'Value 2'})
df1 = df1.set_index('ID Number')
df1.update(df2)
print(df1.reset_index())

Output:

   ID Number  Value 1  Value 2 Country
0          1      5.0      6.0   Spain
1        231      7.0     18.0     USA
2       4213     54.0     87.0  Canada
3       7541     32.0     29.0   Italy
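In the output above the IDs lose their leading zeros and the values turn into floats, because everything was parsed as numbers. The sketch below applies the same index-and-update approach to the second example from the question, reading every field as text with `dtype=str` so that does not happen. It assumes File3.txt is whitespace-delimited and File4.txt is comma-delimited with no header, as shown in the question; the contents are inlined with `StringIO` so the snippet runs on its own.

```python
from io import StringIO
import pandas as pd

# Stand-ins for File3.txt and File4.txt from the question.
file3 = StringIO("""#ID CAT CHN LC SC LATITUDE LONGITUDE
20022 CX 21 --   4  32.739000  -114.635700
01711 CX 21 --   3  32.779700  -115.567500
08433 CX 21 --   2  31.919930  -123.321000
""")
file4 = StringIO("""20022,32.45,-114.88
01192,32.839,-115.487
01711,32.88,-115.45
01218,32.717,-115.637
""")

# dtype=str keeps IDs such as "01711" intact and leaves the numbers as text.
df3 = pd.read_csv(file3, sep=r"\s+", dtype=str)
# File4 has no header, so the column names are supplied to match File3.
df4 = pd.read_csv(file4, header=None,
                  names=["#ID", "LATITUDE", "LONGITUDE"], dtype=str)

# Index both frames by ID so update() can align the rows.
df3 = df3.set_index("#ID")
df3.update(df4.set_index("#ID"))

result = df3.reset_index()
print(result.to_string(index=False))
```

IDs that appear only in File4 (01192 and 01218 here) are ignored, and 08433, which has no match, keeps its original coordinates.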
