Merging two data frames in pandas

Question

I am merging two CSV files (data frames) using the code below:

import pandas as pd
a = pd.read_csv(file1,dtype={'student_id': str})
df = pd.read_csv(file2)
c=pd.merge(a,df,on='test_id',how='left')
c.to_csv('test1.csv', index=False)

I have the following CSV files:

file1:

test_id, student_id
1, 01990
2, 02300
3, 05555

file2:

test_id, result
1, pass
3, fail

After the merge:

test_id, student_id , result
1, 1990, pass
2, 2300,
3, 5555, fail

Notice that student_id has a leading 0 and is supposed to be treated as text, but after merging and calling to_csv it is converted to a number and the leading 0 is removed.

How can I keep the column as text even after to_csv?

I think it is the to_csv function that saves it back as numeric. I added dtype={'student_id': str} when reading the CSV, but when saving with to_csv it is converted to numeric again.

Answer 1

The leading zero is not dropped by the merge; it is dropped by read_csv. You can fix this by specifying at import time that the column is a string:

a = pd.read_csv('file1.csv', dtype={'student_id': str}, skipinitialspace=True)

The important part is the dtype parameter, which tells pandas to import this column as a string. The skipinitialspace parameter is set to True because the column headers in the file are written with a space after each comma. Stripping that space makes the column name student_id rather than ' student_id', so the dtype key actually matches the column:

test_id, student_id
        ^ The student_id starts here, at the space
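The effect of the leading space can be checked without touching the real files. The sketch below is only an illustration: it assumes the file content shown in the question and feeds it through io.StringIO instead of file1.csv, and the variable names (FILE1, plain, fixed) are made up for this example.

```python
import io
import pandas as pd

# Inline stand-in for file1.csv from the question (a space follows each comma).
FILE1 = "test_id, student_id\n1, 01990\n2, 02300\n3, 05555\n"

# Without skipinitialspace the header keeps its leading space, so the dtype
# key 'student_id' matches no column and pandas infers the type on its own.
plain = pd.read_csv(io.StringIO(FILE1), dtype={'student_id': str})
plain_columns = list(plain.columns)

# With skipinitialspace=True the header is 'student_id', the dtype applies,
# and the values stay strings with their leading zeros.
fixed = pd.read_csv(io.StringIO(FILE1), dtype={'student_id': str},
                    skipinitialspace=True)
fixed_columns = list(fixed.columns)
fixed_values = fixed['student_id'].tolist()

print(plain_columns)  # the second name still starts with a space
print(fixed_columns)
print(fixed_values)
```

Printing the column names of a freshly read frame is a quick way to spot this kind of mismatch before merging.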

The final code looks like this:

a = pd.read_csv('file1.csv', dtype={'student_id': str}, skipinitialspace=True)
df = pd.read_csv('file2.csv')
results = a.merge(df, how='left', on='test_id')

The results dataframe then looks like this:

    test_id     student_id  result
0   1           01990       pass
1   2           02300       NaN
2   3           05555       fail

Then when you run to_csv, the output should be:

test_id,student_id, result
1,01990, pass
2,02300,
3,05555, fail
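If the zeros still seem to disappear after saving, it may help to look at the written text itself and at how it is read back. The following sketch assumes a results frame like the one above, built by hand here so that the example is self-contained. It writes to a string instead of test1.csv, and the names csv_text, reread_default and reread_text are illustrative only.

```python
import io
import pandas as pd

# Hand-built stand-in for the merged frame; student_id holds strings.
results = pd.DataFrame({
    'test_id': [1, 2, 3],
    'student_id': ['01990', '02300', '05555'],
    'result': ['pass', None, 'fail'],
})

# to_csv writes string values as they are, so the zeros end up in the output.
csv_text = results.to_csv(index=False)
print(csv_text)

# Reading the text back without dtype lets pandas infer integers again...
reread_default = pd.read_csv(io.StringIO(csv_text))
# ...while passing dtype once more keeps the column as text.
reread_text = pd.read_csv(io.StringIO(csv_text), dtype={'student_id': str})

print(reread_default['student_id'].tolist())
print(reread_text['student_id'].tolist())
```

In other words, a CSV file carries no type information, so any program that reopens it and infers types, pandas included, can strip the zeros again unless it is told to treat the column as text.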

Answer 2

Caveat: please use merge or join. This answer is provided to give perspective on the flexibility pandas gives you and on how many different ways there are to answer the same question.

a = pd.read_csv('file1.csv', converters=dict(student_id=str), skipinitialspace=True)
df = pd.read_csv('file2.csv')
results = pd.concat(
    [d.set_index('test_id') for d in [a, df]],
    axis=1, join='outer'
).reset_index()
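As a rough check that the concat approach agrees with a left merge on the sample data, the two results can be compared side by side. This is a sketch under the assumption that the files contain exactly the rows shown in the question. They are inlined through io.StringIO, and the names FILE1, FILE2, merged_rows and concat_rows are made up for the comparison.

```python
import io
import pandas as pd

# Inline stand-ins for file1.csv and file2.csv from the question.
FILE1 = "test_id, student_id\n1, 01990\n2, 02300\n3, 05555\n"
FILE2 = "test_id, result\n1, pass\n3, fail\n"

a = pd.read_csv(io.StringIO(FILE1), converters=dict(student_id=str),
                skipinitialspace=True)
df = pd.read_csv(io.StringIO(FILE2), skipinitialspace=True)

# Reference result: the left merge recommended in the caveat.
merged = a.merge(df, how='left', on='test_id')

# Alternative: align both frames on test_id and concatenate column-wise.
concatenated = pd.concat(
    [d.set_index('test_id') for d in [a, df]],
    axis=1, join='outer'
).reset_index()

# Replace missing values with '' so the rows can be compared as plain lists.
merged_rows = merged.fillna('').values.tolist()
concat_rows = concatenated.fillna('').values.tolist()
print(merged_rows == concat_rows)
```

The two agree here only because every test_id in the second file also appears in the first. With join='outer', a test_id that exists only in the second file would be kept by concat but dropped by a left merge.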

Answer 3

A solution with join: first call read_csv with the dtype parameter to read student_id as a string, and use skipinitialspace to remove the whitespace:

df1 = pd.read_csv(file1, dtype={'student_id': str}, skipinitialspace=True)
df2 = pd.read_csv(file2, skipinitialspace=True)

df = df1.join(df2.set_index('test_id'), on='test_id')
print (df)
   test_id student_id  result
0        1      01990    pass
1        2      02300     NaN
2        3      05555    fail

Answer 4

a = pd.read_csv(file1, dtype={'test_id': object})
df = pd.read_csv(file2, dtype={'test_id': object})

==============================================================

In[28]: pd.merge(a, df, on='test_id', how='left')
Out[28]: 
  test_id   student_id  result
0      01         1990    pass
1      02         2300     NaN
2     003         5555    fail

Answer 1
2 2017-04-07 04:41:10

Answer 2
2 2017-04-07 05:19:50

Answer 3
1 2017-04-07 05:14:19

Answer 4
0 accepted 2017-04-07 04:36:43
