Merging two dataframes in pandas

Question

I am merging two CSV files (dataframes) with the code below:

import pandas as pd
a = pd.read_csv(file1,dtype={'student_id': str})
df = pd.read_csv(file2)
c=pd.merge(a,df,on='test_id',how='left')
c.to_csv('test1.csv', index=False)

I have the following CSV files:

File 1:

test_id, student_id
1, 01990
2, 02300
3, 05555

File 2:

test_id, result
1, pass
3, fail

After merging:

test_id, student_id , result
1, 1990, pass
2, 2300,
3, 5555, fail

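Notice that student_id has leading 0s, so it should be treated as text. After merging and calling to_csv, however, it is converted to a number and the leading 0 is dropped.

How can I keep the column as text even after to_csv?

I think it is the to_csv function that saves it as a number again. I added dtype={'student_id': str} when reading the CSV, but when the data is saved with to_csv it ends up as a number again.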

Answer 1

The leading zeros are not dropped by merge; they are dropped by read_csv. You can fix this by declaring the column as a string when you import it:

a = pd.read_csv('file1.csv', dtype={'student_id': str}, skipinitialspace=True)

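The important part is the dtype argument: it tells pandas to import this column as a string. skipinitialspace is set to True because the column headers are written with a space after the comma, so we strip it: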

test_id, student_id
        ^ The student_id starts here, at the space
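
The interaction between the two arguments can be checked with a small sketch. It reads an in-memory copy of file 1 (the `FILE1_TEXT` constant is an assumption made so the example runs without files on disk) and compares the column names with and without skipinitialspace:

```python
import io
import pandas as pd

# In-memory copy of file 1 from the question (assumed here so the sketch
# runs without any files on disk).
FILE1_TEXT = "test_id, student_id\n1, 01990\n2, 02300\n3, 05555\n"

# Default parsing: the header keeps its leading space, so the dtype key
# 'student_id' does not match the actual column name ' student_id'.
raw = pd.read_csv(io.StringIO(FILE1_TEXT), dtype={'student_id': str})
print(list(raw.columns))

# With skipinitialspace=True the space after each comma is dropped, the
# dtype key matches the column, and the IDs are kept as text.
clean = pd.read_csv(
    io.StringIO(FILE1_TEXT),
    dtype={'student_id': str},
    skipinitialspace=True,
)
print(list(clean.columns))
print(clean['student_id'].tolist())
```

This would explain why the dtype argument in the question appeared to have no effect: without skipinitialspace the key never matched a column.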

The final code looks like this:

a = pd.read_csv('file1.csv', dtype={'student_id': str}, skipinitialspace=True)
df = pd.read_csv('file2.csv')
results = a.merge(df, how='left', on='test_id')

The results dataframe looks like this:

    test_id     student_id  result
0   1           01990       pass
1   2           02300       NaN
2   3           05555       fail

Then, when you run to_csv, the result should be:

test_id,student_id, result
1,01990, pass
2,02300,
3,05555, fail
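
To see where the zeros are actually lost, here is a minimal round-trip sketch. It builds a small frame that mirrors the merged result (the data is typed in by hand, which is an assumption for illustration), writes it with to_csv, and reads the text back with and without dtype:

```python
import io
import pandas as pd

# Hand-built stand-in for the merged frame, with student_id held as text.
results = pd.DataFrame({
    'test_id': [1, 2, 3],
    'student_id': ['01990', '02300', '05555'],
    'result': ['pass', None, 'fail'],
})

# to_csv writes string values as they are, so the zeros are in the output.
csv_text = results.to_csv(index=False)
print(csv_text)

# Reading the text back without dtype lets pandas infer integers again.
reread_default = pd.read_csv(io.StringIO(csv_text))
print(reread_default['student_id'].tolist())

# Reading it back with dtype keeps the column as text.
reread_text = pd.read_csv(io.StringIO(csv_text), dtype={'student_id': str})
print(reread_text['student_id'].tolist())
```

If the saved file shows the zeros in a plain text editor but not after loading it again, the loss is happening on the reading side rather than in to_csv.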

Answer 2

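Note: in practice, use merge or join. This answer is provided to give you more perspective on the flexibility pandas offers and on how many different ways there are to answer the same question.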

a = pd.read_csv('file1.csv', converters=dict(student_id=str), skipinitialspace=True)
df = pd.read_csv('file2.csv')
results = pd.concat(
    [d.set_index('test_id') for d in [a, df]],
    axis=1, join='outer'
).reset_index()
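
As a rough check of the concat approach, the sketch below runs it next to a left merge on hand-built frames (the data is assumed, copied from the question's files). It assumes test_id is unique within each frame. Note also that join='outer' would keep test_id values that appear only in the second frame, which a left merge would drop; for this data the two give the same rows.

```python
import pandas as pd

# Hand-built stand-ins for the two files in the question.
a = pd.DataFrame({
    'test_id': [1, 2, 3],
    'student_id': ['01990', '02300', '05555'],
})
df = pd.DataFrame({'test_id': [1, 3], 'result': ['pass', 'fail']})

# Align both frames on test_id and place them side by side.
via_concat = pd.concat(
    [d.set_index('test_id') for d in [a, df]],
    axis=1, join='outer'
).reset_index()

# The conventional left merge on the same key, for comparison.
via_merge = a.merge(df, how='left', on='test_id')

print(via_concat)
print(via_merge)
```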

Answer 3

A solution with join: first, read_csv needs the dtype parameter to read student_id as a string, and skipinitialspace to strip the spaces:

df1 = pd.read_csv(file1, dtype={'student_id': str}, skipinitialspace=True)
df2 = pd.read_csv(file2, skipinitialspace=True)

df = df1.join(df2.set_index('test_id'), on='test_id')
print (df)
   test_id student_id  result
0        1      01990    pass
1        2      02300     NaN
2        3      05555    fail

Answer 4

a = pd.read_csv(file1, dtype={'test_id': object})
df = pd.read_csv(file2, dtype={'test_id': object})

================================================== ============

In[28]: pd.merge(a, df, on='test_id', how='left')
Out[28]: 
  test_id   student_id  result
0      01         1990    pass
1      02         2300     NaN
2     003         5555    fail
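
This answer gives no input data, so the following is only a sketch under assumptions: the two `*_TEXT` constants are hypothetical file contents chosen so that test_id carries leading zeros. Reading test_id as object in both files keeps the merge key as text on both sides:

```python
import io
import pandas as pd

# Hypothetical file contents; test_id values carry leading zeros.
FILE1_TEXT = "test_id,student_id\n01,1990\n02,2300\n003,5555\n"
FILE2_TEXT = "test_id,result\n01,pass\n003,fail\n"

# Read the key column as object (text) in both frames so the keys are
# compared as strings and keep their leading zeros.
left = pd.read_csv(io.StringIO(FILE1_TEXT), dtype={'test_id': object})
right = pd.read_csv(io.StringIO(FILE2_TEXT), dtype={'test_id': object})

merged = pd.merge(left, right, on='test_id', how='left')
print(merged)
```

The same dtype mapping applied to student_id instead of test_id would address the column the question asks about.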

4 answers

Answer 1
2 2017-04-07 04:41:10

Answer 2
2 2017-04-07 05:19:50

Answer 3
1 2017-04-07 05:14:19

Answer 4
0 accepted 2017-04-07 04:36:43
