Concatenating two DataFrame columns with pandas

Question

I am new to Python development. I have the following dataframe:

Document_ID  OFFSET  PredictedFeature  word
0            0       2000              Mark
0            8       2000              Bob
0            16      2200              AL
0            23      2200              NS
0            30      2200              GK
1            0       2100              sandy
1            5       2100              Rohan
1            7       2100              DV

Here Document_ID is, you could say, the key.

What I want to do is create a file in which I see a result like this:

Mark 2000, Bob 2000, AL 2200, NS 2200, GK 2200, sandy 2100, Rohan 2100, DV 2100

I tried using groupby:

df = df.groupby('Document_ID').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])

I am also trying to save the result to a text or CSV file.

Can anyone help me with this?

Answer 1

Use DataFrame.stack:

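# Stack the two columns into one long Series, then transpose it into a single row.
new_df = df[['word', 'PredictedFeature']].stack().to_frame().T
# Drop the outer column level (the original row labels), keeping only the column names.
new_df.columns = new_df.columns.droplevel(0)
print(new_df)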

   word PredictedFeature word PredictedFeature word PredictedFeature word  \
0  Mark             2000  Bob             2000   AL             2200   NS   

  PredictedFeature word PredictedFeature   word PredictedFeature   word  \
0             2200   GK             2200  sandy             2100  Rohan   

  PredictedFeature word PredictedFeature  
0             2100   DV             2100
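
If the goal is literally the comma-separated line shown in the question, it may be simpler to build the string directly instead of reshaping the frame. The following is only a minimal sketch: it rebuilds the sample data from the question, and the names pairs, line and per_doc are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
    "word": ["Mark", "Bob", "AL", "NS", "GK", "sandy", "Rohan", "DV"],
})

# Build "word feature" for every row; the numeric column must be cast to str first.
pairs = df["word"] + " " + df["PredictedFeature"].astype(str)

# One line for the whole frame.
line = ", ".join(pairs)
print(line)

# Alternatively, one line per document, keyed by Document_ID.
per_doc = pairs.groupby(df["Document_ID"]).agg(", ".join)
print(per_doc)
```

The cast with astype(str) is the step the groupby attempt in the question was missing: ','.join only accepts strings, so it cannot be applied to the numeric columns as they are.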

But if you want to keep the rest of the information, it is better to use pivot_table:

new_df = df.pivot_table(columns=['word', 'PredictedFeature'], index='Document_ID',
                        values='OFFSET', fill_value=0)
print(new_df)

word               AL  Bob   DV   GK Mark   NS Rohan sandy
PredictedFeature 2200 2000 2100 2200 2000 2200  2100  2100
Document_ID                                               
0                  16    8    0   30    0   23     0     0
1                   0    0    7    0    0    0     5     0
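
One thing to watch in the table above: with fill_value=0, a word that does not occur in a document looks the same as a word whose OFFSET really is 0 (Mark in document 0, sandy in document 1). A small sketch of the difference, assuming the sample data from the question; the names table and filled are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
    "word": ["Mark", "Bob", "AL", "NS", "GK", "sandy", "Rohan", "DV"],
})

# Without fill_value, combinations that do not occur stay NaN.
table = df.pivot_table(columns=["word", "PredictedFeature"],
                       index="Document_ID", values="OFFSET")

# With fill_value=0, missing combinations become 0.
filled = table.fillna(0)

# Mark really has OFFSET 0 in document 0 ...
print(table.loc[0, ("Mark", 2000)])
# ... but does not appear in document 1 at all.
print(table.loc[1, ("Mark", 2000)])
print(filled.loc[1, ("Mark", 2000)])
```

If that distinction matters for the file you write out, leaving the NaN values in place (or filling with a value that cannot be a real offset) is one option.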

To save it, you need DataFrame.to_csv:

new_df.to_csv('mycsv.csv')

If it has a MultiIndex, you need:

new_df.to_csv('mycsv.csv',index_label=['word','PredictedFeature'])

To read it back, use pd.read_csv:

new_read_csv=pd.read_csv('mycsv.csv')
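
For the pivot_table result, which has two header rows, a plain read_csv call treats only the first row as the header. A possible round trip is sketched below, assuming the sample data from the question. It writes to an in-memory buffer so it runs without touching disk; a path such as 'mycsv.csv' can be passed instead. The names buffer and restored are only illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Document_ID": [0, 0, 0, 0, 0, 1, 1, 1],
    "OFFSET": [0, 8, 16, 23, 30, 0, 5, 7],
    "PredictedFeature": [2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100],
    "word": ["Mark", "Bob", "AL", "NS", "GK", "sandy", "Rohan", "DV"],
})

new_df = df.pivot_table(columns=["word", "PredictedFeature"],
                        index="Document_ID", values="OFFSET", fill_value=0)

# Write the table as CSV text (in memory here, instead of a file on disk).
buffer = io.StringIO()
new_df.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] tells pandas that the first two rows form the column MultiIndex;
# index_col=0 restores Document_ID as the row index.
restored = pd.read_csv(buffer, header=[0, 1], index_col=0)
print(restored)
```

Note that labels read from a CSV header usually come back as strings, so the PredictedFeature level may need converting if you rely on it being numeric.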

Solution 1
0 2019-10-16 09:40:22