Create dataframe from array

I have data in the following form:

[('06/03/2018 17.35.18.211', 'param_a', 1),
 ('06/03/2018 17.35.19.211', 'param_b', 1),
 ('06/03/2018 17.35.20.211', 'param_c', 1),
 ('06/03/2018 17.35.21.211', 'param_a', 2),
 ('06/03/2018 17.35.22.211', 'param_b', 2),
 ('06/03/2018 17.35.22.211', 'param_c', 2)]

What is the best way to use it to create a dataframe like the one below?

                 timestamp   param_a   param_b   param_c
0  06/03/2018 17.35.18.211       1.0       NaN       NaN
1  06/03/2018 17.35.19.211       NaN       1.0       NaN
2  06/03/2018 17.35.20.211       NaN       NaN       1.0
3  06/03/2018 17.35.21.211       2.0       NaN       NaN
4  06/03/2018 17.35.22.211       NaN       2.0       2.0

Use the DataFrame constructor with pivot, rename_axis and reset_index:

import pandas as pd

arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 1),
       ('06/03/2018 17.35.21.211', 'param_a', 2),
       ('06/03/2018 17.35.22.211', 'param_b', 2),
       ('06/03/2018 17.35.23.211', 'param_c', 2)]

# Long format: one row per (timestamp, parameter name, value).
df = pd.DataFrame(arr, columns=['timestamp', 'b', 'c'])
# pivot() takes keyword-only arguments in pandas 2.0 and later.
df = df.pivot(index='timestamp', columns='b', values='c').rename_axis(None, axis=1).reset_index()
print(df)
                 timestamp  param_a  param_b  param_c
0  06/03/2018 17.35.18.211      1.0      NaN      NaN
1  06/03/2018 17.35.19.211      NaN      1.0      NaN
2  06/03/2018 17.35.20.211      NaN      NaN      1.0
3  06/03/2018 17.35.21.211      2.0      NaN      NaN
4  06/03/2018 17.35.22.211      NaN      2.0      NaN
5  06/03/2018 17.35.23.211      NaN      NaN      2.0

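However, if the same combination of the first and second values (timestamp and parameter name) occurs more than once, pivot raises an error and the duplicates have to be aggregated.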
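One way to handle such duplicates is pivot_table with an aggregation function. The sketch below is only an illustration: the duplicated row is made-up data, and taking the mean is an assumption (sum, first or last may suit your data better).

```python
import pandas as pd

# Hypothetical data: ('06/03/2018 17.35.18.211', 'param_a') appears twice.
arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.18.211', 'param_a', 3),
       ('06/03/2018 17.35.19.211', 'param_b', 1)]

df = pd.DataFrame(arr, columns=['timestamp', 'b', 'c'])

# pivot() would raise ValueError on the duplicated pair;
# pivot_table() combines the duplicates with aggfunc instead.
out = (df.pivot_table(index='timestamp', columns='b', values='c', aggfunc='mean')
         .rename_axis(None, axis=1)
         .reset_index())
print(out)
```

Here the two param_a values at the first timestamp are averaged into a single cell.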

You can also try this. (Note that get_dummies can be slow.)

import pandas as pd

arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 1),
       ('06/03/2018 17.35.21.211', 'param_a', 2),
       ('06/03/2018 17.35.22.211', 'param_b', 2),
       ('06/03/2018 17.35.23.211', 'param_c', 2)]
df = pd.DataFrame(arr)
# Multiply each row of the 0/1 indicator columns by that row's value.
pd.concat([df[0], df[2].values[:,None] * df[1].str.get_dummies()], axis=1)

                         0  param_a  param_b  param_c
0  06/03/2018 17.35.18.211        1        0        0
1  06/03/2018 17.35.19.211        0        1        0
2  06/03/2018 17.35.20.211        0        0        1
3  06/03/2018 17.35.21.211        2        0        0
4  06/03/2018 17.35.22.211        0        2        0
5  06/03/2018 17.35.23.211        0        0        2

Or, to get NaN instead of 0:

v = df[1].str.get_dummies()
pd.concat([df[0], df[2].values[:,None] * v.where(v>0)], axis=1)


                         0  param_a  param_b  param_c
0  06/03/2018 17.35.18.211      1.0      NaN      NaN
1  06/03/2018 17.35.19.211      NaN      1.0      NaN
2  06/03/2018 17.35.20.211      NaN      NaN      1.0
3  06/03/2018 17.35.21.211      2.0      NaN      NaN
4  06/03/2018 17.35.22.211      NaN      2.0      NaN
5  06/03/2018 17.35.23.211      NaN      NaN      2.0

You are trying to create a dataframe with 4 columns from data that has 3 columns. If you need 4 columns, you have to reshape the data.
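As a rough sketch of such a reshape, the question's data can be moved from long to wide format with set_index and unstack. The column names 'param' and 'value' are placeholders chosen for this example, and the approach assumes each (timestamp, parameter) pair occurs at most once.

```python
import pandas as pd

# Data as given in the question; the last two rows share a timestamp.
arr = [('06/03/2018 17.35.18.211', 'param_a', 1),
       ('06/03/2018 17.35.19.211', 'param_b', 1),
       ('06/03/2018 17.35.20.211', 'param_c', 1),
       ('06/03/2018 17.35.21.211', 'param_a', 2),
       ('06/03/2018 17.35.22.211', 'param_b', 2),
       ('06/03/2018 17.35.22.211', 'param_c', 2)]

# 'param' and 'value' are illustrative column names.
df = pd.DataFrame(arr, columns=['timestamp', 'param', 'value'])

# Move the parameter names from rows into columns (long to wide).
out = (df.set_index(['timestamp', 'param'])['value']
         .unstack()
         .rename_axis(None, axis=1)
         .reset_index())
print(out)
```

Because the last two tuples share a timestamp, they end up in the same row, giving the five-row layout shown in the question.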

