Convert pandas dataframe of lists to dict of dataframes

I have a dataframe (with a DateTime index) in which some of the columns contain lists, each list holding 6 elements.

In: dframe.head()
Out: 
                           A                                        B  \
timestamp                                                                
2017-05-01 00:32:25        30  [-3512, 375, -1025, -358, -1296, -4019]   
2017-05-01 00:32:55        30  [-3519, 372, -1026, -361, -1302, -4020]   
2017-05-01 00:33:25        30  [-3514, 371, -1026, -360, -1297, -4018]   
2017-05-01 00:33:55        30  [-3517, 377, -1030, -363, -1293, -4027]   
2017-05-01 00:34:25        30  [-3515, 372, -1033, -361, -1299, -4025]   
                                                      C           D
timestamp                                                             
2017-05-01 00:32:25  [1104, 1643, 625, 1374, 5414, 2066]      49.93   
2017-05-01 00:32:55  [1106, 1643, 622, 1385, 5441, 2074]      49.94   
2017-05-01 00:33:25  [1105, 1643, 623, 1373, 5445, 2074]      49.91   
2017-05-01 00:33:55  [1105, 1646, 620, 1384, 5438, 2076]      49.91   
2017-05-01 00:34:25  [1104, 1645, 613, 1374, 5431, 2082]      49.94   

I have a dictionary, dict_of_dfs, that I want to populate with 6 dataframes:

dict_of_dfs = {1: df1, 2:df2, 3:df3, 4:df4, 5:df5, 6:df6}

The ith dataframe contains the ith item of each list, so the first dataframe in the dict would be:

In:df1
Out: 
                            A          B      C        D
    timestamp                                                                
    2017-05-01 00:32:25        30  -3512   1104    49.93
    2017-05-01 00:32:55        30  -3519   1106    49.94
    2017-05-01 00:33:25        30  -3514   1105    49.91
    2017-05-01 00:33:55        30  -3517   1105    49.91
    2017-05-01 00:34:25        30  -3515   1104    49.94

And so on. The actual dataframe has more columns than this and thousands of rows. What is the simplest, most Pythonic way to do the conversion?

You can use a dict comprehension with assign, selecting the values from the lists by position with str[0], str[1], and so on:

N = 6
dfs = {i:df.assign(B=df['B'].str[i-1], C=df['C'].str[i-1]) for i in range(1,N + 1)}

print(dfs[1])
             timestamp   A     B     C      D
0  2017-05-01 00:32:25  30 -3512  1104  49.93
1  2017-05-01 00:32:55  30 -3519  1106  49.94
2  2017-05-01 00:33:25  30 -3514  1105  49.91
3  2017-05-01 00:33:55  30 -3517  1105  49.91
4  2017-05-01 00:34:25  30 -3515  1104  49.94
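Since the real dataframe has more list columns than B and C, here is a minimal sketch of the same assign idea that builds the keyword arguments from a list of column names instead of typing them out. The helper name split_by_position and the rule "a column is a list column if its first cell is a list" are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Small stand-in frame; column names mirror the question's example.
df = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    },
    index=pd.to_datetime(["2017-05-01 00:32:25", "2017-05-01 00:32:55"]),
)
df.index.name = "timestamp"

N = 6

# Assumption: a column counts as a "list column" if its first cell is a list.
list_cols = [col for col in df.columns if isinstance(df[col].iat[0], list)]


def split_by_position(frame, cols, n):
    """Return {1: df1, ..., n: dfn}; df_i holds the i-th item of each list."""
    return {
        i: frame.assign(**{col: frame[col].str[i - 1] for col in cols})
        for i in range(1, n + 1)
    }


dict_of_dfs = split_by_position(df, list_cols, N)
print(dict_of_dfs[1])
```

Passing the columns through ** only works when the column names are strings, which is the case here.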

Another solution:

dfs = {i:df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1,7)}

print(dfs[1])
             timestamp   A     B     C      D
0  2017-05-01 00:32:25  30 -3512  1104  49.93
1  2017-05-01 00:32:55  30 -3519  1106  49.94
2  2017-05-01 00:33:25  30 -3514  1105  49.91
3  2017-05-01 00:33:55  30 -3517  1105  49.91
4  2017-05-01 00:34:25  30 -3515  1104  49.94
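The apply version decides whether a column holds lists by looking only at its first cell. If the data might be less regular than the sample, a stricter check could be run first. This is just a sketch; is_list_column is a hypothetical helper, and it assumes every list should have exactly N elements.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    }
)

N = 6


def is_list_column(series, n):
    """True if every cell is a list with exactly n elements."""
    all_lists = series.map(lambda cell: isinstance(cell, list)).all()
    # len() is only evaluated when every cell is already known to be a list.
    return bool(all_lists and series.map(len).eq(n).all())


list_cols = [col for col in df.columns if is_list_column(df[col], N)]
print(list_cols)
```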

Timings:

df = pd.concat([df]*10000).reset_index(drop=True)

In [185]: %timeit {i:df.assign(B=df['B'].str[i-1], C=df['C'].str[i-1]) for i in range(1,N+1)}
1 loop, best of 3: 420 ms per loop

In [186]: %timeit {i:df.apply(lambda x: x.str[i-1] if type(x.iat[0]) == list else x) for i in range(1,7)}
1 loop, best of 3: 447 ms per loop

In [187]: %timeit {(i+1):df.applymap(lambda x: x[i] if type(x) == list else x) for i in range(6)}
1 loop, best of 3: 881 ms per loop
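A different route that might be worth timing on your own data (it is not one of the solutions timed above) is to expand each list column into a 2-D NumPy array once and then slice out one position per dataframe. This sketch assumes every list has exactly N elements; the name expanded is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    }
)

N = 6
list_cols = ["B", "C"]

# Expand each list column once into an array of shape (n_rows, N).
# Assumption: all lists have the same length, otherwise np.array cannot
# build a regular 2-D array.
expanded = {col: np.array(df[col].tolist()) for col in list_cols}

# Column i - 1 of each array holds the i-th item of every list.
dict_of_dfs = {
    i: df.assign(**{col: arr[:, i - 1] for col, arr in expanded.items()})
    for i in range(1, N + 1)
}

print(dict_of_dfs[1])
```

The list-to-array conversion happens once per column rather than once per position, and the resulting columns get a numeric dtype instead of object.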

Setup:

import pandas as pd

df = pd.DataFrame({'A': {'2017-05-01 00:32:25': 30,
  '2017-05-01 00:32:55': 30,
  '2017-05-01 00:33:25': 30,
  '2017-05-01 00:33:55': 30,
  '2017-05-01 00:34:25': 30},
 'B': {'2017-05-01 00:32:25': [-3512, 375, -1025, -358, -1296, -4019],
  '2017-05-01 00:32:55': [-3519, 372, -1026, -361, -1302, -4020],
  '2017-05-01 00:33:25': [-3514, 371, -1026, -360, -1297, -4018],
  '2017-05-01 00:33:55': [-3517, 377, -1030, -363, -1293, -4027],
  '2017-05-01 00:34:25': [-3515, 372, -1033, -361, -1299, -4025]},
 'C': {'2017-05-01 00:32:25': [1104, 1643, 625, 1374, 5414, 2066],
  '2017-05-01 00:32:55': [1106, 1643, 622, 1385, 5441, 2074],
  '2017-05-01 00:33:25': [1105, 1643, 623, 1373, 5445, 2074],
  '2017-05-01 00:33:55': [1105, 1646, 620, 1384, 5438, 2076],
  '2017-05-01 00:34:25': [1104, 1645, 613, 1374, 5431, 2082]},
 'D': {'2017-05-01 00:32:25': 49.93,
  '2017-05-01 00:32:55': 49.94,
  '2017-05-01 00:33:25': 49.1,
  '2017-05-01 00:33:55': 49.91,
  '2017-05-01 00:34:25': 49.94}})

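Build the dict of dataframes with a dict comprehension. Each sub-dataframe is generated with applymap, which handles every column whose cells are 6-element lists: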

dict_of_dfs = {(i+1):df.applymap(lambda x: x[i] if type(x) == list else x) for i in range(6)}

print(dict_of_dfs[1])
                      A     B     C      D
2017-05-01 00:32:25  30 -3512  1104  49.93
2017-05-01 00:32:55  30 -3519  1106  49.94
2017-05-01 00:33:25  30 -3514  1105  49.10
2017-05-01 00:33:55  30 -3517  1105  49.91
2017-05-01 00:34:25  30 -3515  1104  49.94


print(dict_of_dfs[2])
                      A    B     C      D
2017-05-01 00:32:25  30  375  1643  49.93
2017-05-01 00:32:55  30  372  1643  49.94
2017-05-01 00:33:25  30  371  1643  49.10
2017-05-01 00:33:55  30  377  1646  49.91
2017-05-01 00:34:25  30  372  1645  49.94
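On recent pandas versions (2.1 and later) applymap is deprecated in favour of DataFrame.map. A hedged sketch of the same elementwise idea that works on either version is below; the helper pick and the name elementwise are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [30, 30],
        "B": [[-3512, 375, -1025, -358, -1296, -4019],
              [-3519, 372, -1026, -361, -1302, -4020]],
        "C": [[1104, 1643, 625, 1374, 5414, 2066],
              [1106, 1643, 622, 1385, 5441, 2074]],
        "D": [49.93, 49.94],
    }
)


def pick(i):
    """Return a function that takes item i of a list cell, else the cell itself."""
    return lambda cell: cell[i] if isinstance(cell, list) else cell


# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap

dict_of_dfs = {i + 1: elementwise(pick(i)) for i in range(6)}

print(dict_of_dfs[2])
```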

