How to optimize regrouping code for a dataframe

I want to optimize code that regroups my pandas DataFrame (dk) by joins:

import pandas as pd

dk = pd.DataFrame({'Point': {0: 15, 1: 16, 2: 16, 3: 17, 4: 17, 5: 18, 6: 18, 7: 19, 8: 20},
                   'join': {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4}})

If two groups with different joins share a point, both groups should get the same join, and this should hold across the whole dataframe. I did it with simple code:

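dk['new'] = dk['join']
for i in dk.index:
    for j in range(i + 1, dk.shape[0]):
        if dk.loc[i, 'Point'] == dk.loc[j, 'Point']:
            # Use .loc instead of chained indexing so the write reaches dk itself
            dk.loc[j, 'new'] = dk.loc[i, 'join']
            # Every row in the same join as row j takes the label of row i
            dk.loc[dk['join'] == dk.loc[j, 'join'], 'new'] = dk.loc[i, 'new']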

The result I want:

df = {'Point': {0: 15, 1: 16, 2: 16, 3: 17, 4: 17, 5: 18, 6: 18, 7: 19, 8: 20},
 'join': {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4},
 'new': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 4}}

But I need to run it on a large dataset with more than 450k rows. Do you have any idea how to optimize it, or do you know of other modules suited to this problem? Thanks in advance.

You can iterate over the sub-DataFrames grouped by 'join' and start a new label whenever a group shares no 'Point' values with the previous group. (I don't know whether 'Point' is always increasing, but this would also cover the case where it is not.)

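pieces = []
new = None
point_set = set()
for j, sub_df in dk.groupby('join'):
    # Start a new label when this group shares no point with the previous group
    if new is None or not point_set.intersection(sub_df['Point']):
        new = j

    point_set = set(sub_df['Point'])
    # assign() returns a copy, so the groupby slice is never modified in place
    pieces.append(sub_df.assign(new=new))

# Concatenate once at the end rather than growing a DataFrame inside the loop
df = pd.concat(pieces)
print(df)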

Output:

   Point  join  new
0     15     0    0
1     16     0    0
2     16     1    0
3     17     1    0
4     17     2    0
5     18     2    0
6     18     3    0
7     19     3    0
8     20     4    4
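
The loop above compares each group only with the one immediately before it in 'join' order. If groups that share a point may be far apart in that order, one possible alternative is a union-find (disjoint-set) pass over the rows. The following is a minimal sketch under that assumption, not a tested solution for the 450k-row data; the function name merge_join_labels and its helpers are hypothetical, and it labels each merged group with its smallest 'join' value to match the desired result in the question.

```python
def merge_join_labels(points, joins):
    """Return one label per row: the smallest 'join' value among all groups
    linked, directly or transitively, by a shared point."""
    parent = {}

    def find(x):
        # Locate the representative of x's set, then compress the path to it.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller label as the representative of the merged set.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    points = list(points)
    joins = list(joins)
    first_join_for_point = {}
    for p, j in zip(points, joins):
        find(j)  # register the join label even if it never merges
        if p in first_join_for_point:
            # This point was already seen in another row: link the two joins.
            union(first_join_for_point[p], j)
        else:
            first_join_for_point[p] = j
    return [find(j) for j in joins]


# Sample data from the question, as plain lists.
points = [15, 16, 16, 17, 17, 18, 18, 19, 20]
joins = [0, 0, 1, 1, 2, 2, 3, 3, 4]
labels = merge_join_labels(points, joins)
print(labels)
```

With the dataframe from the question, the same function could be applied as dk['new'] = merge_join_labels(dk['Point'], dk['join']). Each row is visited once, so the cost grows roughly linearly with the number of rows instead of quadratically as in the nested loop.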
