
Divide rows of python pandas DataFrame

I have a pandas DataFrame df like this:

   mat  time
0  101   20
1  102    7
2  103   15

I need to divide the rows so that the time column has no values higher than t=10, to get something like this:

   mat  time
0  101   10
2  101   10
3  102    7
4  103   10
5  103    5

The index doesn't matter.

If I used groupby('mat')['time'].sum() on this df I would get the original df back, so what I need is something like an inverse of the groupby function.
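The round trip described above can be checked directly. This is a minimal sketch that rebuilds the two frames from the question by hand; the variable names original, split, and regrouped are chosen here for illustration only.

```python
import pandas as pd

# The original frame and the desired split frame, copied from the question
original = pd.DataFrame({"mat": [101, 102, 103], "time": [20, 7, 15]})
split = pd.DataFrame({"mat": [101, 101, 102, 103, 103],
                      "time": [10, 10, 7, 10, 5]})

t = 10

# Summing time per mat should undo the split
regrouped = split.groupby("mat", as_index=False)["time"].sum()

print((split["time"] <= t).all())   # every piece respects the limit
print(regrouped.equals(original))   # the sums match the original frame
```

Any candidate solution can be validated the same way: no time value above t, and the per-mat sums equal to the original values.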

Is there any way to get the ungrouped DataFrame under the condition time <= t?

I'm trying to use a loop here, but it feels kind of 'unPythonic'. Any ideas?

Use an apply function that loops until no time value is greater than 10. This assumes one row per mat and that time is the last column.

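import pandas as pd

def split_max_time(df, cutoff=10):
    new_df = df.copy()
    # Cap the last row at the cutoff and carry the remainder into a new row
    while new_df.iloc[-1, -1] > cutoff:
        temp = new_df.iloc[-1, -1]
        new_df.iloc[-1, -1] = cutoff
        # Append a copy of the last row only; concatenating the whole frame
        # with itself would duplicate earlier pieces once time > 2 * cutoff
        new_df = pd.concat([new_df, new_df.iloc[[-1]]])
        new_df.iloc[-1, -1] = temp - cutoff
    return new_df


print(df.groupby('mat', group_keys=False).apply(split_max_time))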

   mat  time
0  101    10
0  101    10
1  102     7
2  103    10
2  103     5

You could .groupby('mat') and .apply() a combination of integer division and modulo operation using the cutoff (10) to decompose each time value into the desired components:

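import pandas as pd

cutoff = 10

def decompose(time):
    # Each group holds a single time value (one row per mat)
    value = int(time.iloc[0])
    # Full chunks of size cutoff, followed by the remainder
    components = [cutoff] * (value // cutoff) + [value % cutoff]
    # Drop a zero remainder when value is an exact multiple of cutoff
    return pd.Series([c for c in components if c > 0])

df.groupby('mat').time.apply(decompose).reset_index(level=-1, drop=True)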

to get:

mat
101    10
101    10
102     7
103    10
103     5
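The same integer division and remainder arithmetic can also be applied without groupby or apply, by repeating each row once per piece. The following is only a sketch of that alternative, not one of the answers above: the function name split_rows and its parameters are hypothetical, and it assumes non-negative integer values in the time column.

```python
import numpy as np
import pandas as pd

def split_rows(df, col="time", cutoff=10):
    """Split each row so that no value in `col` exceeds `cutoff`."""
    if df.empty:
        return df.copy()
    values = df[col].to_numpy()
    # Pieces per row: integer ceiling of value / cutoff, at least one
    n_pieces = np.maximum(-(-values // cutoff), 1)
    # Positional index of the source row for every output row
    row_pos = np.repeat(np.arange(len(df)), n_pieces)
    out = df.iloc[row_pos].reset_index(drop=True)
    # What is left for the final piece after the full chunks
    last_piece = values - cutoff * (n_pieces - 1)
    # True where an output row is the final piece of its source row
    is_last = np.r_[row_pos[1:] != row_pos[:-1], True]
    out[col] = np.where(is_last, last_piece[row_pos], cutoff)
    return out

df = pd.DataFrame({"mat": [101, 102, 103], "time": [20, 7, 15]})
result = split_rows(df)
print(result)
#    mat  time
# 0  101    10
# 1  101    10
# 2  102     7
# 3  103    10
# 4  103     5
```

Because it works row by row rather than group by group, this variant does not require mat to be unique, and any other columns are carried along unchanged.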

In case you care about performance:

%timeit df.groupby('mat', group_keys=False).apply(split_max_time)
100 loops, best of 3: 4.21 ms per loop

%timeit df.groupby('mat').time.apply(decompose).reset_index(-1, drop=True)
1000 loops, best of 3: 1.83 ms per loop
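%timeit is an IPython magic, so it is not available in a plain Python script. A rough equivalent with the standard timeit module might look like the sketch below. It re-declares the sample data and a decompose function so that it runs on its own; the helper run and the run count n_runs are illustrative choices, and absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"mat": [101, 102, 103], "time": [20, 7, 15]})
cutoff = 10

def decompose(time):
    # One time value per group: full chunks plus a non-zero remainder
    value = int(time.iloc[0])
    full, rest = divmod(value, cutoff)
    return pd.Series([c for c in [cutoff] * full + [rest] if c > 0])

def run():
    grouped = df.groupby("mat")["time"].apply(decompose)
    return grouped.reset_index(level=-1, drop=True)

n_runs = 100
seconds = timeit.timeit(run, number=n_runs)
print(f"{seconds / n_runs * 1e3:.2f} ms per call")
```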
