Dividing the rows of a Python pandas DataFrame

Question

I have a pandas DataFrame df like this:

   mat  time
0  101   20
1  102    7
2  103   15

I need to split the rows so that the time column has no values greater than t=10, giving something like this:

   mat  time
0  101   10
2  101   10
3  102    7
4  103   10
5  103    5

The index doesn't matter.

If I used groupby('mat')['time'].sum() on the result I would get the original df back, so what I need is something like an inverse of the groupby function.

Is there any way to get the ungrouped DataFrame under the condition time <= t?

I'm trying to use a loop here, but it feels 'unPythonic'. Any ideas?

Answer 1

Use an apply function that loops until every time value is at most 10.

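import pandas as pd

def split_max_time(df):
    # Assumes 'time' is the last column; iloc[-1, -1] is the last row's time.
    new_df = df.copy()
    while new_df.iloc[-1, -1] > 10:
        temp = new_df.iloc[-1, -1]
        new_df.iloc[-1, -1] = 10
        # Append a copy of the last row only; concatenating the whole frame
        # with itself would duplicate earlier rows once time exceeds 20.
        new_df = pd.concat([new_df, new_df.iloc[[-1]]])
        new_df.iloc[-1, -1] = temp - 10
    return new_df


print(df.groupby('mat', group_keys=False).apply(split_max_time))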

   mat  time
0  101    10
0  101    10
1  102     7
2  103    10
2  103     5
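
As a sanity check on the "inverse of groupby" idea from the question, the sketch below splits the rows and then sums them back per mat. It is only an illustration under a few assumptions: the sample frame uses time=35 for mat 103 (not the question's 15) to exercise more than one split, the column is located by the name "time" rather than by position, and the groups are iterated directly instead of going through apply.

```python
import pandas as pd

CUTOFF = 10  # same limit t=10 as in the question


def split_max_time(group, cutoff=CUTOFF):
    """Split the last row of `group` until its 'time' is at most `cutoff`."""
    out = group.copy()
    time_col = out.columns.get_loc("time")
    while out.iloc[-1, time_col] > cutoff:
        remainder = out.iloc[-1, time_col] - cutoff
        out.iloc[-1, time_col] = cutoff
        # Append a copy of the last row only, then store what is left in it.
        out = pd.concat([out, out.iloc[[-1]]])
        out.iloc[-1, time_col] = remainder
    return out


# Hypothetical sample data: 35 needs three splits.
df = pd.DataFrame({"mat": [101, 102, 103], "time": [20, 7, 35]})

pieces = [split_max_time(g) for _, g in df.groupby("mat")]
result = pd.concat(pieces, ignore_index=True)

# Summing per 'mat' should give back the original time values.
restored = result.groupby("mat")["time"].sum().reset_index()
print(result)
print(restored)
```

If the split is correct, no time in result exceeds the cutoff and restored matches the input frame row for row.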

Answer 2

You could .groupby('mat') and .apply() a combination of integer division and modulo using the cutoff (10) to decompose each time value into the desired components:

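import pandas as pd

cutoff = 10

def decompose(time):
    # Each 'mat' group is assumed to hold a single row, so take its one value.
    value = int(time.iloc[0])
    components = [cutoff] * (value // cutoff) + [value % cutoff]
    # Drop a zero remainder when value is an exact multiple of cutoff.
    return pd.Series([c for c in components if c > 0])

df.groupby('mat').time.apply(decompose).reset_index(-1, drop=True)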

to get:

mat
101    10
101    10
102     7
103    10
103     5
Name: time, dtype: int64

In case you care about performance:

%timeit df.groupby('mat', group_keys=False).apply(split_max_time)
100 loops, best of 3: 4.21 ms per loop

%timeit df.groupby('mat').time.apply(decompose).reset_index(-1, drop=True)
1000 loops, best of 3: 1.83 ms per loop
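
If performance matters on larger frames, one option that neither answer uses is to avoid groupby and apply altogether and build the split rows with NumPy. The sketch below is not from the answers above; the function name split_rows and its parameters are hypothetical, and it assumes the time column holds non-negative integers.

```python
import numpy as np
import pandas as pd


def split_rows(df, col="time", cutoff=10):
    """Split each row so that no value in `col` exceeds `cutoff`."""
    values = df[col].to_numpy()
    # Pieces per row: ceil(value / cutoff), written with integer arithmetic.
    counts = -(-values // cutoff)
    # Source row position for every output row, e.g. [0, 0, 1, 2, 2].
    rows = np.repeat(np.arange(len(df)), counts)
    # Position of each piece within its source row, e.g. [0, 1, 0, 0, 1].
    starts = np.cumsum(counts) - counts
    piece = np.arange(len(rows)) - np.repeat(starts, counts)
    out = df.iloc[rows].reset_index(drop=True)
    # Amount still left before this piece, capped at the cutoff.
    out[col] = np.minimum(values[rows] - piece * cutoff, cutoff)
    return out


df = pd.DataFrame({"mat": [101, 102, 103], "time": [20, 7, 15]})
result = split_rows(df)
print(result)
```

Rows are selected by position, so duplicate mat values or a non-default index should not matter here. No timings are claimed for this version; it would need to be measured with %timeit on real data.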


Answer 1
1, accepted, 2016-06-01 00:15:09

Answer 2
1, 2016-06-01 04:05:18
