
Merge Pandas Dataframe Rows based on multiple conditions

Hi, I have a pandas df which contains dates and amounts.

    Date    Amount  
0 10/02/22  1600       
1 10/02/22  150     
2 11/02/22  100       
3 11/02/22  800
4 11/02/22  125   
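For anyone who wants to reproduce this, the table above can be built as follows. This is only a minimal sketch: the column names come from the table, and the dates are assumed to be day-first strings (dd/mm/yy).

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: dates are day-first strings (dd/mm/yy).
df = pd.DataFrame({
    'Date': ['10/02/22', '10/02/22', '11/02/22', '11/02/22', '11/02/22'],
    'Amount': [1600, 150, 100, 800, 125],
})
print(df)
```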

If an entry is one day later than another entry and less than 10% of its amount, I would like to sum the amounts and keep the earliest date.

So the df would look like:

    Date    Amount
0 10/02/22  1825
1 10/02/22  150
2 11/02/22  800
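The arithmetic behind this expected result can be checked by hand. The sketch below assumes each entry is compared against the largest amount of the previous day (1600), which is one reading of "any other entry" that reproduces the table above. The variable names are only illustrative.

```python
# Assumption: entries on 11/02/22 are compared with the largest
# amount of the previous day (10/02/22), which is 1600.
prev_day_max = 1600
next_day = [100, 800, 125]

# Ratio of each next-day entry to the previous day's largest amount.
ratios = {amount: amount / prev_day_max for amount in next_day}

# Entries at or below 10% are folded into the previous day's largest entry.
small = [amount for amount in next_day if amount / prev_day_max <= 0.1]
merged = prev_day_max + sum(small)

print(ratios)   # 100 -> 0.0625, 800 -> 0.5, 125 -> 0.078125
print(merged)   # 1600 + 100 + 125 = 1825
```

The 150 entry is also below 10% of 1600, but it falls on the same day rather than one day later, so it is left alone.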

I've tried creating a threshold and then creating groups based on these conditions, but this does not yield the expected results.

threshold_selector =  (amount_difference < 0.1) & (date_difference == day)

Where day is a time delta of one day.

groups = threshold_selector.cumsum()
dates = dates.groupby(groups).agg({'Amount': 'sum', 'Date': 'min'})

The result is all rows joined into one.
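One way a cumsum-based grouping can collapse everything into a single group is when the selector is False on every row, because the running sum then never changes. The question does not show how amount_difference and date_difference were computed, so the selector below is purely hypothetical and only illustrates the mechanism.

```python
import pandas as pd

# Hypothetical selector: assume no row satisfied both conditions.
selector = pd.Series([False, False, False, False, False])

# The running sum of an all-False selector never increases,
# so every row receives the same group label.
groups = selector.cumsum()

print(groups.tolist())
print(groups.nunique())  # number of distinct groups
```

With a single distinct label, groupby(groups) aggregates all rows together.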

I would approach this using a pivot.

Sort the values by descending amount and pivot so that the largest value of each day is in the first column. Then find the values lower than or equal to 10% of the previous day's largest value, mask them, and add them to the first column. Then reshape back to the original form:

import pandas as pd

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.sort_values(by=['Date', 'Amount'], ascending=[True, False])

# pivot to have col0 with the largest value per day
df2 = (df
 .assign(col=df.groupby('Date').cumcount())
 .pivot(index='Date', columns='col', values='Amount')
)

# identify values that are at most 10% of the previous day's max
mask = df2.div(df2[0].shift(1, freq='D'), axis=0).le(0.1).reindex_like(df2)

# add the masked values to the previous day's max (column 0)
df2[0] += df2.where(mask).sum(axis=1).shift(-1, freq='D').reindex(mask.index, fill_value=0)

# mask them
df2 = df2.mask(mask)

# reshape back dropping the NaNs
df2 = df2.stack().droplevel('col').reset_index(name='Amount')

output:

        Date  Amount
0 2022-02-10  1825.0
1 2022-02-10   150.0
2 2022-02-11   800.0
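To make the intermediate steps easier to follow, here is a minimal self-contained sketch that prints the pivoted table and the mask for the sample data. The names wide and ratio are only illustrative, and the explicit format='%d/%m/%y' is an assumption about how the date strings are written.

```python
import pandas as pd

# Sample data from the question, parsed as day-first dates.
df = pd.DataFrame({
    'Date': pd.to_datetime(
        ['10/02/22', '10/02/22', '11/02/22', '11/02/22', '11/02/22'],
        format='%d/%m/%y'),
    'Amount': [1600, 150, 100, 800, 125],
}).sort_values(by=['Date', 'Amount'], ascending=[True, False])

# One row per day; column 0 holds the largest amount of that day,
# column 1 the second largest, and so on (NaN where a day has fewer rows).
wide = (df
 .assign(col=df.groupby('Date').cumcount())
 .pivot(index='Date', columns='col', values='Amount')
)
print(wide)

# Divide each day's values by the previous day's largest amount.
# shift(1, freq='D') moves the index forward one day so the rows line up.
ratio = wide.div(wide[0].shift(1, freq='D'), axis=0).reindex_like(wide)

# True where a value is at most 10% of the previous day's largest amount.
mask = ratio.le(0.1)
print(mask)
```

For this data the mask is True only for the 125 and 100 entries on 2022-02-11, which are the two values that get added to the 1600 entry.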

Here is an alternative using a groupby approach:

# ensure datetime
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# group Amounts by Date
g = df.groupby('Date')['Amount']

# get max amount per date
date_max = g.max()
# shift to previous date
prev_date_max = date_max.shift(1, freq='D').reindex(date_max.index, fill_value=0)

# identify rows to drop later
mask = df['Amount'].div(df['Date'].map(prev_date_max)).le(0.1)

# get value of next day to add to max
val_to_add = (df['Amount'][mask]
                 .groupby(df['Date']).sum()
                 .shift(-1, freq='D')
                )

# add to max
df['Amount'] += df['Date'].map(val_to_add).where(df.index.isin(g.idxmax())).fillna(0)

# drop rows
df = df.loc[~mask]

output:

        Date  Amount
0 2022-02-10  1825.0
1 2022-02-10   150.0
3 2022-02-11   800.0
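Note that the groupby version keeps the original index labels (0, 1, 3). If a reusable form is handy, the same steps can be wrapped in a function, as in the sketch below. The function name merge_small_next_day and its ratio parameter are hypothetical, the date format is an assumption, and the index is reset at the end to get consecutive labels.

```python
import pandas as pd

def merge_small_next_day(df, ratio=0.1):
    """Hypothetical helper wrapping the groupby steps above."""
    out = df.copy()
    # Assumption: dates are day-first strings such as '10/02/22'.
    out['Date'] = pd.to_datetime(out['Date'], format='%d/%m/%y')

    g = out.groupby('Date')['Amount']
    # Largest amount per date, re-indexed so each date sees the previous day's max
    # (NaN when there is no previous day, which never satisfies the comparison).
    date_max = g.max()
    prev_date_max = date_max.shift(1, freq='D').reindex(date_max.index)

    # Rows that are at most `ratio` of the previous day's max.
    mask = out['Amount'].div(out['Date'].map(prev_date_max)).le(ratio)

    # Sum of the small rows per date, moved back one day.
    val_to_add = out[mask].groupby('Date')['Amount'].sum().shift(-1, freq='D')

    # Add that sum only to the row holding each day's max, then drop the small rows.
    is_day_max = out.index.isin(g.idxmax())
    extra = out['Date'].map(val_to_add).where(is_day_max).fillna(0)
    out['Amount'] = out['Amount'] + extra
    return out.loc[~mask].reset_index(drop=True)

sample = pd.DataFrame({
    'Date': ['10/02/22', '10/02/22', '11/02/22', '11/02/22', '11/02/22'],
    'Amount': [1600, 150, 100, 800, 125],
})
result = merge_small_next_day(sample)
print(result)
```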
