Generating 3 different columns based on dataframe

I have a dataframe:

Date_1      Date_2  individual_count
01/09/2019  02/08/2019  2
01/09/2019  03/08/2019  2
01/09/2019  04/08/2019  2
01/09/2019  05/08/2019  2
.   .   .
01/09/2019  28/08/2019  10
01/09/2019  29/08/2019  11
01/09/2019  30/08/2019  12
01/09/2019  31/08/2019  14

I want to generate four columns: num_days_2, num_days_3, num_days_5, and num_days_20.

I want to aggregate the dataset in such a way that:

num_days_2 : sum of individual_count for a given Date_1 over Date_2 in (Date_1 - 2, Date_1 - 1)
num_days_3 : sum of individual_count for a given Date_1 over Date_2 in (Date_1 - 5, Date_1 - 3)
num_days_5 : sum of individual_count for a given Date_1 over Date_2 in (Date_1 - 10, Date_1 - 6)
num_days_20 : sum of individual_count for a given Date_1 over all remaining (earlier) Date_2 values

For example, for a particular Date_1 = 01/09/2019:

num_days_2 = sum of individual counts for Date_2 = 30/08/2019 - 31/08/2019
num_days_3 = sum of individual counts for Date_2 = 27/08/2019 - 29/08/2019
num_days_5 = sum of individual counts for Date_2 = 22/08/2019 - 26/08/2019
num_days_20 = sum of individual counts for Date_2 = 02/08/2019 - 25/08/2019
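
To make the window boundaries concrete, here is a minimal sketch that derives the first three date ranges from Date_1 by plain day offsets. The `windows` list and the `bounds` name are illustrative assumptions, not part of the original question; num_days_20 is simply everything earlier than the last window.

```python
import pandas as pd

date_1 = pd.Timestamp("2019-09-01")

# Assumed reading of the question: (column name, nearest offset, farthest offset),
# both counted in whole days before Date_1.
windows = [("num_days_2", 1, 2), ("num_days_3", 3, 5), ("num_days_5", 6, 10)]

# Map each column name to its (first day, last day) range of Date_2 values.
bounds = {
    name: (date_1 - pd.Timedelta(days=far), date_1 - pd.Timedelta(days=near))
    for name, near, far in windows
}

for name, (start, end) in bounds.items():
    print(name, start.strftime("%d/%m/%Y"), "-", end.strftime("%d/%m/%Y"))
```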

EDIT

Expected output:

Date_1      num_days_2  num_days_3  num_days_5  num_days_20
01/09/2019   
02/09/2019
.
.
.
30/09/2019

Can anyone help me achieve this?

I have created an example that you can work from. You may need to rename the columns, and look into the cut function to get the bins set up correctly.

import pandas as pd

# Generate example data.
# This is just one way to generate data that can be used to simulate yours.
# Take "today" once, at midnight, so the day differences below are whole days.
today = pd.Timestamp('today').normalize()
df = pd.DataFrame(
    data=dict(
        Date_1=today,  # This is Date_1
        Date_2=pd.date_range(end=today, periods=25),  # This is Date_2
        individual_count=range(25)  # This is individual_count
    )
)

# Calculate an offset as integer days:
# for each row, calculate the difference between Date_1 and Date_2.
df['offset_timedelta'] = (df.Date_1 - df.Date_2)
# To make binning easier, convert the timedelta to an integer number of days.
df['offset'] = df['offset_timedelta'].dt.days.astype('int16')


# Create bins for each offset:
# each row is assigned to an interval based on the edges [1, 2, 5, 10, 1000].
# 1000 is just an upper bound to get "the rest".
# Offset 0 (Date_2 equal to Date_1) falls outside every bin and is dropped.
df['bins'] = pd.cut(df['offset'], [1, 2, 5, 10, 1000], include_lowest=True)

# Group on Date_1 and the bin, so that we can sum the counts for each pair.
# observed=False keeps empty bins in the result.
grouped = df.groupby(['Date_1', 'bins'], observed=False)[['individual_count']].sum()

# The groupby gives an index of ('Date_1', 'bins'). This moves bins to columns instead of the index.
final = grouped.unstack()

Edit: renamed columns to make them more like the original problem.
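
As a possible follow-up, the binning approach can be combined with the `labels` argument of `pd.cut` so that the resulting columns carry the names asked for in the question. This is only a sketch under assumptions: the sample data (two Date_1 values, 30 preceding days each, every count set to 1), the bin edges `[0, 2, 5, 10, 1000]`, and the names `bucket` and `wide` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: each Date_1 gets the 30 days before it as Date_2,
# and every individual_count is 1 so the sums are easy to check by eye.
frames = []
for d1 in pd.to_datetime(["2019-09-01", "2019-09-02"]):
    d2 = pd.date_range(end=d1 - pd.Timedelta(days=1), periods=30)
    frames.append(pd.DataFrame({"Date_1": d1, "Date_2": d2, "individual_count": 1}))
df = pd.concat(frames, ignore_index=True)

# Whole-day distance between Date_1 and Date_2 (1 .. 30 here).
df["offset"] = (df["Date_1"] - df["Date_2"]).dt.days

# Right-closed bins: (0, 2], (2, 5], (5, 10], (10, 1000].
labels = ["num_days_2", "num_days_3", "num_days_5", "num_days_20"]
df["bucket"] = pd.cut(df["offset"], bins=[0, 2, 5, 10, 1000], labels=labels)

# Sum counts per (Date_1, bucket), then spread the buckets into columns.
wide = (
    df.groupby(["Date_1", "bucket"], observed=False)["individual_count"]
    .sum()
    .unstack("bucket")
)
# Replace the categorical column index with plain strings.
wide.columns = [str(c) for c in wide.columns]

print(wide)
```

With one count per day, each row of `wide` should sum 2, 3, 5, and 20 days respectively, which makes it easy to confirm that the bin edges match the intended windows before switching to real data.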
