Count min value between two columns for product_id [Pandas]
I want to count the instances where purchaseDateStoreA < purchaseDateStoreB, and vice versa.
Main DF:
product_id purchaseDateStoreA purchaseDateStoreB month
34935 2019-01-01 2019-01-03 Jan-2019
64545 2019-02-01 2019-02-02 Feb-2019
35556 2019-01-17 2019-01-16 Jan-2019
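For a reproducible starting point, the sample frame above can be built directly in code. This is a minimal sketch that assumes the two date columns should be parsed with pd.to_datetime up front, so they compare chronologically rather than as strings.

```python
import pandas as pd

# Sample data from the question; date columns parsed to datetime64
df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
    "month": ["Jan-2019", "Feb-2019", "Jan-2019"],
})
print(df.dtypes)
```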
Desired output:
first_arrival_timestamp_storeA
Jan-2019: 1
Feb-2019: 1
first_arrival_timestamp_storeB
Jan-2019: 1
So far, this is all I have tried, for storeA only:
import pandas as pd
# Keep rows whose StoreA date equals the earliest StoreA date for the same product_id
first_arrivals = df.assign(first_arrival_timestamp_storeA = df.groupby("product_id")["purchaseDateStoreA"].transform("min")).\
query("purchaseDateStoreA == first_arrival_timestamp_storeA")
pd.pivot_table(first_arrivals, aggfunc="count", values=["product_id"])
I think you need to compare the values, then count them with Series.value_counts and rename the index with a dictionary:
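import pandas as pd

# Parse both columns so they compare as dates, not strings
df['purchaseDateStoreA'] = pd.to_datetime(df['purchaseDateStoreA'])
df['purchaseDateStoreB'] = pd.to_datetime(df['purchaseDateStoreB'])
# True means StoreA was earlier; False covers StoreB earlier and equal dates
d = {True:'purchaseDateStoreA', False:'purchaseDateStoreB'}
counts = (df['purchaseDateStoreA'] < df['purchaseDateStoreB']).value_counts().rename(d)
print(counts)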
purchaseDateStoreA 2
purchaseDateStoreB 1
dtype: int64
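Because the comparison above maps False to purchaseDateStoreB, rows where both dates are equal are counted for StoreB. If ties should be kept separate, np.select can assign three labels instead of two. A sketch on the sample data (which contains no ties, so the extra label simply does not appear); the label "tie" is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
    "month": ["Jan-2019", "Feb-2019", "Jan-2019"],
})

a, b = df["purchaseDateStoreA"], df["purchaseDateStoreB"]
# First matching condition wins; rows matching neither (equal dates) get the default
labels = np.select(
    [a < b, b < a],
    ["purchaseDateStoreA", "purchaseDateStoreB"],
    default="tie",  # hypothetical label for equal dates
)
counts = pd.Series(labels, index=df.index).value_counts()
print(counts)
```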
Edit:
import pandas as pd

df['purchaseDateStoreA'] = pd.to_datetime(df['purchaseDateStoreA'])
df['purchaseDateStoreB'] = pd.to_datetime(df['purchaseDateStoreB'])
# Label each row with the store that had the earlier date (ties go to StoreB)
d = {True:'purchaseDateStoreA', False:'purchaseDateStoreB'}
new = (df['purchaseDateStoreA'] < df['purchaseDateStoreB']).map(d)
# Count rows per (month, store) pair
counts = df.groupby(['month', new.rename('stores')]).size().reset_index(name='count')
print(counts)
month stores count
0 Feb-2019 purchaseDateStoreA 1
1 Jan-2019 purchaseDateStoreA 1
2 Jan-2019 purchaseDateStoreB 1
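To get exactly the layout asked for (one month: count mapping per store), the same labels can be cross-tabulated with pd.crosstab and then converted to dicts. A sketch that assumes months with a zero count should be left out, as in the desired output; the dict keys reuse the first_arrival_timestamp_* names from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
    "month": ["Jan-2019", "Feb-2019", "Jan-2019"],
})

# Label each row by the store with the earlier date (ties go to StoreB, as above)
d = {True: "first_arrival_timestamp_storeA", False: "first_arrival_timestamp_storeB"}
stores = (df["purchaseDateStoreA"] < df["purchaseDateStoreB"]).map(d)

# Rows = months, columns = store labels, cells = number of products
table = pd.crosstab(df["month"], stores)

# One {month: count} dict per store, skipping months with a zero count
result = {
    store: table.loc[table[store] > 0, store].to_dict()
    for store in table.columns
}
print(result)
```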
Quite simple: two filter conditions, two results. I used a dict for the result:
import pandas as pd

data = '''product_id purchaseDateStoreA purchaseDateStoreB
34935 2019-01-01 2019-01-03
64545 2019-02-01 2019-02-02
35556 2019-01-17 2019-01-16'''
# Split each line on runs of whitespace; the first row holds the column names
da = [l.split() for l in data.split("\n")]
df = pd.DataFrame(da[1:], columns=da[0])
df.purchaseDateStoreA = pd.to_datetime(df.purchaseDateStoreA)
df.purchaseDateStoreB = pd.to_datetime(df.purchaseDateStoreB)
# Count products where each store's date is strictly earlier (ties count for neither)
result = {"first_arrival_timestamp_storeA":df[df.purchaseDateStoreA < df.purchaseDateStoreB]["product_id"].count(),
 "first_arrival_timestamp_storeB":df[df.purchaseDateStoreB < df.purchaseDateStoreA]["product_id"].count()}
print(result)
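The dict above gives overall totals. Since the question asks for counts per month, the same two strict filters can be grouped by month before counting. A sketch that assumes the month column from the main DF is present (the parsing snippet above leaves it out); as in that answer, rows with equal dates are counted for neither store.

```python
import pandas as pd

df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
    "month": ["Jan-2019", "Feb-2019", "Jan-2019"],
})

a, b = df["purchaseDateStoreA"], df["purchaseDateStoreB"]
# Same two strict filters as above, counted per month instead of overall
result = {
    "first_arrival_timestamp_storeA": df[a < b].groupby("month")["product_id"].count().to_dict(),
    "first_arrival_timestamp_storeB": df[b < a].groupby("month")["product_id"].count().to_dict(),
}
print(result)
```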
Create a column and conditionally fill it with the store name. You can use np.where: if df.purchaseDateStoreA < df.purchaseDateStoreB, then 'purchaseDateStoreA', otherwise 'purchaseDateStoreB'. Then call value_counts() on the stores:
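import numpy as np

# Name the store with the earlier date (ties fall into purchaseDateStoreB);
# assign() returns a new frame, so the result must be stored back in df
df = df.assign(holding=np.where(df.purchaseDateStoreA < df.purchaseDateStoreB,
                                'purchaseDateStoreA', 'purchaseDateStoreB'))
df.holding.value_counts()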
purchaseDateStoreA 2
purchaseDateStoreB 1
Name: holding, dtype: int64