
Count min value between two columns for product_id [Pandas]

I want to count, per month, the rows where purchaseDateStoreA < purchaseDateStoreB, and vice versa.

Main DF

product_id    purchaseDateStoreA   purchaseDateStoreB  month
34935              2019-01-01        2019-01-03        Jan-2019
64545              2019-02-01        2019-02-02        Feb-2019
35556              2019-01-17        2019-01-16        Jan-2019
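To experiment with the comparisons without copying the table by hand, the sample frame can be built directly. This is a minimal sketch; parsing both date columns with pd.to_datetime up front is an assumption, since the question does not say whether they are stored as strings or datetimes:

```python
import pandas as pd

# Sample data from the question. The date columns are parsed to datetime64
# (an assumption) so that "<" compares dates, not strings.
df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
    "month": ["Jan-2019", "Feb-2019", "Jan-2019"],
})
print(df.dtypes)
```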

Desired output

first_arrival_timestamp_storeA
Jan-2019: 1
Feb-2019: 1

first_arrival_timestamp_storeB
Jan-2019: 1

So far, this is all I have tried, and only for storeA:

first_arrivals = df.assign(first_arrival_timestamp_storeA = df.groupby("product_id")["purchaseDateStoreA"].transform("min")).\
   query("purchaseDateStoreA == first_arrival_timestamp_storeA")
pd.pivot_table(first_arrivals, aggfunc="count", values=["product_id"])

I think you need to compare the values, then count them with Series.value_counts and rename the index with a dictionary:

import pandas as pd

df['purchaseDateStoreA'] = pd.to_datetime(df['purchaseDateStoreA'])
df['purchaseDateStoreB'] = pd.to_datetime(df['purchaseDateStoreB'])

# True -> store A was earlier; False -> store B was earlier (or same day)
d = {True:'purchaseDateStoreA', False:'purchaseDateStoreB'}
out = (df['purchaseDateStoreA'] < df['purchaseDateStoreB']).value_counts().rename(d)
print(out)
purchaseDateStoreA    2
purchaseDateStoreB    1
dtype: int64

Edit:

import pandas as pd

df['purchaseDateStoreA'] = pd.to_datetime(df['purchaseDateStoreA'])
df['purchaseDateStoreB'] = pd.to_datetime(df['purchaseDateStoreB'])

d = {True:'purchaseDateStoreA', False:'purchaseDateStoreB'}

# Label each row with the store that purchased first
new = (df['purchaseDateStoreA'] < df['purchaseDateStoreB']).map(d)

# Count rows per (month, store) pair
out = df.groupby(['month', new.rename('stores')]).size().reset_index(name='count')
print(out)
      month              stores  count
0  Feb-2019  purchaseDateStoreA      1
1  Jan-2019  purchaseDateStoreA      1
2  Jan-2019  purchaseDateStoreB      1
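The same per-month counts can also be laid out wide, one column per store, which is closer to the layout asked for in the question. A minimal sketch using pd.crosstab instead of groupby().size(); converting the result to a nested dict and dropping zero cells is an illustrative choice, not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
    "month": ["Jan-2019", "Feb-2019", "Jan-2019"],
})

# Label each row with the store whose date is earlier; equal dates fall
# into store B because the comparison is a strict "<".
first = pd.Series(
    np.where(df["purchaseDateStoreA"] < df["purchaseDateStoreB"],
             "purchaseDateStoreA", "purchaseDateStoreB"),
    index=df.index, name="stores",
)

# Rows = month, columns = store, cells = number of products
wide = pd.crosstab(df["month"], first)
print(wide)

# Nested dict {store: {month: count}}, skipping months with a zero count
result = {store: {m: int(n) for m, n in col.items() if n > 0}
          for store, col in wide.items()}
print(result)
```

crosstab fills missing (month, store) combinations with 0, which is why the dict step filters them out.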

Quite simple: two filter conditions, two results. I used a dict for the result.

import re
import pandas as pd

data = '''product_id    purchaseDateStoreA   purchaseDateStoreB
34935              2019-01-01        2019-01-03
64545              2019-02-01        2019-02-02
35556              2019-01-17        2019-01-16'''
# Split each line on runs of two or more spaces
da = [re.split("[ ][ ]+", l) for l in data.split("\n")]
df = pd.DataFrame(da[1:], columns=da[0])
df["purchaseDateStoreA"] = pd.to_datetime(df["purchaseDateStoreA"])
df["purchaseDateStoreB"] = pd.to_datetime(df["purchaseDateStoreB"])
result = {"first_arrival_timestamp_storeA": df[df.purchaseDateStoreA < df.purchaseDateStoreB]["product_id"].count(),
          "first_arrival_timestamp_storeB": df[df.purchaseDateStoreB < df.purchaseDateStoreA]["product_id"].count()}
print(result)
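The two-filter dict gives overall totals, while the question asks for counts per month. A minimal sketch of the same idea grouped by the month column (the sample frame is rebuilt with that column here, as in the question's main DF):

```python
import pandas as pd

df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
    "month": ["Jan-2019", "Feb-2019", "Jan-2019"],
})

# The same two strict filters; rows with equal dates match neither
a_first = df["purchaseDateStoreA"] < df["purchaseDateStoreB"]
b_first = df["purchaseDateStoreB"] < df["purchaseDateStoreA"]

# Count matching rows per month instead of overall
result = {
    "first_arrival_timestamp_storeA": df[a_first].groupby("month").size().to_dict(),
    "first_arrival_timestamp_storeB": df[b_first].groupby("month").size().to_dict(),
}
print(result)
```

Because only months with at least one matching row survive the filter, Feb-2019 does not appear under store B, matching the desired output.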

Create a column and fill it conditionally with the store name. You can use np.where (if df.purchaseDateStoreA < df.purchaseDateStoreB, then 'purchaseDateStoreA', otherwise 'purchaseDateStoreB'), then call value_counts() on that store column.

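import numpy as np

# np.where(condition, value_if_true, value_if_false).
# assign() returns a new DataFrame, so store the result back in df.
df = df.assign(holding=np.where(df.purchaseDateStoreA < df.purchaseDateStoreB,
                                'purchaseDateStoreA', 'purchaseDateStoreB'))
print(df.holding.value_counts())

purchaseDateStoreA    2
purchaseDateStoreB    1
Name: holding, dtype: int64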
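np.where has only one "otherwise" branch, so a row where both dates are equal is labelled purchaseDateStoreB. A minimal sketch using np.select to give ties their own label; the "same day" label and the helper name first_store are hypothetical choices, not from the question:

```python
import numpy as np
import pandas as pd

def first_store(df: pd.DataFrame) -> pd.Series:
    """Label each row with the store that purchased first (hypothetical helper).

    np.select checks the conditions in order; rows matching none of them
    (equal dates) receive the default label.
    """
    a = df["purchaseDateStoreA"]
    b = df["purchaseDateStoreB"]
    conditions = [a < b, b < a]
    choices = ["purchaseDateStoreA", "purchaseDateStoreB"]
    return pd.Series(np.select(conditions, choices, default="same day"),
                     index=df.index, name="holding")

df = pd.DataFrame({
    "product_id": [34935, 64545, 35556],
    "purchaseDateStoreA": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-01-17"]),
    "purchaseDateStoreB": pd.to_datetime(["2019-01-03", "2019-02-02", "2019-01-16"]),
})
df = df.assign(holding=first_store(df))
print(df["holding"].value_counts())
```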


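Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM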