Python: Taking the count of rows if the date (in one data frame) falls between two other dates (in a second data frame)
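I'm trying to get the count of rows (represented by the "Total Touchpoints" column) whose touchpoint time falls between a pair of dates in a second data frame. In other words, how many installs (df1) happened between the two dates (df2)? On the SUM-of-installs part of my code I get this error: Exception has occurred: ValueError: Can only compare identically-labeled Series objects.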
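That error is raised because the installs and post-log Series have different index labels, so pandas refuses to compare them element by element. Even when the labels match, `if` and `and` need a single True/False and cannot take a whole boolean Series. A minimal sketch reproducing both failures with made-up timestamps (`installs`, `window_start` and `same_length_start` are hypothetical names):

```python
import pandas as pd

# Made-up data: three installs vs. two spot windows, so the index labels differ.
installs = pd.Series(pd.to_datetime(['2019-01-01 10:05', '2019-01-01 10:20', '2019-01-01 11:00']))
window_start = pd.Series(pd.to_datetime(['2019-01-01 10:00', '2019-01-01 10:50']))

errors = []

# 1) Comparing Series with different index labels raises ValueError.
try:
    installs >= window_start
except ValueError as exc:
    errors.append(str(exc))

# 2) With matching labels the comparison returns a boolean Series,
#    but `if` cannot reduce that Series to one bool.
same_length_start = pd.Series(pd.to_datetime(['2019-01-01 10:00'] * 3))
try:
    if installs >= same_length_start:
        pass
except ValueError as exc:
    errors.append(str(exc))

for message in errors:
    print(message)
```

Both are ValueErrors, which is why the comparison has to be done for every (install, window) pair instead of with a single `if`.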
For example:
df = 'Start Date', 'End Date'
df2 = 'Event Date', 'Total Events'
Desired result = IF Event Date >= Start Date AND Event Date <= End Date, SUM (or COUNT) Total Events
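For the simplified df/df2 example, the condition has to be checked for every (interval, event) pair. A minimal sketch using NumPy broadcasting, with made-up rows and the column names from the example above (the output columns `Event Count` and `Event Sum` are hypothetical):

```python
import pandas as pd

# Made-up intervals and events, using the column names from the question.
df = pd.DataFrame({
    'Start Date': pd.to_datetime(['2019-01-01', '2019-01-05']),
    'End Date': pd.to_datetime(['2019-01-03', '2019-01-10']),
})
df2 = pd.DataFrame({
    'Event Date': pd.to_datetime(['2019-01-02', '2019-01-03', '2019-01-06', '2019-01-12']),
    'Total Events': [4, 1, 7, 2],
})

event = df2['Event Date'].to_numpy()          # shape (m,)
start = df['Start Date'].to_numpy()[:, None]  # shape (n, 1)
end = df['End Date'].to_numpy()[:, None]      # shape (n, 1)

# in_window[i, j] is True when event j falls inside interval i (both ends inclusive).
in_window = (event >= start) & (event <= end)  # shape (n, m)

df['Event Count'] = in_window.sum(axis=1)
df['Event Sum'] = (in_window * df2['Total Events'].to_numpy()).sum(axis=1)
print(df)
```

Here the first interval picks up the events on 2019-01-02 and 2019-01-03 (count 2, sum 5) and the second picks up 2019-01-06 (count 1, sum 7). The mask holds one cell per pair, so memory grows as len(df) × len(df2); for large frames, sorting the event dates and using `numpy.searchsorted` avoids building the full matrix.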
Please see my code below:
```python
import datetime

import numpy as np
import pandas as pd

df_post_logs = pd.read_csv('logs_merged.csv', index_col=0)
df_installs = pd.read_csv('install_merge.csv', index_col=0)

# Convert UTC to EST on installs (add columns).
# Combine date and time first: parsing a time-only string sets the date to 1900-01-01.
# Assumes dates like 2019-01-31 and times like 14:05:09 (%H:%M:%S).
utc_datetime = pd.to_datetime(
    df_installs['conversion date'].astype(str) + ' ' + df_installs['conversion time'].astype(str),
    format='%Y-%m-%d %H:%M:%S',
)
est_datetime = utc_datetime - datetime.timedelta(hours=5)  # fixed offset, ignores DST
df_installs['utc datetime'] = utc_datetime
df_installs['est datetime'] = est_datetime

# Add columns: 10-minute pre-spot window and 10-minute post-spot window.
timestamp = pd.to_datetime(
    df_post_logs['Air Date'].astype(str) + ' ' + df_post_logs['Air Time'].astype(str),
    format='%Y-%m-%d %H:%M:%S',
)
df_post_logs['timestamp'] = timestamp
df_post_logs['pre spot time start'] = timestamp - datetime.timedelta(minutes=10, seconds=1)
df_post_logs['pre spot time end'] = timestamp - datetime.timedelta(seconds=1)
df_post_logs['post spot time'] = timestamp + datetime.timedelta(minutes=10)


def count_between(times, starts, ends):
    """For each window, count the `times` that fall in [start, end] (inclusive).

    Comparing the installs Series with the post-log Series directly raises
    "Can only compare identically-labeled Series objects", and `if` cannot
    test a whole Series, so search the sorted install times instead.
    """
    sorted_times = np.sort(times.to_numpy())
    first = np.searchsorted(sorted_times, starts.to_numpy(), side='left')
    last = np.searchsorted(sorted_times, ends.to_numpy(), side='right')
    return last - first


# Count of installs in the pre-spot window.
df_post_logs['pre spot installs'] = count_between(
    est_datetime, df_post_logs['pre spot time start'], df_post_logs['pre spot time end'])

# Count of installs in the post-spot window.
df_post_logs['post spot installs'] = count_between(
    est_datetime, df_post_logs['timestamp'], df_post_logs['post spot time'])

# Difference between post and pre, floored at 0 for each spot.
incremental_visits = (
    df_post_logs['post spot installs'] - df_post_logs['pre spot installs']
).clip(lower=0)
df_post_logs['incremental visits'] = incremental_visits

# Multiply by TRP.
df_post_logs['lift'] = incremental_visits * df_post_logs['Dimension 5']

# Export to CSV.
df_post_logs.to_csv('attribution.csv')
```
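Counting how many install times land in each spot's pre and post window is a sorted-search problem: after sorting the install times, `numpy.searchsorted` gives the number of installs before each window start and up to each window end, and the difference is the count. A minimal sketch on made-up timestamps (`count_in_windows`, `install_times` and `air_times` are hypothetical names), using the same window widths as the code above:

```python
import numpy as np
import pandas as pd

# Made-up install times, already converted to local time.
install_times = pd.Series(pd.to_datetime([
    '2019-01-01 09:55:00',
    '2019-01-01 10:03:00',
    '2019-01-01 10:10:00',
    '2019-01-01 10:25:00',
]))

# Made-up spot air times.
air_times = pd.Series(pd.to_datetime(['2019-01-01 10:00:00', '2019-01-01 10:20:00']))


def count_in_windows(times, starts, ends):
    """Number of `times` inside each closed interval [starts[i], ends[i]]."""
    sorted_times = np.sort(times.to_numpy())
    left = np.searchsorted(sorted_times, starts.to_numpy(), side='left')    # count < start
    right = np.searchsorted(sorted_times, ends.to_numpy(), side='right')    # count <= end
    return right - left


pre = count_in_windows(install_times,
                       air_times - pd.Timedelta(minutes=10, seconds=1),
                       air_times - pd.Timedelta(seconds=1))
post = count_in_windows(install_times, air_times, air_times + pd.Timedelta(minutes=10))

# Post minus pre, floored at 0 for each spot.
incremental = np.clip(post - pre, 0, None)
print(pre, post, incremental)
```

Each window costs two binary searches, so this scales to many spots and installs without the pairwise comparison matrix.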
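Subtracting a fixed 5 hours converts UTC to EST only outside daylight saving time. If the goal is US Eastern local time year-round, pandas can apply the time zone rules itself. A sketch, assuming the raw values are naive UTC timestamps and that `America/New_York` is the intended zone (the `utc` and `eastern` Series are made-up examples):

```python
import pandas as pd

# Made-up naive UTC timestamps: one in winter, one in summer.
utc = pd.Series(pd.to_datetime(['2019-01-15 15:00:00', '2019-07-15 15:00:00']))

# Mark the values as UTC, convert to US Eastern (EST or EDT as appropriate),
# then drop the tz info so they compare with naive air-time timestamps.
eastern = utc.dt.tz_localize('UTC').dt.tz_convert('America/New_York').dt.tz_localize(None)
print(eastern.tolist())
```

The winter value shifts by 5 hours (to 10:00) and the summer value by 4 hours (to 11:00).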