How to fix: comparing the result of a BigQuery query to a list

I am a newbie in Python and appreciate all help. I want a list of the activities of a group of users who made a $4 purchase, in this form: purchase_date(1,1,0,1,1,0,1), where purchase_date is the date of the purchase and each array index + 1 gives the number of days after the purchase. 1 means an active day, 0 means an inactive day. For example, 20190203(1,1,1,0,0,0,1) means the purchase was on 2019-02-03, and the user was active after that on the 4th, 5th, 6th and 10th of February.
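To make the notation concrete, here is a small decoder that turns one encoded string back into the dates on which the user was active. It is only an illustration of the format described above; the function name active_dates is hypothetical and does not appear in the question.

```python
from datetime import datetime, timedelta


def active_dates(encoded: str) -> list[str]:
    """Decode e.g. '20190203(1,1,1,0,0,0,1)' into the ISO dates the user was active.

    Flag i (0-based) refers to the day purchase_date + (i + 1) days.
    """
    date_part, flags_part = encoded.rstrip(")").split("(")
    purchase = datetime.strptime(date_part, "%Y%m%d")
    flags = [int(f) for f in flags_part.split(",")]
    return [
        (purchase + timedelta(days=i + 1)).strftime("%Y-%m-%d")
        for i, flag in enumerate(flags)
        if flag == 1
    ]


print(active_dates("20190203(1,1,1,0,0,0,1)"))
# ['2019-02-04', '2019-02-05', '2019-02-06', '2019-02-10']
```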

I tried the code below. Steps:

  1. Created a table with the purchases: four_dollar_buyers(user_pseudo_id, purchase_date). Queried it and loaded the result into the four_dollar_purchases list.
  2. Iterated over four_dollar_purchases.
  3. Made 2 helper arrays: seven_days_date contains the seven dates after the purchase, and seven_days_number should contain ones and zeros (active or not on the given day).
  4. Iterated over seven_days_date and, for each date, queried that day's events table to get the IDs of the users who were active on that day. Loaded the query result into a list named actives.
  5. If the user_id of the given purchase is in actives, the seven_days_number array should change from 0 to 1 at the given index.
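Steps 3 to 5 can be checked in isolation with a minimal sketch like the one below, which is an illustration separate from the code actually tried. It assumes a hypothetical callable fetch_active_ids(day) that returns the active user IDs for a YYYYMMDD date as plain strings; in a real script it would run the per-day events query. The function name seven_day_activity and the fake_events data are likewise made up for testing.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable


def seven_day_activity(
    user_id: str,
    purchase_date: str,
    fetch_active_ids: Callable[[str], Iterable[str]],
) -> list[int]:
    """Return 7 flags: 1 if user_id was active on purchase_date + 1..7 days."""
    purchase = datetime.strptime(purchase_date, "%Y%m%d")
    # Step 3: the seven dates after the purchase, in YYYYMMDD form.
    seven_days_date = [
        (purchase + timedelta(days=i + 1)).strftime("%Y%m%d") for i in range(7)
    ]
    seven_days_number = [0] * 7
    # Steps 4-5: look up who was active on each day and set the flag at that index.
    for idx, day in enumerate(seven_days_date):
        active_ids = set(fetch_active_ids(day))  # plain strings, not row objects
        if user_id in active_ids:
            seven_days_number[idx] = 1
    return seven_days_number


# Hypothetical in-memory stand-in for the per-day BigQuery query.
fake_events = {
    "20190204": ["u1", "u2"],
    "20190205": ["u1"],
    "20190206": ["u1"],
    "20190210": ["u1", "u3"],
}
print(seven_day_activity("u1", "20190203", lambda day: fake_events.get(day, [])))
# [1, 1, 1, 0, 0, 0, 1]
```

Passing the lookup in as a function keeps the date arithmetic and flag logic testable without network access.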
from datetime import datetime, timedelta

from google.cloud import bigquery

client = bigquery.Client(project="project")

QUERY = ('SELECT * FROM `project.four_dollar_buyers`')
query_job = client.query(QUERY)                             
four_dollar_purchases = list(query_job.result())                        

for row in four_dollar_purchases:                                       

  seven_days_date = ["","","","","","",""]                          
  seven_days_number = [0,0,0,0,0,0,0]                                   

  for i in range(7):
    # Date of day i + 1 after the purchase, in YYYYMMDD form.
    date_time_obj = datetime.strptime(row[1], '%Y%m%d') + timedelta(days=i + 1)
    seven_days_date[i] = date_time_obj.strftime("%Y%m%d")

  for idx, days in enumerate(seven_days_date):

    QUERY = ('''SELECT DISTINCT user_pseudo_id FROM 
    `project.events_'''+days+'''` WHERE event_name IN 
    ("activity_added")''')
    query_job = client.query(QUERY)
    actives = list(query_job.result())                          


  if row[0] in actives:                                 
    seven_days_number[idx] = 1                              


  print(row[1] + str(seven_days_number))

There is no error message anymore, but every result looks like this: 20181212(0,0,0,0,0,0,0). For some reason the helper array never changes; after the purchase date it contains only zeros. I checked row[0] and actives with pprint, and both of them contain the right values.
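The symptom can be reproduced without BigQuery. In the sketch below, each result row is modelled as a 1-tuple (an assumption for illustration; real BigQuery Row objects are not tuples, but like tuples they do not compare equal to a plain string). The second part shows that, as the code is indented in the question, the if statement runs after the inner for loop has finished, when idx already holds its last value.

```python
# Stand-in for list(query_job.result()): each row is modelled as a 1-tuple.
actives = [("user_a",), ("user_b",)]
buyer_id = "user_a"

print(buyer_id in actives)                  # False: a str never equals a row
print(buyer_id in [r[0] for r in actives])  # True: compare the field values

# With the if statement outside the inner loop, it only sees the last idx.
days_after = ["20181213", "20181214", "20181215", "20181216",
              "20181217", "20181218", "20181219"]
for idx, day in enumerate(days_after):
    pass
print(idx)  # 6
```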

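As ralaxpy pointed out, days is a date string such as "20190204", not an integer, so it cannot be used as a list index; use enumerate (or another way of tracking the position) to get the index when modifying the list. The flags never become 1 because actives holds the query's result rows rather than plain ID strings, so row[0] in actives is never true. Compare the first field of each result row with the buyer's ID instead, inside the loop over seven_days_date so that every day is checked.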

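query_job = client.query(QUERY)
actives = list(query_job.result())

# actives holds Row objects, not strings, so compare the first
# field of each row (user_pseudo_id) with the buyer's id.
for active_row in actives:
  if active_row[0] == row[0]:
    seven_days_number[idx] = 1
    break  # no need to scan the remaining rows

print(row[1] + str(seven_days_number))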
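A possible variation, sketched under the assumption that each result row supports r[0] (BigQuery Row objects do): collect the IDs of one day's rows into a set once, so each membership check is a single set lookup instead of a scan over all rows. The helpers active_id_set and format_activity are hypothetical names, and tuples stand in for Row objects so the example runs without BigQuery.

```python
def active_id_set(active_rows):
    """Collect the first field (user_pseudo_id) of each result row into a set."""
    return {r[0] for r in active_rows}


def format_activity(purchase_date, flags):
    """Render flags in the question's notation, e.g. 20190203(1,1,1,0,0,0,1)."""
    return f"{purchase_date}({','.join(str(f) for f in flags)})"


# Tuples stand in for BigQuery Row objects, which also support r[0].
rows_for_one_day = [("u1",), ("u2",), ("u1",)]
print("u1" in active_id_set(rows_for_one_day))             # True
print(format_activity("20190203", [1, 1, 1, 0, 0, 0, 1]))  # 20190203(1,1,1,0,0,0,1)
```

Note that str(seven_days_number) renders a list as [1, 1, 1, 0, 0, 0, 1], with brackets and spaces, so a small formatter like this is only needed if the exact purchase_date(...) notation is wanted.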

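Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or link to the original post. For any questions, contact: yoyou2525@163.com.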

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM