Subtracting two dates from columns in two pandas DataFrames
I have the following code:
```python
for tup in unique_tuples:
    user_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == tup[1])]
    for friend in tup[2]:
        friend_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == friend)]
        if (friend_review.date - user_review.date) <= 62:
            tup[2].remove(friend)
```
I'm extracting values from a list of tuples, matching them against values in columns of a DataFrame, and using the resulting boolean mask to select the matching row.
`user_review` is a single row, representing the review the user wrote for a business. `friend_review` is also a single row, representing the review the user's friend wrote for the same business.
`tup[0]` is the business_id, `tup[1]` is the user_id, and `tup[2]` is a list of friend_ids of that user. So I loop through each friend of the user and match that friend_id to the friend's review of the business.
Essentially, for two reviews of the same business by two different users, I want to check whether `friend_review.date - user_review.date` is at most +2 months (62 days in the code). If it isn't, I want to remove the friend_id from the `tup[2]` list.
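As a sketch of the rule described above, here is a small predicate on two scalar dates. The function name `within_window` and its `max_days=62` default are illustrative assumptions (62 days mirrors the threshold used in the code as an approximation of two months); they are not part of the original code.

```python
import pandas as pd

def within_window(user_date, friend_date, max_days=62):
    """Hypothetical helper: True if friend_date is 0..max_days days after user_date."""
    # Subtracting two Timestamps gives a pandas Timedelta; .days is a plain int.
    diff_days = (pd.Timestamp(friend_date) - pd.Timestamp(user_date)).days
    return 0 <= diff_days <= max_days

print(within_window("2015-01-10", "2015-02-20"))  # 41 days later -> True
print(within_window("2015-01-10", "2015-04-20"))  # 100 days later -> False
print(within_window("2015-01-10", "2014-12-31"))  # friend reviewed first -> False
```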
Both date columns are of dtype `datetime64[ns]`, and each date is formatted as "yyyy-mm-dd", so I thought I could simply subtract them to check whether the two reviews were less than two months apart.
However, I keep getting the following error:
TypeError: invalid type comparison
The error also mentions that NumPy does not like comparisons against "None", which confuses me as well, since there are no null values in my column.
UPDATE (solution): I ended up appending to a new list instead of deleting from the current one, and this works:
```python
import pandas as pd

# to collect (business_id, reviewer, influenced_reviewer) tuples
business_reviewer_and_influenced_reviewers = []

# loop through each user and look up the date of their review of the business in tup[0]
for tup in unique_tuples:
    user_review_date = reviews_prior_to_influence_threshold.loc[
        (reviews_prior_to_influence_threshold.business_id == tup[0]) &
        (reviews_prior_to_influence_threshold.user_id == tup[1]), 'date']
    user_review_date = user_review_date.values[0]  # extract a scalar numpy.datetime64

    # loop through each friend of the reviewer (tup[2]) who also reviewed the business
    for friend in tup[2]:
        friend_review_date = reviews_prior_to_influence_threshold.loc[
            (reviews_prior_to_influence_threshold.business_id == tup[0]) &
            (reviews_prior_to_influence_threshold.user_id == friend), 'date']
        friend_review_date = friend_review_date.values[0]
        diff = pd.to_timedelta(friend_review_date - user_review_date).days

        # append business_id, reviewer, and influenced reviewer as a tuple
        if 0 <= diff <= 62:
            business_reviewer_and_influenced_reviewers.append((tup[0], tup[1], friend))
```
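A self-contained sketch of the same approach, run on made-up data so it can be tried in isolation. The toy DataFrame, the tuple contents and the helper `review_date` are hypothetical. `.iloc[0]` is used here instead of `.values[0]` so that each lookup returns a pandas `Timestamp` and the subtraction yields a `Timedelta` directly, without a `pd.to_timedelta` call.

```python
import pandas as pd

# Hypothetical toy data standing in for reviews_prior_to_influence_threshold.
reviews = pd.DataFrame({
    "business_id": ["b1", "b1", "b1", "b1"],
    "user_id": ["u1", "f1", "f2", "f3"],
    "date": pd.to_datetime(["2015-01-10", "2015-02-20", "2015-04-20", "2014-12-31"]),
})

# Each tuple: (business_id, reviewer_id, [friend_ids who reviewed the same business]).
unique_tuples = [("b1", "u1", ["f1", "f2", "f3"])]

def review_date(df, business_id, user_id):
    """Hypothetical helper: the single review date for (business_id, user_id)."""
    mask = (df.business_id == business_id) & (df.user_id == user_id)
    # .iloc[0] extracts a scalar Timestamp, so later arithmetic is not Series-vs-Series.
    return df.loc[mask, "date"].iloc[0]

influenced = []
for business_id, reviewer, friends in unique_tuples:
    user_date = review_date(reviews, business_id, reviewer)
    for friend in friends:
        diff = (review_date(reviews, business_id, friend) - user_date).days
        if 0 <= diff <= 62:
            influenced.append((business_id, reviewer, friend))

print(influenced)  # [('b1', 'u1', 'f1')]
```

Extracting a scalar before subtracting is likely what makes this version work: the subtraction and the `<= 62` comparison happen on single values rather than on one-row Series.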
The dates in your DataFrame are likely not of `datetime64` dtype, hence the `invalid type comparison`. You can check with `df.dtypes`. If that's the case, convert them with `df.date = pd.to_datetime(df.date)`.
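A minimal sketch of the dtype check and conversion, assuming a hypothetical DataFrame whose date column was read in as strings. `pd.api.types.is_datetime64_any_dtype` is used instead of printing `df.dtypes` so that the check does not depend on the exact resolution (such as `ns`) reported by a given pandas version.

```python
import pandas as pd

# Hypothetical frame whose dates were loaded as strings, not datetime64.
df = pd.DataFrame({"date": ["2015-01-10", "2015-02-20"]})
print(pd.api.types.is_datetime64_any_dtype(df.date))  # False

# Convert the existing column to a datetime64 dtype.
df.date = pd.to_datetime(df.date)
print(pd.api.types.is_datetime64_any_dtype(df.date))  # True
```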
You likely have some dates in your DataFrame that are null, hence the comparisons against "None". Use `df[pd.notnull(df.date)]` to keep only the rows that have a date.
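A minimal sketch of dropping rows with missing dates, on hypothetical data. `NaT`, the datetime counterpart of a missing value, is treated as null by `pd.notnull`.

```python
import pandas as pd

# Hypothetical frame with one missing date (None becomes NaT).
df = pd.DataFrame({
    "user_id": ["u1", "f1", "f2"],
    "date": pd.to_datetime(["2015-01-10", None, "2015-04-20"]),
})

# Keep only the rows whose date is present.
df = df[pd.notnull(df.date)]
print(df.user_id.tolist())  # ['u1', 'f2']
```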
BTW: subtracting the dates gives you a `timedelta`, not a number of days, so you'll likely need something like `(friend_review.date - user_review.date).dt.days <= 62`.
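A small sketch of the timedelta comparison on two hypothetical Series. Comparing the `.dt.days` integers and comparing against `pd.Timedelta(days=62)` give the same result here. One caveat: Series arithmetic aligns on index labels, so subtracting two one-row selections taken from different rows of a DataFrame (different index labels) yields `NaT` rather than a date difference. The Series below share the default index.

```python
import pandas as pd

# Hypothetical review dates; both Series use the default index 0, 1.
user_dates = pd.Series(pd.to_datetime(["2015-01-10", "2015-01-10"]))
friend_dates = pd.Series(pd.to_datetime(["2015-02-20", "2015-04-20"]))

delta = friend_dates - user_dates                    # timedelta64 Series, not integers
print(delta.dt.days.tolist())                        # [41, 100]
print((delta.dt.days <= 62).tolist())                # [True, False]
print((delta <= pd.Timedelta(days=62)).tolist())     # [True, False]
```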