Subtracting two dates from columns in 2 pandas DataFrames

I have the following code:

for tup in unique_tuples:
    user_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == tup[1])]     

    for friend in tup[2]:
        friend_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == friend)] 

        if (friend_review.date - user_review.date) <= 62:
            tup[2].remove(friend)

I'm extracting values from a list of tuples and matching them against columns of a DataFrame, then using a boolean mask to select the rows where the match is true.

user_review is a single row, representing the review that the user wrote for a business. friend_review is also a single row, representing the review that the user's friend wrote.

tup[2] is a list of friend_ids for the user_id in tup[1]. So I loop through each friend of a user and match that friend_id to their review of the business.

Essentially, for 2 different reviews by 2 different users, I want to check whether the difference between friend_review.date and user_review.date is at most +2 months. If the difference is not within 2 months, I want to remove the friend_id from the tup[2] list.

The date columns in both DataFrames/rows have dtype datetime64[ns], and each date is formatted as "yyyy-mm-dd", so I thought I could simply subtract them to see whether the reviews were less than 2 months apart.

However, I keep getting the following error:

TypeError: invalid type comparison

It also mentions that NumPy does not like comparisons against "None", which confuses me, since I have no null values in my column.

UPDATE: SOLUTION. I ended up appending to a new list instead of deleting from the current one, but this works.

#to append tuples
business_reviewer_and_influenced_reviewers = []

#loop through each user and create a single row df based on a match from the reviews df and our tuple values
for tup in unique_tuples:
    user_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                (reviews_prior_to_influence_threshold.user_id == tup[1]), 'date']     

    user_review_date = user_review_date.values[0]

    #loop through list each friend of the reviewer that also reviewed the business in tup[2]
    for friend in tup[2]:
        friend_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                      (reviews_prior_to_influence_threshold.user_id == friend), 'date']

        friend_review_date = friend_review_date.values[0]
        diff = pd.to_timedelta(friend_review_date - user_review_date).days

        #append business_id, reviewer, and influenced_reviewer as a tuple to a list
        if (diff >= 0) and (diff <= 62):
            business_reviewer_and_influenced_reviewers.append((tup[0], tup[1], friend))
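
As a side note on why appending to a new list is the safer choice: calling remove() on a list while a for loop is iterating over it can silently skip elements. The sketch below uses made-up friend IDs (not data from the question) to show the effect and the new-list alternative.

```python
# Hypothetical friend IDs, for illustration only.
friends = ["a", "b", "c", "d"]
to_drop = {"a", "b"}

# Removing while iterating: after "a" is removed, "b" slides into
# position 0 and the loop moves on to position 1, so "b" is never visited.
mutated = list(friends)
for friend in mutated:
    if friend in to_drop:
        mutated.remove(friend)

# Building a new list visits every element exactly once.
kept = [friend for friend in friends if friend not in to_drop]

print(mutated)
print(kept)
```

The same reasoning applies to tup[2] in the original loop, since it is a plain list stored inside the tuple.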

The dates in your DataFrame are likely not datetime64 dtype instances, hence the invalid type comparison. You can check with df.dtypes. If that is the case, use df.date = pd.to_datetime(df.date).
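
A minimal sketch of that check and conversion, assuming a small made-up DataFrame whose date column was loaded as strings (the sample values are hypothetical; only the column names mirror the question):

```python
import pandas as pd

# Hypothetical sample data; the dates start out as plain strings.
df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "date": ["2015-01-10", "2015-02-20"],
})

# Inspect the dtypes; a string column will not be reported as datetime64.
print(df.dtypes)
was_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])

# Convert the column to real datetimes.
df["date"] = pd.to_datetime(df["date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])

# Date arithmetic now works and yields a Timedelta.
gap_days = (df["date"].iloc[1] - df["date"].iloc[0]).days
print(was_datetime, is_datetime, gap_days)
```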

You likely also have some dates in your DataFrame that are null, hence the comparison against "None". Filter them out with df[pd.notnull(df.date)].
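
To confirm whether nulls are actually present before filtering, something along these lines may help. The data is made up, with one missing date planted on purpose:

```python
import pandas as pd

# Hypothetical sample data with one missing date (stored as NaT).
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "date": pd.to_datetime(["2015-01-10", None, "2015-02-20"]),
})

# Count the missing dates, then keep only rows that have one.
n_missing = int(df["date"].isna().sum())
clean = df[pd.notnull(df["date"])]

print(n_missing, len(clean))
```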

BTW: Subtracting the dates gives you a timedelta, so you'll likely need to do something like (friend_review.date - user_review.date).dt.days <= 62.
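
One caveat worth illustrating: when the two masks each select a single row, those rows usually carry different index labels, and pandas aligns on labels when subtracting Series, so the result can come out as all NaT. The sketch below uses hypothetical data and assumes each mask matches exactly one row. It pulls out scalar timestamps first, which is essentially what the asker's update does with .values[0].

```python
import pandas as pd

# Hypothetical reviews; column names mirror the question.
reviews = pd.DataFrame({
    "business_id": ["b1", "b1"],
    "user_id": ["u1", "u2"],
    "date": pd.to_datetime(["2015-01-10", "2015-02-20"]),
})

user_review = reviews[reviews["user_id"] == "u1"]    # index label 0
friend_review = reviews[reviews["user_id"] == "u2"]  # index label 1

# Series subtraction aligns on index labels; 0 and 1 do not match,
# so every entry of the result is NaT.
aligned = friend_review["date"] - user_review["date"]

# Subtracting scalar timestamps gives a single Timedelta instead.
delta = friend_review["date"].iloc[0] - user_review["date"].iloc[0]
within_two_months = 0 <= delta.days <= 62

print(aligned.isna().all(), delta.days, within_two_months)
```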
