Subtracting two dates from columns in two pandas DataFrames

I have the following code:

for tup in unique_tuples:
    user_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == tup[1])]     

    for friend in tup[2]:
        friend_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == friend)] 

        if (friend_review.date - user_review.date) <= 62:
            tup[2].remove(friend)

I'm extracting values from a list of tuples, matching them against columns of a dataframe, and using the resulting boolean mask to select the matching row.

user_review is a single row: the review that the user wrote for a business. friend_review is also a single row: the review that the user's friend wrote for the same business.

tup[2] is a list of friend_ids for the user_id in tup[1]. So I loop through each friend of a user and match that friend_id to their review of the business.

Essentially, for two reviews of the same business by two different users, I want to check whether friend_review.date falls within two months (62 days) after user_review.date. If it doesn't, I want to remove the friend_id from the tup[2] list.

Both date columns have dtype datetime64[ns], and each date is displayed as "yyyy-mm-dd", so I thought I could simply subtract them to see whether the reviews were less than 2 months apart.

However, I keep getting the following error:

TypeError: invalid type comparison

The message also mentions that NumPy does not like comparisons against "None", which confuses me because my column has no null values.

UPDATE: SOLUTION. I ended up appending to a new list instead of deleting from the current one, and this works.

import pandas as pd

# list that collects the (business_id, reviewer, influenced_reviewer) tuples
business_reviewer_and_influenced_reviewers = []

# for each tuple, look up the reviewer's review date for the business in the reviews df
for tup in unique_tuples:
    user_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                (reviews_prior_to_influence_threshold.user_id == tup[1]), 'date']     

    user_review_date = user_review_date.values[0]

    # loop through each friend in tup[2], i.e. friends of the reviewer who also reviewed the business
    for friend in tup[2]:
        friend_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                      (reviews_prior_to_influence_threshold.user_id == friend), 'date']

        friend_review_date = friend_review_date.values[0]
        diff = pd.to_timedelta(friend_review_date - user_review_date).days

        #append business_id, reviewer, and influenced_reviewer as a tuple to a list
        if (diff >= 0) and (diff <= 62):
            business_reviewer_and_influenced_reviewers.append((tup[0], tup[1], friend))
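For larger inputs, the same filter can probably be expressed without Python loops by merging the review dates onto a table of (business, user, friend) pairs. The following is only a sketch, not the approach used above: the sample data is made up, it assumes at most one review per (business_id, user_id), and unlike the loop it silently drops pairs whose review is missing instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data standing in for the real reviews dataframe.
reviews = pd.DataFrame({
    "business_id": ["b1", "b1", "b1", "b1"],
    "user_id": ["u1", "f1", "f2", "f3"],
    "date": pd.to_datetime(
        ["2015-01-10", "2015-02-20", "2015-06-01", "2015-01-01"]
    ),
})
# Same shape as in the question: (business_id, user_id, [friend_ids]).
unique_tuples = [("b1", "u1", ["f1", "f2", "f3"])]

# One row per (business, user, friend) combination.
pairs = pd.DataFrame(
    [(b, u, f) for b, u, friends in unique_tuples for f in friends],
    columns=["business_id", "user_id", "friend_id"],
)

dates = reviews[["business_id", "user_id", "date"]]

# Attach the reviewer's date, then the friend's date.
pairs = pairs.merge(
    dates.rename(columns={"date": "user_date"}),
    on=["business_id", "user_id"],
)
pairs = pairs.merge(
    dates.rename(columns={"user_id": "friend_id", "date": "friend_date"}),
    on=["business_id", "friend_id"],
)

# Keep friends who reviewed 0 to 62 days (inclusive) after the user.
diff_days = (pairs["friend_date"] - pairs["user_date"]).dt.days
influenced = pairs[diff_days.between(0, 62)]

result = list(
    influenced[["business_id", "user_id", "friend_id"]]
    .itertuples(index=False, name=None)
)
```

Here only f1 reviewed within the window; f2 reviewed too late and f3 reviewed before the user.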

The dates in your dataframe are likely not of dtype datetime64, hence the "invalid type comparison". You can check with df.dtypes. If that's the case, convert them with df.date = pd.to_datetime(df.date).
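A minimal sketch of that check and conversion, using a made-up dataframe whose dates start out as plain strings (the column names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data: dates stored as strings, not datetimes.
df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "date": ["2015-01-10", "2015-02-20"],
})

# Inspect the dtype before converting; strings are not a datetime dtype.
was_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])

# Convert the column; yyyy-mm-dd strings parse unambiguously.
df["date"] = pd.to_datetime(df["date"])

is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
```

Printing df.dtypes before and after the conversion shows the same change.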

You likely have some null dates in your dataframe, hence the comparison vs. "None". Filter them out with df[pd.notnull(df.date)].
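To confirm whether nulls are actually present before filtering, something like the following could be used. The sample data is invented; missing datetimes show up as NaT.

```python
import pandas as pd

# Hypothetical sample data with one missing date (stored as NaT).
df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "date": pd.to_datetime(["2015-01-10", None, "2015-03-05"]),
})

# Count the missing dates.
n_missing = int(df["date"].isna().sum())

# Keep only the rows that have a date.
clean = df[df["date"].notna()]
```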

BTW: subtracting the dates gives you timedeltas, so you'll likely need something like (friend_review.date - user_review.date).dt.days <= 62.
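One more thing worth checking, shown here on made-up data: when two single-row selections come from different rows of the same dataframe, their index labels differ, and pandas aligns on those labels when subtracting Series, so the result is all NaT. Pulling out scalars first avoids this, and the comparison can then be made against a pd.Timedelta.

```python
import pandas as pd

# Hypothetical sample data: one review by a user, one by a friend.
reviews = pd.DataFrame({
    "business_id": ["b1", "b1"],
    "user_id": ["u1", "f1"],
    "date": pd.to_datetime(["2015-01-10", "2015-02-20"]),
})

user_review = reviews[reviews.user_id == "u1"]    # index label 0
friend_review = reviews[reviews.user_id == "f1"]  # index label 1

# Series subtraction aligns on index labels; 0 and 1 do not match,
# so every element of the result is NaT.
aligned = friend_review["date"] - user_review["date"]

# Taking scalars first sidesteps alignment and gives one Timedelta.
diff = friend_review["date"].iloc[0] - user_review["date"].iloc[0]

# Compare Timedelta with Timedelta rather than with a bare integer.
within_two_months = pd.Timedelta(0) <= diff <= pd.Timedelta(days=62)
```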
