I have the following code:
    for tup in unique_tuples:
        user_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == tup[1])]
        for friend in tup[2]:
            friend_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == friend)]
            if (friend_review.date - user_review.date) <= 62:
                tup[2].remove(friend)
I'm taking values from a list of tuples, matching them against columns of a dataframe, and using the resulting boolean mask to select the rows where the match is true.
user_review is a single row: the review that the user wrote for a business. friend_review is also a single row: the review that the user's friend wrote for the same business.
tup[2] is a list of friend_ids of the user_id in tup[1]. So I am looping through each friend of a user and matching that friend_id to the friend's review of the business.
Essentially, for two reviews of the same business by two different users, I want to check whether friend_review.date is at most 2 months (62 days) after user_review.date. If it is not, I want to remove the friend_id from the tup[2] list.
The dates in both rows have dtype datetime64[ns] and are formatted as "yyyy-mm-dd", so I thought I could simply subtract them to see whether the reviews were less than 2 months apart.
However, I keep getting the following error:
TypeError: invalid type comparison
The message also mentions that NumPy does not like comparisons against "None", which confuses me because the column has no null values.
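One thing that may be worth checking here, independent of the exact error message: boolean masking returns a one-row Series rather than a scalar, and subtracting two Series aligns them on their index labels. The sketch below uses made-up dates and index labels (0 and 7 are assumptions, not values from the real dataframe) to show how that alignment can produce missing values even when the column itself has none.

```python
import pandas as pd

# Hypothetical one-row selections, as boolean masking would produce them:
# each keeps the index label of the row it came from in the original dataframe.
user_dates = pd.Series(pd.to_datetime(["2015-01-01"]), index=[0])
friend_dates = pd.Series(pd.to_datetime(["2015-02-15"]), index=[7])

# Series arithmetic aligns on index labels, so rows 0 and 7 never pair up
# and the result holds two missing values (NaT) instead of one difference.
aligned_diff = friend_dates - user_dates

# Taking the scalars out first avoids alignment and gives a single Timedelta.
scalar_diff = friend_dates.iloc[0] - user_dates.iloc[0]
diff_days = scalar_diff.days
```

Extracting the scalar before subtracting, as in the last two lines, sidesteps the alignment entirely.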
UPDATE (SOLUTION): I ended up appending to a new list instead of deleting from the current one, and this works.
    # list to collect the resulting tuples
    business_reviewer_and_influenced_reviewers = []
    # loop through each user and look up the date of their review by matching the reviews df against our tuple values
    for tup in unique_tuples:
        user_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) &
                                                                    (reviews_prior_to_influence_threshold.user_id == tup[1]), 'date']
        user_review_date = user_review_date.values[0]
        # loop through each friend of the reviewer (listed in tup[2]) who also reviewed the business
        for friend in tup[2]:
            friend_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) &
                                                                          (reviews_prior_to_influence_threshold.user_id == friend), 'date']
            friend_review_date = friend_review_date.values[0]
            diff = pd.to_timedelta(friend_review_date - user_review_date).days
            # append business_id, reviewer, and influenced_reviewer as a tuple to the list
            if (diff >= 0) and (diff <= 62):
                business_reviewer_and_influenced_reviewers.append((tup[0], tup[1], friend))
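To try the same logic on a small example, here is a minimal sketch with made-up data. The dataframe contents, the ids, and the helper review_date are all hypothetical, and it uses .iloc[0] (which returns a pandas Timestamp) in place of .values[0], so treat it as a variation on the loop above rather than the exact code.

```python
import pandas as pd

# Hypothetical sample data standing in for the real reviews dataframe.
reviews = pd.DataFrame({
    "business_id": ["b1", "b1", "b1", "b1"],
    "user_id": ["u1", "f1", "f2", "f3"],
    "date": pd.to_datetime(["2015-01-01", "2015-01-20", "2015-06-01", "2014-12-25"]),
})
# (business_id, user_id, [friend_ids]) tuples, as described in the question.
unique_tuples = [("b1", "u1", ["f1", "f2", "f3"])]


def review_date(df, business_id, user_id):
    """Return the date of the first review matching business_id and user_id."""
    mask = (df.business_id == business_id) & (df.user_id == user_id)
    return df.loc[mask, "date"].iloc[0]


influenced = []
for business_id, user_id, friends in unique_tuples:
    user_date = review_date(reviews, business_id, user_id)
    for friend in friends:
        # Timestamp - Timestamp gives a Timedelta; .days is a plain integer.
        diff = (review_date(reviews, business_id, friend) - user_date).days
        # Keep friends who reviewed on or after the user's date, within 62 days.
        if 0 <= diff <= 62:
            influenced.append((business_id, user_id, friend))
```

With this data only f1 (19 days later) is kept: f2 reviewed too late and f3 reviewed before the user.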
The dates in your dataframe are likely not of datetime64 dtype, hence the invalid type comparison. You can check with df.dtypes. If they are not, use df.date = pd.to_datetime(df.date).
You likely have some dates in your dataframe that are null, hence the comparisons vs. "None". Use df[pd.notnull(df.date)].
BTW: Subtracting the dates should get you a timedelta, so you'll likely need to do something like (friend_review.date - user_review.date).dt.days <= 62.
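The checks above can be sketched end to end on a toy dataframe. The data here is made up (string dates plus one missing value) purely to illustrate the steps; the column names follow the question, and the real dataframe may of course look different.

```python
import pandas as pd

# Hypothetical sample data; the real reviews dataframe is not shown in the post.
df = pd.DataFrame({
    "business_id": ["b1", "b1", "b1"],
    "user_id": ["u1", "u2", "u3"],
    "date": ["2015-01-01", "2015-02-15", None],  # strings plus one missing value
})

# 1. Check the dtype: a column of strings is not datetime64[ns].
is_datetime_before = pd.api.types.is_datetime64_any_dtype(df["date"])

# 2. Convert to datetime64[ns]; missing values become NaT.
df["date"] = pd.to_datetime(df["date"])

# 3. Drop rows whose date is null.
df = df[pd.notnull(df["date"])]

# 4. Subtracting two timestamps gives a Timedelta; compare its .days with an integer.
user_date = df.loc[df["user_id"] == "u1", "date"].iloc[0]
friend_date = df.loc[df["user_id"] == "u2", "date"].iloc[0]
diff_days = (friend_date - user_date).days
within_two_months = 0 <= diff_days <= 62
```

Here the scalars are pulled out with .iloc[0] before subtracting, so the result is a single Timedelta and .days can be used directly instead of .dt.days.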