I am working on business customer segmentation, but when I run my code I get this error:
unsupported operand type(s) for -: 'str' and 'str'
The error is located on this line of code:
# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
'InvoiceDate': lambda x: str(snapshot_date - x.max()).days ,
'InvoiceNo': 'count',
'TotalSum': 'sum'})
Here is my entire program:
# Import The Libraries
# ! pip install xlrd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Import The Dataset
df = pd.read_csv('path/data.csv',encoding='latin1')
df = df[df['CustomerID'].notna()]
# Sample the dataset (df_fix must exist before TotalSum is added to it)
df_fix = df.sample(10000, random_state = 42)
# Create TotalSum column
df_fix["TotalSum"] = df_fix["Quantity"] * df_fix["UnitPrice"]
# Convert to show date only
from datetime import datetime
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True).dt.strftime('%Y-%m-%d')
# Create date variable that records recency
import datetime
snapshot_date = max(df_fix.InvoiceDate)+str(datetime.timedelta(days=1))
# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
'InvoiceDate': lambda x: (snapshot_date - x.max()).days ,
'InvoiceNo': 'count',
'TotalSum': 'sum'})
Please assist me
You should keep the datetime type while doing the calculation:
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True)
# Create date variable that records recency
snapshot_date = max(df_fix.InvoiceDate)+pd.Timedelta(days=1)
# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
'InvoiceDate': lambda x: (snapshot_date - x.max()).days ,
'InvoiceNo': 'count',
'TotalSum': 'sum'})
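As a rough, self-contained sketch of the approach above: the tiny DataFrame below is made-up sample data (the real CSV is not available here), but the column names match the question. It only illustrates that the subtraction works once InvoiceDate stays a datetime.

```python
import pandas as pd

# Hypothetical toy data standing in for the real invoice file
df_fix = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": ["2011-12-01 08:26:00",
                    "2011-12-05 10:00:00",
                    "2011-12-03 09:15:00"],
    "TotalSum": [10.0, 5.0, 20.0],
})

# Parse to a real datetime dtype; do NOT format back to strings
df_fix["InvoiceDate"] = pd.to_datetime(
    df_fix["InvoiceDate"], errors="coerce", utc=True
)

# One day after the latest invoice, still a Timestamp
snapshot_date = df_fix["InvoiceDate"].max() + pd.Timedelta(days=1)

# Timestamp - Timestamp gives a Timedelta, which has a .days attribute
customers = df_fix.groupby("CustomerID").agg({
    "InvoiceDate": lambda x: (snapshot_date - x.max()).days,
    "InvoiceNo": "count",
    "TotalSum": "sum",
})
print(customers)
```

With this sample, the InvoiceDate column of the result holds whole days since each customer's last invoice.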
Your snapshot_date is no longer a datetime object, because you converted it into a string with the following line:
snapshot_date = max(df_fix.InvoiceDate)+str(datetime.timedelta(days=1))
You may check the output of your snapshot_date with print(snapshot_date) to figure out how you can convert it back to a datetime object.
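To make this concrete, here is a small sketch with made-up date strings (an assumption, standing in for what .dt.strftime('%Y-%m-%d') produces). It shows what the string "snapshot" looks like, reproduces the error, and shows one possible way back to datetimes with pd.to_datetime.

```python
import datetime
import pandas as pd

# Hypothetical date strings, like those produced by .dt.strftime('%Y-%m-%d')
dates = pd.Series(["2011-12-01", "2011-12-05"])

# str + str only concatenates text; no date arithmetic happens here
broken = max(dates) + str(datetime.timedelta(days=1))
print(repr(broken))

# Subtracting two strings raises the TypeError from the question
try:
    broken - dates.max()
except TypeError as exc:
    print(exc)

# Parse the strings back into datetimes before doing any arithmetic
parsed = pd.to_datetime(dates, format="%Y-%m-%d")
snapshot_date = parsed.max() + pd.Timedelta(days=1)
print(snapshot_date)
```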
I have solved the problem by replacing this line of code:
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True).dt.strftime('%Y-%m-%d')
with this line of code:
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce')
The problem is now solved.
Thank you all for your help.
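If the point of the original strftime call was only to drop the time of day, one option (a suggestion, not something the posts above used) is .dt.normalize(), which sets the time to midnight while keeping the datetime dtype. The sample values below are made up.

```python
import pandas as pd

# Hypothetical timestamps standing in for the InvoiceDate column
invoice_dates = pd.to_datetime(
    pd.Series(["2011-12-01 08:26:00", "2011-12-05 10:00:00"]),
    errors="coerce",
)

# Midnight-aligned dates; the dtype is still datetime, not str
date_only = invoice_dates.dt.normalize()
print(date_only.dtype)

# Date arithmetic keeps working on the normalized values
snapshot_date = date_only.max() + pd.Timedelta(days=1)
print((snapshot_date - date_only.max()).days)
```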