
Why am I getting an "unsupported operand type(s) for -: 'str' and 'str'" error?

I am working on business customer segmentation, but when I run my code I get this error:

unsupported operand type(s) for -: 'str' and 'str'

The error is located on this line of code:

# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: str(snapshot_date - x.max()).days ,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})

Here is my entire program:

# Import The Libraries
# ! pip install xlrd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import The Dataset
df = pd.read_csv('path/data.csv',encoding='latin1')
df = df[df['CustomerID'].notna()]

# Sample the dataset
df_fix = df.sample(10000, random_state = 42)

# Create TotalSum column
df_fix["TotalSum"] = df_fix["Quantity"] * df_fix["UnitPrice"]

# Convert to show date only
from datetime import datetime
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True).dt.strftime('%Y-%m-%d')

# Create date variable that records recency
import datetime
snapshot_date = max(df_fix.InvoiceDate)+str(datetime.timedelta(days=1))

# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days ,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})

Please assist me

Keep InvoiceDate as a datetime type for the calculation. Calling .dt.strftime turns the column into strings, and strings cannot be subtracted:

df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True)

# Create date variable that records recency
snapshot_date = max(df_fix.InvoiceDate)+pd.Timedelta(days=1)

# Aggregate data by each customer
customers = df_fix.groupby(['CustomerID']).agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days ,
    'InvoiceNo': 'count',
    'TotalSum': 'sum'})
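To try this fix without the original CSV, here is a minimal sketch that runs the same aggregation on a few made-up rows. The sample values are assumptions for illustration only; the column names follow the question.

```python
import pandas as pd

# Made-up rows standing in for the real invoice data (assumption)
df_fix = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": ["2011-12-01 08:26", "2011-12-05 10:00", "2011-12-09 12:50"],
    "Quantity": [2, 1, 3],
    "UnitPrice": [2.5, 4.0, 1.0],
})
df_fix["TotalSum"] = df_fix["Quantity"] * df_fix["UnitPrice"]

# Keep InvoiceDate as datetime64; do not call .dt.strftime here
df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors="coerce")

# One day after the latest invoice, still a Timestamp
snapshot_date = df_fix["InvoiceDate"].max() + pd.Timedelta(days=1)

# Recency in days, number of invoices, and total spend per customer
customers = df_fix.groupby("CustomerID").agg({
    "InvoiceDate": lambda x: (snapshot_date - x.max()).days,
    "InvoiceNo": "count",
    "TotalSum": "sum",
})
print(customers)
```

If only the date part is needed for display, it is probably safer to format it at the very end, after the recency has been computed.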

Your snapshot_date is no longer a datetime object, because you converted it into a string with the following line:

snapshot_date = max(df_fix.InvoiceDate)+str(datetime.timedelta(days=1))

You can inspect the value with print(snapshot_date) to work out how to convert it back to a datetime object.
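As a rough illustration of that check, the sketch below reproduces the string concatenation on two hypothetical dates (not taken from the real dataset) and shows that subtracting the resulting strings raises the same TypeError, while datetime values subtract cleanly.

```python
import datetime
import pandas as pd

# Hypothetical sample values standing in for the InvoiceDate column
dates = pd.Series(pd.to_datetime(["2011-12-01", "2011-12-09"]))

# After strftime, the column holds plain strings
as_text = dates.dt.strftime("%Y-%m-%d")

# max() of strings is a string, and "+" concatenates the stringified timedelta
bad_snapshot = max(as_text) + str(datetime.timedelta(days=1))
print(type(bad_snapshot).__name__, repr(bad_snapshot))

# Subtracting two strings raises the TypeError from the question
error_message = None
try:
    bad_snapshot - as_text.max()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Keeping datetime values makes the subtraction well defined
good_snapshot = dates.max() + pd.Timedelta(days=1)
recency_days = (good_snapshot - dates.max()).days
print(recency_days)
```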

I have solved the problem by replacing this line of code:

df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce', utc=True).dt.strftime('%Y-%m-%d')

with this line of code:

df_fix["InvoiceDate"] = pd.to_datetime(df_fix["InvoiceDate"], errors='coerce')

The problem is now solved.

Thank you all for your help.


 