
How to convert Object column into numeric for cluster analysis in Python?

A few columns contain text values, such as the mode of payment and the type of discount used. I am pasting a few entries to give an idea.

Mode_of_payment      discount_used
ICICI CREDIT CARD    FGShoppingFest
Payback              FGShoppingFest,T24Club
CASH                 FGShoppingFest,BBProfitClub
CASH                 FGShoppingFest,Payback
ICICI CREDIT CARD    FGShoppingFest
CreditNote           FGShoppingFest
CASH                 FGShoppingFest,Payback
CASH                 FGShoppingFest,T24Club,Payback
Cash Back            FGShoppingFest
Cash Back            FGShoppingFest,T24Club,Payback
Cash Back            FGShoppingFest,T24Club
CASH                 FGShoppingFest,Payback

About these columns: Mode_of_payment records the mode of payment used, and discount_used lists the discounts applied to a product, which can be a single discount or several.

I want to extract information from these columns so that clustering can be applied to them. How can I convert them into numeric data?

Don't. Choose an approach that doesn't require numeric variables if your data is not numeric.
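One way to follow this advice is to treat each row as a set of categories and compare rows with a set-based dissimilarity such as the Jaccard distance. The sketch below is only an illustration under that assumption: the helper names `row_to_set` and `jaccard_distance` and the `pay=`/`disc=` prefixes are made up for this example, and the rows are a few entries copied from the question.

```python
def row_to_set(mode_of_payment, discount_used):
    """Represent one record as a set of categorical tokens (hypothetical helper)."""
    tokens = {"pay=" + mode_of_payment.strip()}
    # discount_used may hold several comma-separated discounts
    tokens.update("disc=" + d.strip() for d in discount_used.split(",") if d.strip())
    return tokens


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 means identical sets, 1.0 means disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# A few rows taken from the question
rows = [
    ("ICICI CREDIT CARD", "FGShoppingFest"),
    ("Payback", "FGShoppingFest,T24Club"),
    ("CASH", "FGShoppingFest,Payback"),
    ("CASH", "FGShoppingFest,T24Club,Payback"),
]

sets = [row_to_set(m, d) for m, d in rows]

# Pairwise dissimilarity matrix, computed without any numeric encoding
dist = [[jaccard_distance(a, b) for b in sets] for a in sets]

for line in dist:
    print(["%.2f" % x for x in line])
```

A matrix like `dist` could then be passed to a clustering method that accepts precomputed dissimilarities, for example hierarchical clustering, instead of one that needs coordinates.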

While you can encode them using dummy variables, k-means and similar algorithms compute cluster means and squared Euclidean distances, which are only meaningful for continuous variables. You can't just convert a symbolic value into a meaningful continuous variable.
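For reference, dummy-variable encoding of these two columns might look like the sketch below. It uses plain Python, and the function name `one_hot` and the `pay_`/`disc_` column prefixes are assumptions made for this illustration. The resulting columns are 0/1 indicators, not continuous measurements, which is exactly the limitation described above.

```python
def one_hot(rows):
    """Encode (mode_of_payment, discount_used) pairs as 0/1 indicator columns.

    Hypothetical helper: one column per payment mode and one per discount,
    so a row with several discounts gets several 1s.
    """
    payments = sorted({m.strip() for m, _ in rows})
    discounts = sorted(
        {d.strip() for _, ds in rows for d in ds.split(",") if d.strip()}
    )
    columns = ["pay_" + p for p in payments] + ["disc_" + d for d in discounts]

    encoded = []
    for m, ds in rows:
        present = {"pay_" + m.strip()}
        present |= {"disc_" + d.strip() for d in ds.split(",") if d.strip()}
        encoded.append([1 if c in present else 0 for c in columns])
    return columns, encoded


# A few rows taken from the question
rows = [
    ("CASH", "FGShoppingFest,Payback"),
    ("Cash Back", "FGShoppingFest,T24Club"),
    ("CreditNote", "FGShoppingFest"),
]

columns, encoded = one_hot(rows)
print(columns)
for row in encoded:
    print(row)
```

If the data is already in a pandas DataFrame, `pandas.get_dummies` for the payment column and `Series.str.get_dummies(sep=",")` for the multi-valued discount column should produce a similar indicator table.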
