There are a few columns that contain text values, such as the mode of payment used and the type of discount applied. I am pasting a few of the entries below to give an idea.
Mode_of_payment     discount_used
ICICI CREDIT CARD   FGShoppingFest
Payback             FGShoppingFest,T24Club
CASH                FGShoppingFest,BBProfitClub
CASH                FGShoppingFest,Payback
ICICI CREDIT CARD   FGShoppingFest
CreditNote          FGShoppingFest
CASH                FGShoppingFest,Payback
CASH                FGShoppingFest,T24Club,Payback
Cash Back           FGShoppingFest
Cash Back           FGShoppingFest,T24Club,Payback
Cash Back           FGShoppingFest,T24Club
CASH                FGShoppingFest,Payback
About these columns: Mode_of_payment records the payment method used, and discount_used lists the discounts applied to a product. A product can have a single discount or several, separated by commas.
I want to extract information from these columns so that clustering can be applied to them. How can I convert them into numeric data?
Don't. Choose an approach that doesn't require numeric variables if your data is not numeric.
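One way to follow this advice is to treat each row as a set of categorical tokens, measure dissimilarity with the Jaccard distance, and use hierarchical (agglomerative) clustering, which only needs pairwise distances rather than continuous coordinates. The sketch below is a minimal, pure-Python illustration under that assumption; the sample rows, the `pay=`/`disc=` token prefixes, and the helper names are hypothetical and not part of the original answer. The prefixes keep "Payback" the payment mode distinct from "Payback" the discount.

```python
from itertools import combinations

# Hypothetical sample taken from the rows in the question.
rows = [
    ("ICICI CREDIT CARD", "FGShoppingFest"),
    ("Payback", "FGShoppingFest,T24Club"),
    ("CASH", "FGShoppingFest,Payback"),
    ("CASH", "FGShoppingFest,T24Club,Payback"),
    ("Cash Back", "FGShoppingFest,T24Club"),
    ("Cash Back", "FGShoppingFest"),
]

def as_set(payment, discounts):
    """Represent one row as a frozenset of prefixed category tokens."""
    tokens = {"pay=" + payment}
    tokens |= {"disc=" + d.strip() for d in discounts.split(",") if d.strip()}
    return frozenset(tokens)

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0 for identical sets, 1 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def average_linkage(items, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    Jaccard distance until n_clusters remain. O(n^3): fine for a sketch only.
    """
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(jaccard_distance(items[i], items[j])
                        for i in clusters[x] for j in clusters[y])
            d = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: x < y, so index x is unaffected
    return clusters

items = [as_set(p, d) for p, d in rows]
print(average_linkage(items, 3))  # lists of row indices, one list per cluster
```

For a real dataset, SciPy can do the same job more efficiently: `scipy.spatial.distance.pdist(X, metric="jaccard")` on a boolean indicator matrix gives the condensed distance matrix, and `scipy.cluster.hierarchy.linkage(..., method="average")` builds the hierarchy from it.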
While you can encode them as dummy (0/1 indicator) variables, most clustering algorithms, such as k-means, expect continuous variables. You can't just convert a symbolic value into a meaningful continuous variable.
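For reference, dummy encoding of these two columns would look roughly like the pure-Python sketch below: one-hot columns for Mode_of_payment and multi-hot columns for the comma-separated discount_used. The sample rows, the `pay=`/`disc=` prefixes, and the `encode` helper are hypothetical illustrations. The result is numeric, but every column is still a 0/1 indicator rather than a continuous measurement, which is exactly the caveat above. In pandas, `pd.get_dummies` and `Series.str.get_dummies(sep=",")` produce the same kind of indicator columns.

```python
# Hypothetical sample taken from the rows in the question.
rows = [
    ("ICICI CREDIT CARD", "FGShoppingFest"),
    ("Payback", "FGShoppingFest,T24Club"),
    ("CASH", "FGShoppingFest,BBProfitClub"),
    ("CASH", "FGShoppingFest,T24Club,Payback"),
    ("Cash Back", "FGShoppingFest,T24Club"),
]

def encode(rows):
    """Return (columns, matrix): one-hot payment mode plus multi-hot discounts.

    Prefixes keep "Payback" the payment mode and "Payback" the discount
    in separate columns.
    """
    payments = sorted({pay for pay, _ in rows})
    discounts = sorted({d.strip() for _, ds in rows
                        for d in ds.split(",") if d.strip()})
    columns = [f"pay={p}" for p in payments] + [f"disc={d}" for d in discounts]
    matrix = []
    for pay, ds in rows:
        used = {d.strip() for d in ds.split(",")}
        row = [1 if pay == p else 0 for p in payments]
        row += [1 if d in used else 0 for d in discounts]
        matrix.append(row)
    return columns, matrix

columns, matrix = encode(rows)
print(columns)
for r in matrix:
    print(r)
```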