
Association rule mining using arules package in R

I am trying to find association rules using the arules package in R. I am using a CSV file to create the transaction object, but I am getting an incorrect item set. This is what the data looks like:

137,lidocaine
138,pregabalin
139,esomeprazole,nadolol,atorvastatin
140,hydromorphone
141,ondansetron,enoxaparin,metoclopramide
142,fluticasone
143,trandolapril,amlodipine,fluticasone,esomeprazole
144,meloxicam
145,lidocaine
146,atorvastatin
147,fluticasone
Here is the R code I am using:
library("arules")
txn <- read.transactions("basket.csv", rm.duplicates= TRUE,format="basket",sep=",",cols =1);
txn@itemInfo
The item list I am getting has repeated items:
labels
1       amlodipine
2    atorvastatin"
3       enoxaparin
4     esomeprazole
5    esomeprazole"
6      fluticasone
7     fluticasone"
8   hydromorphone"
9       lidocaine"
10      meloxicam"
11 metoclopramide"
12         nadolol
13     ondansetron
14     pregabalin"
15    trandolapril

If you look at items 4 and 5, they are the same item but are treated as different because of the quotation mark. The same is true of items 6 and 7.

Is there a way to resolve this, or any reason why this is happening?

I'm not familiar with R, but I'm an AI student and I know a little about association rules.

I think it is related to your data file. If you look at your item list, you will see that every item that comes at the end of a line in the data file appears with a quotation mark in the item list, and every item that comes earlier in a line appears without one.

So the items that appear twice are the ones that occur once at the end of a line and once in the middle of a line in the data file. For example, esomeprazole is in the middle of transaction 139 but at the end of transaction 143.
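
One way to check this explanation is to look at the raw lines of the file before arules parses them. The sketch below is only an illustration in base R: it writes a small hypothetical sample to a temporary file, assuming each line of the file is wrapped in double quotes (the question does not show the raw file, so this is a guess), and counts the lines that end with a quote character. Pointing `path` at the real basket.csv would run the same check on the actual data.

```r
# Hypothetical sample: ASSUMES every line of the file is wrapped in double
# quotes. The real basket.csv may look different.
sample_lines <- c(
  '"139,esomeprazole,nadolol,atorvastatin"',
  '"142,fluticasone"',
  '"143,trandolapril,amlodipine,fluticasone,esomeprazole"'
)
path <- tempfile(fileext = ".csv")
writeLines(sample_lines, path)

# Read the file as plain text so nothing is interpreted or stripped
raw <- readLines(path)
print(raw)

# Flag the lines whose last character is a double quote
ends_with_quote <- grepl('"$', raw)
n_quoted <- sum(ends_with_quote)
print(n_quoted)
```

If the count is close to the number of lines, the quotation marks are in the file itself and are not something arules adds.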

Again, I'm not familiar with R, but I think a simple correction, such as adding a space at the end of every line of the data file, should solve this problem.
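
A different correction from the one suggested above would be to strip the quote characters from the file before reading it. The following is a minimal base R sketch under the same assumption that the stray quotation marks are literal characters in the file. The sample lines and the `clean_path` name are hypothetical, introduced only for illustration.

```r
# Hypothetical sample: ASSUMES every line of the file is wrapped in double
# quotes. Replace `path` with the real file to clean actual data.
sample_lines <- c(
  '"139,esomeprazole,nadolol,atorvastatin"',
  '"142,fluticasone"',
  '"143,trandolapril,amlodipine,fluticasone,esomeprazole"'
)
path <- tempfile(fileext = ".csv")
writeLines(sample_lines, path)

# Remove every double-quote character and write a cleaned copy
clean_path <- tempfile(fileext = ".csv")
cleaned <- gsub('"', "", readLines(path), fixed = TRUE)
writeLines(cleaned, clean_path)

# Sanity check: split on commas, drop the ID column, list the distinct items
fields <- strsplit(cleaned, ",", fixed = TRUE)
items <- sort(unique(unlist(lapply(fields, function(x) x[-1]))))
print(items)
```

The cleaned file can then be passed to the same read.transactions call from the question in place of basket.csv. Each item should then appear only once in the item list.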
