
Association rule mining using the arules package in R

I am trying to find association rules using the arules package in R. I am using a CSV file to create the transactions object, but I am getting an incorrect item set. This is what the data looks like:

137,lidocaine
138,pregabalin
139,esomeprazole,nadolol,atorvastatin
140,hydromorphone
141,ondansetron,enoxaparin,metoclopramide
142,fluticasone
143,trandolapril,amlodipine,fluticasone,esomeprazole
144,meloxicam
145,lidocaine
146,atorvastatin
147,fluticasone
Here is the R code I am using:
library("arules")
txn <- read.transactions("basket.csv", rm.duplicates = TRUE, format = "basket", sep = ",", cols = 1)
txn@itemInfo
The item list I am getting has repeated items:
labels
1       amlodipine
2    atorvastatin"
3       enoxaparin
4     esomeprazole
5    esomeprazole"
6      fluticasone
7     fluticasone"
8   hydromorphone"
9       lidocaine"
10      meloxicam"
11 metoclopramide"
12         nadolol
13     ondansetron
14     pregabalin"
15    trandolapril

If you look at items 4 and 5, they are the same item but are treated as different because of the quotation mark. The same is true for items 6 and 7.

Is there a way to resolve this, or any reason why this is happening?

I'm not familiar with R, but I'm an AI student and I know a little about association rules.

I think it is related to your data file. If you look at your item list, you will see that every item that sits at the end of a line in the data file appears with a quotation mark in the item list, and vice versa.

So the items that appear twice are the ones that occur once at the end of a line and once in the middle of a line in the data file.
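One way to check this guess is a small base-R sketch like the one below. The sample lines are made up to mimic what the file might contain if every line were wrapped in double quotes; that is an assumption, since the question shows the data without quotes. With the real file, `raw_lines` would come from `readLines("basket.csv")`.

```r
# Hypothetical sample: each line wrapped in double quotes (an assumption
# about the real file, not something shown in the question).
raw_lines <- c('"139,esomeprazole,nadolol,atorvastatin"',
               '"143,trandolapril,amlodipine,fluticasone,esomeprazole"',
               '"146,atorvastatin"')
# With the real data: raw_lines <- readLines("basket.csv")

# Split each line on commas and drop the first field (the transaction ID).
items <- lapply(strsplit(raw_lines, ",", fixed = TRUE), function(x) x[-1])

# Take the last item of every line and test it for a stray quote character.
last_items <- vapply(items, function(x) x[length(x)], character(1))
has_quote  <- grepl('"', last_items, fixed = TRUE)

print(data.frame(last_item = last_items, has_quote = has_quote))

# Items that occur both with and without the quote become separate labels.
all_items <- unique(unlist(items))
print(all_items)
```

If `has_quote` is `TRUE` for the last item of each line, that would support the explanation above.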

Again, I'm not familiar with R, but I think a simple correction, such as adding one space at the end of every line of the data file, will solve this problem.
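As an alternative to editing the file by hand, the stray quotation marks could be stripped before the file is read. The following is only a sketch under assumptions: it supposes the quotes are literal `"` characters in the file, `clean_basket_lines` is a made-up helper name, and the sample lines are hypothetical. It removes quotes rather than appending a space, so it is a different correction from the one suggested above. The `arules` step runs only if the package is installed.

```r
# Hypothetical helper: remove double quotes and surrounding whitespace.
clean_basket_lines <- function(lines) {
  lines <- gsub('"', "", lines, fixed = TRUE)  # drop every double quote
  trimws(lines)                                # drop leading/trailing blanks
}

# Made-up sample lines; with the real data use readLines("basket.csv").
raw_lines <- c('"139,esomeprazole,nadolol,atorvastatin"',
               '"143,trandolapril,amlodipine,fluticasone,esomeprazole"')
cleaned <- clean_basket_lines(raw_lines)

# Write the cleaned lines to a temporary file for read.transactions().
tmp <- tempfile(fileext = ".csv")
writeLines(cleaned, tmp)

if (requireNamespace("arules", quietly = TRUE)) {
  # cols = 1 tells arules that the first column holds the transaction ID.
  txn <- arules::read.transactions(tmp, format = "basket", sep = ",",
                                   cols = 1, rm.duplicates = TRUE)
  print(arules::itemLabels(txn))
}
```

After this step each drug name should appear only once in the item labels, provided the quotes really were the cause.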
