Data Manipulation in R for Apriori

I have part of a dataset in CSV form, shown below; the real file has more rows and columns than shown. I want to run apriori on this dataset. Say I have this:

    Maths Science C++ Java DC
[1]    75      44  55   56 88
[2]    56      88  54   78 44

The original dataset has 30 columns (one per subject) and 24 rows (one per student).

DATASET: link

I want to convert this dataset into the form shown below:

[1] {Maths,DC}
[2] {Science,Java}

That is, a list of lists (I think that is what it is called) containing the column names. The list for each student shows the subjects in which he/she scored 75 marks or more; the rest of the subjects are dropped. This is the only condition of the problem.

E.g. the first student scored 75+ marks in DC and Maths, so his list includes only DC and Maths.

I searched Stack Overflow a lot and found a few working suggestions, but I couldn't reach the final goal. My goal is to get a form like this:

[9834] {semi-finished bread,      
        bottled water,            
        soda,                     
        bottled beer}             
[9835] {chicken,                  
        tropical fruit,           
        other vegetables,         
        vinegar,                  
        shopping bags}  

as produced by:

library(arules)
inspect(Groceries)

Alternatively, I would appreciate it if anyone could suggest another way to represent the data that apriori can understand, as long as it follows the condition stated above.

(Sorry for the long post. I hope converting my dataset to this format will help me study patterns in the student-subject data. Thanks a ton for all the help.)

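library(plyr)    # alply(): apply a function to each row, returning a list
library(arules)  # "transactions" class and inspect()
df <- read.table(text = 
"   75   44      55  56  88
    56   88      54  78  44")
names(df) <- c("Maths", "Science", "C++", "Java", "DC")
# For each student (row), keep the names of the subjects with marks >= 75,
# then coerce the resulting list to arules transactions.
transactions <- as(alply(df, 1, function(x) names(x)[x >= 75]), "transactions")
inspect(transactions)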

#     items          transactionID
# [1] {DC,Maths}     1            
# [2] {Java,Science} 2            
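
If you would rather not depend on plyr, a minimal sketch of the same thresholding in base R follows. It rebuilds the two-student example with data.frame() (an assumption made here to keep the snippet self-contained) and relies on arules being able to coerce a logical matrix to transactions. The names passed and subjects are only illustrative.

```r
# Same two-student example as above, built directly as a data frame
df <- data.frame(
  Maths = c(75, 56), Science = c(44, 88), `C++` = c(55, 54),
  Java = c(56, 78), DC = c(88, 44),
  check.names = FALSE  # keep "C++" as a column name
)

# Comparing a data frame with a number yields a logical matrix:
# TRUE where the student scored 75 or more in that subject
passed <- as.matrix(df >= 75)

# Subjects per student, in column order (a plain list of character vectors)
subjects <- lapply(seq_len(nrow(passed)),
                   function(i) colnames(passed)[passed[i, ]])
print(subjects)

# If arules is installed, the logical matrix can be coerced to transactions
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  transactions <- as(passed, "transactions")
  inspect(transactions)
}
```

The logical matrix keeps one column per subject, so the item labels come straight from the column names.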

Edit: It works with your example dataset, too:

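library(plyr)
library(arules)
# Read the full dataset from the shared CSV link
df <- read.csv(file = url("https://drive.google.com/uc?export=download&id=0B3kdblyHw4qLR0dpT24xWUZGcGs"))
# Same conversion as before: subjects with marks >= 75 for each student
transactions <- as(alply(df, 1, function(x) names(x)[x >= 75]), "transactions")
inspect(transactions)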

#      items                              transactionID
# [1]  {CD,CG,CN,DA,Data.Struc}           1            
# [2]  {CD,CG,CO,ML,OS}                   2            
# [3]  {CN,Data.Struc,DC,DM,DMS}          3            
# [4]  {CHE,DD,DM,EC,EE}                  4            
# [5]  {CHE,CN,MATHS,PHY}                 5            
# [6]  {Data.Science,DM,DMS,ML,OS}        6            
# [7]  {CD,DA,Data.Struc,EC,MATHS}        7            
# [8]  {CG,CHE,CN,CO,OS}                  8            
# [9]  {CN,CO,Data.Science,DC,DMS}        9            
# [10] {DC,DD,EC,EE,PHY}                  10           
# [11] {CHE,DD,DMS,MATHS,PHY}             11           
# [12] {CN,Data.Science,DM,MATHS,ML}      12           
# [13] {CD,CG,DA,Data.Science,Data.Struc} 13           
# [14] {CG,CO,EE,MATHS,OS}                14           
# [15] {CN,CO,DC,DMS,PHY}                 15           
# [16] {CN,CO,DD,EC,EE}                   16           
# [17] {CHE,DA,EE,MATHS,PHY}              17           
# [18] {Data.Science,DD,DM,ML,PHY}        18           
# [19] {CD,CO,DA,Data.Struc,DC}           19           
# [20] {CG,CO,DD,DM,OS}                   20           
# [21] {CG,CN,DA,DC,DMS}                  21           
# [22] {DD,EC,EE,ML,OS}                   22           
# [23] {CHE,CN,Data.Struc,MATHS,PHY}      23           
# [24] {CG,Data.Science,DM,EE,ML}         24
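
Once the data are in transactions form, mining rules could look roughly like the sketch below. The marks are made-up values for four hypothetical students, not taken from your file, and the support and confidence thresholds are arbitrary choices you would need to tune for your data.

```r
library(arules)

# Hypothetical marks for four students (made-up data for illustration)
marks <- data.frame(
  Maths   = c(75, 56, 80, 90),
  Science = c(44, 88, 79, 85),
  Java    = c(56, 78, 40, 77),
  DC      = c(88, 44, 91, 76)
)

# TRUE where a student scored 75 or more in a subject
passed <- as.matrix(marks >= 75)

# One transaction per student, one item per subject
transactions <- as(passed, "transactions")

# Thresholds below are arbitrary; minlen = 2 skips rules with an empty left-hand side
rules <- apriori(transactions,
                 parameter = list(support = 0.5, confidence = 0.8, minlen = 2))
inspect(rules)
```

With only 24 students in the real dataset, a low support threshold will probably be needed before any rules appear.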
