
using arulesSequences package : Error in makebin(data, file) : 'sid' invalid

I am using the arulesSequences package in R. The documentation says little about the data format that the read_baskets function expects; I assume it should be a text (.txt) file. The column names are "sequenceID", "eventID", "SIZE" and "items". My data has about 200,000 rows, and the z.txt file looks like this:

1,1364,3,{12,17,19}
1,1130,4,{14,17,21,23}
1,1173,3,{19,23,9}
1,98,5,{14,15,2,21,5}
2,1878,4,{1,10,14,3}
2,1878,13,{1,12,14,15,16,17,18,19,2,21,24,25,5}
2,1878,1,{2}

I tried to use:

x <- read_baskets("z.txt", sep = ",", info = c("sequenceID", "eventID", "SIZE"))
s <- cspade(x, parameter = list(support = 0.001), control = list(verbose = TRUE), tmpdir = tempdir())

but I get this error:

Error in makebin(data, file) : 'sid' invalid

The combination of sequenceID and eventID must be unique.

Otherwise you'll get one of these errors:

  • Error in makebin(data, file) : 'sid' invalid
  • Error in makebin(data, file) : 'eid' invalid

This also means that all items belonging to one sequenceID/eventID combination must be on the same row, (possibly) separated with the same separator as the rest of the .txt file. The items column should therefore be the last column.
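As a rough way to check the uniqueness condition before calling read_baskets, the sketch below uses only base R on the sample rows from the question. The variable names (lines, parts, ids, dup) are made up for illustration, and on the real file the lines vector would presumably come from readLines("z.txt") instead.

```r
# Sample rows copied from the question: sequenceID, eventID, SIZE, items.
lines <- c(
  "1,1364,3,{12,17,19}",
  "1,1130,4,{14,17,21,23}",
  "1,1173,3,{19,23,9}",
  "1,98,5,{14,15,2,21,5}",
  "2,1878,4,{1,10,14,3}",
  "2,1878,13,{1,12,14,15,16,17,18,19,2,21,24,25,5}",
  "2,1878,1,{2}"
)

# Split each line on commas and keep only the first two fields.
parts <- strsplit(lines, ",", fixed = TRUE)
ids <- data.frame(
  sequenceID = as.integer(vapply(parts, `[`, character(1), 1)),
  eventID    = as.integer(vapply(parts, `[`, character(1), 2))
)

# Flag every row whose (sequenceID, eventID) pair occurs more than once.
dup <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
print(ids[dup, ])
```

In this sample the three rows with sequenceID 2 and eventID 1878 are flagged, so they would have to be merged into one row (or given distinct eventIDs) before the file satisfies the condition.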

Hope this helps!

OK, I found the problem, and I'm posting it in case someone else runs into the same thing. Both sequenceID and eventID (the first and second columns) must be ordered blockwise. The package documentation mentions this point, but I had only ordered the first column.
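One possible way to get that ordering is to sort the raw lines by both columns before reading them. This is only a sketch in base R: it assumes that ascending numeric order of eventID within each sequenceID is what the package expects, and the names sid, eid, ord and sorted_file are invented for the example.

```r
# A few rows from the question; eventID is not ordered within sequenceID 1.
lines <- c(
  "1,1364,3,{12,17,19}",
  "1,1130,4,{14,17,21,23}",
  "1,1173,3,{19,23,9}",
  "1,98,5,{14,15,2,21,5}",
  "2,1878,4,{1,10,14,3}"
)

# Extract the first two fields as integers.
parts <- strsplit(lines, ",", fixed = TRUE)
sid <- as.integer(vapply(parts, `[`, character(1), 1))
eid <- as.integer(vapply(parts, `[`, character(1), 2))

# Order by sequenceID first, then by eventID within each sequence.
ord <- order(sid, eid)
sorted_lines <- lines[ord]

# Write the reordered rows to a temporary file (hypothetical output location).
sorted_file <- tempfile(fileext = ".txt")
writeLines(sorted_lines, sorted_file)
```

The file at sorted_file could then be passed to read_baskets in place of "z.txt", with the same sep and info arguments as in the question.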
