Rule sequence mining in R

Question

I want to use the arulesSequences package in R, but I don't know how to coerce my data frame into an object the package can work with.

Here is a toy dataset that replicates my data structure:

ids <- c(rep("X", 5), rep("Y", 5), rep("Z", 5))
seq <- rep(1:5,3)
val <- sample(LETTERS, 15, replace=T)
df <- data.frame(ids, seq, val)
df

   ids seq val
1    X   1   T
2    X   2   H
3    X   3   V
4    X   4   A
5    X   5   X
6    Y   1   D
7    Y   2   B
8    Y   3   A
9    Y   4   D
10   Y   5   P
11   Z   1   Q
12   Z   2   R
13   Z   3   W
14   Z   4   W
15   Z   5   P

Any help will be greatly appreciated.

Answer 1

Convert every column of the data frame to a factor:

df_fact = data.frame(lapply(df,as.factor))

Build the "transactions" object:

df_trans = as(df_fact, 'transactions')

Test it:

itemFrequencyPlot(df_trans, support = 0.1, cex.names=0.8)

Answer 2

Use read_baskets:

    read_baskets(con  = "filePath.txt",
      sep = " ",
      info = c("sequenceID","eventID","SIZE"))

In practice this means exporting the data to a text file and re-importing it through read_baskets. The info argument names the leading columns of the file, which hold the sequenceID, the eventID and an optional event-size column.
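The round trip described above can be sketched as follows. This is only a minimal sketch, not the answerer's code: the toy values are copied from the question's printout, the temporary file is an assumption, every event is assumed to hold exactly one item (SIZE = 1), and the read_baskets() step is skipped when arulesSequences is not installed.

```r
# Toy data with the same shape as the question (values fixed for reproducibility)
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, 3),
  val = c("T", "H", "V", "A", "X", "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P"),
  stringsAsFactors = FALSE
)

# Sort by sequence and event, then build one line per event:
# sequenceID eventID SIZE item
df <- df[order(df$ids, df$seq), ]
basket_lines <- paste(df$ids, df$seq, 1, df$val)

# Write the lines without a header (the file path is an assumption)
path <- tempfile(fileext = ".txt")
writeLines(basket_lines, path)

# Re-import only if the package is available
if (requireNamespace("arulesSequences", quietly = TRUE)) {
  library(arulesSequences)
  x <- read_baskets(path, sep = " ",
                    info = c("sequenceID", "eventID", "SIZE"))
  print(as(x, "data.frame"))
}
```

Building the lines with paste() keeps full control over the file layout, so no quotes or header row end up in the file.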

Answer 3

What worked for me was adding an "order" column that holds an order ranking rather than a time value. You just have to be very specific about the naming convention: name the "group" or "ordered basket #" variable sequenceID, and name the ranking or ordering variable eventID.

Another thing that helped me (and had me scratching my head for a long time) was that read_baskets() seemed to need me to specify the column names explicitly:

read_baskets(con = "filePath.txt", sep = " ", info = c("sequenceID","eventID","SIZE"))

Even though the help page makes the c() details look like an optional header, they are not optional. I seemed to need to remove the header from my file and specify the names in the read_baskets() call instead, or I'd run into problems.
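To illustrate the two points above (a rank instead of a time value, and a file without a header row), here is a small base R sketch. The events data frame, its column names group, time and item, and the temporary file are all hypothetical; only the names sequenceID, eventID and SIZE come from the answer.

```r
# Hypothetical input: a grouping column and a timestamp instead of a rank
events <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  time  = as.Date(c("2020-01-05", "2020-01-01", "2020-01-09",
                    "2020-02-01", "2020-02-03")),
  item  = c("K", "L", "M", "N", "O"),
  stringsAsFactors = FALSE
)

# Sort by group and time so that the rank follows the time order
events <- events[order(events$group, events$time), ]

# sequenceID: integer code per group; eventID: 1, 2, 3, ... within each group
events$sequenceID <- as.integer(factor(events$group))
events$eventID <- ave(seq_along(events$group), events$group, FUN = seq_along)
events$SIZE <- 1L  # one item per event in this toy example

# Write WITHOUT a header row and without quotes; the column names are
# supplied later through the info argument of read_baskets()
out <- events[, c("sequenceID", "eventID", "SIZE", "item")]
path <- tempfile(fileext = ".txt")
write.table(out, file = path, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(path)
```

The file can then be passed to read_baskets() with info = c("sequenceID","eventID","SIZE") as in the call above.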

Answer 4

Instead of using the data frame directly, what worked best for me was to split the data by individual and then convert it to transactions.

 eh$cost<-split(eh$cost$val ,eh$cost$id)
 eh$cost1<- as(eh$cost,"transactions")
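Applied to the toy data from the question, the same idea might look like the sketch below. The fixed values are copied from the question's printout. Two caveats that are my own assumptions rather than part of the answer: duplicated items within one basket are dropped first, because to my knowledge arules refuses lists containing duplicated items, and the result is one plain transaction per id, so the order of events is not kept. The arules step is skipped when the package is not installed.

```r
# Toy data with the same shape as the question (values fixed for reproducibility)
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, 3),
  val = c("T", "H", "V", "A", "X", "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P"),
  stringsAsFactors = FALSE
)

# One character vector of items per id
baskets <- split(df$val, df$ids)

# Drop repeated items inside each basket (Y contains D twice, Z contains W twice)
baskets <- lapply(baskets, unique)

# Coerce the list to transactions only if arules is available
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```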

Answer 5

You first have to turn your items into transactions, so just coerce the column of items:

trans = as(df[,'val'], "transactions")

Then you can add the sequence information to your transactions object:

trans@itemsetInfo$transactionID = NULL
trans@itemsetInfo$sequenceID = df$ids
trans@itemsetInfo$eventID = df$seq

Answer 6

library(dplyr)
df <- df %>% arrange(ids, seq) %>% group_by(ids, seq) %>% summarise(size = n(), items = list(val))

Then write the result to a text file (this tutorial also suggests writing the data out after wrangling and reading it back with the read_baskets function):

df$items <- as.character(df$items)
write.table(df, file = "trans.txt", sep = " ", quote = FALSE, row.names = FALSE, col.names = FALSE)

Read the file and check it:

x <- read_baskets("trans.txt", sep = " ", info = c("sequenceID","eventID","SIZE"))
as(x, "data.frame")
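When an event can hold more than one item, the items have to end up space-separated on one line, preceded by their count. The base R sketch below shows one way to build such lines without dplyr. It is an alternative under stated assumptions, not the answerer's method: the small data frame is hypothetical, and the identifiers are assumed to be integers.

```r
# Hypothetical data where sequence 1, event 1 contains two items
df <- data.frame(
  ids = c(1L, 1L, 1L, 2L, 2L),
  seq = c(1L, 1L, 2L, 1L, 2L),
  val = c("A", "B", "C", "A", "D"),
  stringsAsFactors = FALSE
)

# Collapse each (ids, seq) group into "<SIZE> <item1> <item2> ..."
agg <- aggregate(
  val ~ ids + seq, data = df,
  FUN = function(v) paste(length(v), paste(v, collapse = " "))
)

# Order by sequence and event before writing
agg <- agg[order(agg$ids, agg$seq), ]

# Final layout per line: sequenceID eventID SIZE items...
basket_lines <- paste(agg$ids, agg$seq, agg$val)

path <- tempfile(fileext = ".txt")  # the file path is an assumption
writeLines(basket_lines, path)
basket_lines
```

The resulting file has the same layout that the read_baskets call above expects with info = c("sequenceID","eventID","SIZE").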

Answer 1
1 2012-10-23 02:03:36

Answer 2
1 2015-03-18 09:25:23

Answer 3
1 2016-01-14 12:23:25

Answer 4
0 2015-08-27 17:53:17

Answer 5
0 2017-06-21 13:40:43

Answer 6
0 2021-04-12 12:48:02