Market Basket Analysis in R with Hadoop

Question

I'm trying to find a fast way to do an affinity analysis on transactional market basket data with a few million rows.

What I've done so far:

Created an R Server on top of Spark & Hadoop in the cloud (Azure HDInsight)
Loaded the data onto HDFS
Started using RevoScaleR

However, I got stuck at the last step. As far as I understand, I won't be able to process the data with any function that RevoScaleR does not provide.

Here is the code for accessing the data on HDFS:

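# Root directory of the data on HDFS
bigDataDirRoot <- "/basket"
# Use the Spark cluster as the compute context
mySparkCluster <- RxSpark(consoleOutput = TRUE)
rxSetComputeContext(mySparkCluster)
# myNameNode and myPort are not defined in this snippet
hdfsFS <- RxHdfsFileSystem(hostName = myNameNode, port = myPort)
inputFile <- file.path(bigDataDirRoot, "gunluk")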

So my inputFile is a CSV in Azure Blob storage, already created at /basket/gunluk

gunluk_data <- RxTextData(file = inputFile, returnDataFrame = TRUE, fileSystem = hdfsFS)

After running this, I am able to see the data using head(gunluk_data).

How can I use gunluk_data with functions from the arules package? Is this possible?

If not, is it possible to process a CSV file that is in HDFS using regular R packages (e.g. arules)?

Answer 1

In arules, you can use read.transactions to read the data from a file, and write.PMML to write out the rules/itemsets.
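
As a rough illustration of the read.transactions route, here is a minimal sketch. It assumes the CSV has already been copied to a path that plain R can read, and that it is in "single" format (one transaction id / item pair per row). The sample rows, the tid and item column names, and the support and confidence thresholds are all made up for the example and are not taken from the question.

```r
library(arules)

# Hypothetical sample data standing in for the real file:
# one (transaction id, item) pair per row, with a header line.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "tid,item",
  "1,bread", "1,milk",
  "2,bread", "2,butter", "2,milk",
  "3,butter",
  "4,bread", "4,milk"
), csv_path)

# Read the file as transactions; cols gives the positions of the
# transaction-id column and the item column.
trans <- read.transactions(csv_path, format = "single",
                           header = TRUE, sep = ",", cols = c(1, 2))

# Mine association rules; the thresholds here are arbitrary.
rules <- apriori(trans,
                 parameter = list(supp = 0.5, conf = 0.8, minlen = 2),
                 control = list(verbose = FALSE))

inspect(rules)
```

Note that read.transactions loads everything into memory on a single node, so whether this is practical for a few million rows depends on the memory available on the machine running it.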

solution1
0 2016-12-16 20:50:11