I'm trying to find a fast way to run an affinity analysis on transactional market basket data with a few million rows.
What I've done so far:
However, I got stuck at the last step. As far as I understand, I can't process the data with a function that RevoScaleR does not provide.
Here is the code for accessing the data on HDFS:
bigDataDirRoot <- "/basket"
mySparkCluster <- RxSpark(consoleOutput=TRUE)
rxSetComputeContext(mySparkCluster)
hdfsFS <- RxHdfsFileSystem(hostName=myNameNode, port=myPort)
inputFile <- file.path(bigDataDirRoot, "gunluk")
So my inputFile is a CSV in an Azure Blob, already created at /basket/gunluk
gunluk_data <- RxTextData(file = inputFile, returnDataFrame = TRUE, fileSystem = hdfsFS)
After running this, I am able to see the data using head(gunluk_data).
How can I use gunluk_data with functions from the arules package? Is this possible?
If not, is it possible to process a CSV file that is in HDFS using regular R packages (e.g. arules)?
In arules, you can use read.transactions to read the data from a file, and write.PMML to write out the rules/itemsets.
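A minimal sketch of that workflow follows. It assumes the CSV has first been copied from HDFS to local disk (RevoScaleR's rxHadoopCopyToLocal is one way to do that) and that it is in "single" format, with one transaction/item pair per row. The column names transaction_id and item, the sample rows, and the support/confidence thresholds are hypothetical and only there to make the example self-contained; the real layout of /basket/gunluk is not shown in the question.

```r
library(arules)

# Stand-in for the local copy of the HDFS file (hypothetical layout:
# one row per transaction/item pair, with a header line).
local_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "transaction_id,item",
  "1,bread", "1,milk",
  "2,bread", "2,butter",
  "3,bread", "3,milk", "3,butter"
), local_csv)

# Read the file straight into a sparse transactions object.
trans <- read.transactions(
  local_csv,
  format = "single",
  header = TRUE,
  sep    = ",",
  cols   = c("transaction_id", "item")
)

# Mine association rules; the thresholds here are illustrative only.
rules <- apriori(
  trans,
  parameter = list(support = 0.5, confidence = 0.6, minlen = 2),
  control   = list(verbose = FALSE)
)
inspect(rules)

# write.PMML relies on the pmml package, so only call it if that is installed.
if (requireNamespace("pmml", quietly = TRUE)) {
  write.PMML(rules, file = file.path(tempdir(), "rules.pmml"))
}
```

If the file instead has one whole basket per line, format = "basket" would be the option to try. Note that arules works in memory on a single machine, so with a few million rows this depends on the transactions fitting in RAM on the node where R runs.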