pyspark FPGrowth: how does transform work on unseen transactions?

I am using pyspark.ml.fpm.FPGrowth in Spark 2.4 and I have a question about how exactly transform works on new transactions.

My understanding is that model.transform takes each transaction X and finds all Y such that Conf(X-->Y) > minConfidence. It then returns the list of such Y, ordered by confidence.

However, suppose no training transaction contains X, so Conf(X-->Y) is undefined for all Y. I am unsure how the algorithm will transform such a transaction.
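For reference, the concern can be made concrete with the standard textbook definition of confidence (this is the usual definition, not something quoted from the Spark source). Using the three training transactions given below, the full itemset X = {4, 5} has zero support, while its subset {5} does not:

```latex
\[
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\]
\[
\operatorname{supp}(\{4,5\}) = \tfrac{0}{3} = 0
\quad\Longrightarrow\quad
\operatorname{conf}(\{4,5\} \Rightarrow Y)\ \text{is undefined for every } Y
\]
\[
\operatorname{conf}(\{5\} \Rightarrow \{3\})
= \frac{\operatorname{supp}(\{3,5\})}{\operatorname{supp}(\{5\})}
= \frac{1/3}{2/3} = \frac{1}{2}
\]
```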

This is a simple set of transactions taken from the docs:

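from pyspark.ml.fpm import FPGrowth

# Assumes an active SparkSession bound to `spark`.
DF = spark.createDataFrame([
    (0, [1, 2, 5]),
    (1, [1, 2, 3, 5]),
    (2, [1, 4])
], ["id", "items"])

# minSupport=0 and minConfidence=0 keep every itemset and rule.
fpGrowth = FPGrowth(itemsCol="items", minSupport=0, minConfidence=0)
model = fpGrowth.fit(DF)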

Then we supply a simple transaction as test data:

test_DF = spark.createDataFrame([
    (0, [4, 5])
], ["id", "items"])
model.transform(test_DF).show()

+---+------+----------+
|num| items|prediction|
+---+------+----------+
|  1|[4, 5]| [1, 3, 2]|
+---+------+----------+

Does anyone know how the prediction [1,3,2] was generated?

I think FPGrowthModel.transform applies the rules mined by FPGrowth to each transaction: whenever it finds an itemset X in a transaction and there is a rule X => Y, it suggests item Y in the prediction column for that transaction. However, I noticed that when a transaction already contains both X and Y, it returns [] in the prediction column, unless there is also a rule X & Y => Z, in which case it suggests Z instead. That makes it hard to evaluate the model with an accuracy metric :(
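The behavior described above can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions, not Spark's implementation: `mine_rules` and `predict` are hypothetical helpers, the rules are found by brute force rather than by FP-growth, and every rule has a single-item consequent. A rule fires when its antecedent is a subset of the basket, so rules from {4} and {5} apply to [4, 5] even though {4, 5} itself never occurs in the training data. The sketch returns a set and makes no claim about the order of items in Spark's prediction column.

```python
from itertools import combinations

# Training transactions from the question.
transactions = [{1, 2, 5}, {1, 2, 3, 5}, {1, 4}]


def support_count(itemset):
    """Number of training transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine_rules(min_confidence=0.0):
    """Brute-force all rules X => y with a single-item consequent.

    Stands in for FP-growth; only practical on tiny data like this.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(1, len(items)):
        for antecedent in map(frozenset, combinations(items, size)):
            n_x = support_count(antecedent)
            if n_x == 0:
                continue  # X never occurs, so no rule starts from it
            for y in items:
                if y in antecedent:
                    continue
                n_xy = support_count(antecedent | {y})
                if n_xy == 0:
                    continue  # X and y never occur together
                confidence = n_xy / n_x
                if confidence >= min_confidence:
                    rules.append((antecedent, y, confidence))
    return rules


def predict(basket, rules):
    """Union of consequents of all applicable rules, minus items already present."""
    basket = set(basket)
    predicted = set()
    for antecedent, consequent, _confidence in rules:
        if antecedent <= basket and consequent not in basket:
            predicted.add(consequent)
    return predicted


rules = mine_rules()
print(predict([4, 5], rules))           # items suggested for the unseen basket
print(predict([1, 2, 3, 4, 5], rules))  # nothing left to suggest
```

For the basket [4, 5], the applicable rules are {4} => 1, {5} => 1, {5} => 2 and {5} => 3, which gives the same three items as the prediction in the question. A basket that already contains every possible consequent yields an empty prediction, which matches the [] case noted above.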
