
Surprise NMF throws ZeroDivisionError: float division

I'm trying to build a basic recommendation system, using Surprise's NMF model.

Here is my dataset just before starting to work with NMF:

        store_id      item_id  quantity
0       62693933    912003029     3.000
1       62693933    912003034     4.000
2       62693933    913003004     1.000
3       62693933    913050001     2.024
4       62693933    913163001    11.838
...          ...          ...       ...
353843 101931000   4140870025     9.000
353844 101931000  19136680005     3.000
353845 101931000  50012447358     3.000
353846 101931000  51010204669     3.000
353847 101931000  51010208567     3.000

353848 rows × 3 columns

After this, I run the code below to prepare the dataset for training a model:

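import surprise
from surprise import NMF

# Use the observed quantity range as the rating scale.
min_quantity = df.quantity.min()
max_quantity = df.quantity.max()

reader = surprise.Reader(
    rating_scale=(min_quantity, max_quantity)
)

# load_from_df expects the columns in the order: user, item, rating.
surprise_df = surprise.Dataset.load_from_df(df, reader)

surprise_trainset = surprise_df.build_full_trainset()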

After these steps, the code below raises an error:

model = NMF().fit(surprise_trainset)
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-73-4f2929f79206> in <module>
----> 1 model = NMF().fit(surprise_trainset)

/usr/local/lib/python3.6/dist-packages/surprise/prediction_algorithms/matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.NMF.fit()

/usr/local/lib/python3.6/dist-packages/surprise/prediction_algorithms/matrix_factorization.pyx in surprise.prediction_algorithms.matrix_factorization.NMF.sgd()

ZeroDivisionError: float division

This system was working fine before, so I assume the problem is with the dataset, but I couldn't figure out what causes it. I checked for null values, zeros and so on. None of the values are null, and the only zeros are in the quantity (rank) column.

I would be glad if anyone has an idea of what could be causing this error. I can provide more information about the dataset if needed.

I don't know if this is right but here is a sample of the data for you to use. You can save it as json and read it with pandas:


I have found the problem. If all the quantities related to an item_id or a store_id are zero, NMF throws a ZeroDivisionError while fitting the model.
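To see whether a dataset is affected, one option is to list the ids whose quantities are all zero. The sketch below uses pandas only; the toy DataFrame and the helper name all_zero_ids are made up for illustration and are not part of Surprise.

```python
import pandas as pd

# Hypothetical toy data with the same columns as the DataFrame in the question.
df = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3],
    "item_id": [10, 11, 10, 12, 12],
    "quantity": [3.0, 0.0, 0.0, 0.0, 2.0],
})

def all_zero_ids(frame, key, value="quantity"):
    """Return the ids in column `key` whose `value` entries are all zero."""
    # True for each id that has at least one non-zero value.
    has_nonzero = (frame[value] != 0).groupby(frame[key]).any()
    return has_nonzero[~has_nonzero].index.tolist()

zero_stores = all_zero_ids(df, "store_id")  # store 2 only has zero quantities
zero_items = all_zero_ids(df, "item_id")    # item 11 only has zero quantities
print("stores with only zero quantities:", zero_stores)
print("items with only zero quantities:", zero_items)
```

If either list is non-empty, the trainset contains a user or item whose ratings are all zero, which is the situation described above.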

So I check the quantities for every item id and store id. If all the quantities for an individual item or store are zero, I simply discard its rows.
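A minimal sketch of that filtering step in pandas, assuming the column names from the question. The toy DataFrame and the variable names are illustrative only; this is one way to do it, not necessarily what the original code looked like.

```python
import pandas as pd

# Hypothetical toy data with the same columns as the DataFrame in the question.
df = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3],
    "item_id": [10, 11, 10, 12, 12],
    "quantity": [3.0, 0.0, 0.0, 0.0, 2.0],
})

nonzero = df["quantity"] != 0

# For every row, does its store (or item) have at least one non-zero quantity?
store_ok = nonzero.groupby(df["store_id"]).transform("any")
item_ok = nonzero.groupby(df["item_id"]).transform("any")

# Keep only rows whose store and item both have some non-zero quantity.
filtered = df[store_ok & item_ok].reset_index(drop=True)
print(filtered)
```

Both masks are computed on the original frame, so a single pass is enough: any store or item that had a non-zero row keeps that row after filtering. The filtered frame can then be passed to surprise.Dataset.load_from_df in place of df.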

I don't know if this is the right or best solution, but it works.

