TypeError: unhashable type: 'list' from pandas pd.factorize()

Question

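I'm following a tutorial on multi-class text classification, and I'm trying to find a way to predict recipe tags with a supervised approach. The JSON file contains recipes in the following format: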

{
"title": "Turtle Cheesecake",
"summary": "Cheesecake is a staple at the Market, but it’s different nearly every day because we vary the toppings, crusts, and flavorings. Cookie crusts are particularly good with cheesecakes. If you prefer your cheesecake plain, just serve it without the topping",
"ingr": [
  "1½ cups graham cracker crumbs",
  "½ cup finely chopped pecans (pulse in a food processor several times)",
  "6 tablespoons ( ¾ stick) unsalted butter, melted",
  "1½ pounds cream cheese, softened",
  "¾ cup sugar",
  "2 tablespoons all purpose flour",
  "3 large eggs",
  "1large egg yolk",
  "½ cup heavy cream",
  "2 teaspoons pure vanilla extract",
  "1 cup sugar",
  "1 cup heavy cream",
  "½ teaspoon pure vanilla extract",
  "½ cup coarsely chopped pecans, toasted",
  "2 ounces semisweet chocolate, melted"
],
"prep": "To Make the Crust:\n\n\n\n Grease a 9-inch springform pan. Wrap the outside of the pan, including the bottom, with a large square of aluminum foil. Set aside.\n\n\n\...",
"tag": [
  "Moderate",
  "Casual Dinner Party",
  "Family Get-together",
  "Formal Dinner Party",
  "dessert",
  "dinner",
  "cake",
  "cheesecake",
  "dessert"
]
}

This is the code I'm running that raises the TypeError:

import pandas as pd

df = pd.read_json('tagged-sample.json') 
######################### Data Exploration #######################

from io import StringIO

col = ['tag', 'summary']
df = df[col]
df = df[pd.notnull(df['summary'])]

df.columns = ['tag', 'summary']

df['category_id'] = df['tag'].factorize()[0]

What can I do to use pandas.factorize on the 'tag' category from the JSON? The tutorial does this on a CSV file, which may make a difference. Here is the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-d471748e6818> in <module>()
     12 df.columns = ['tag', 'summary']
     13 
---> 14 df['category_id'] = df['tag'].factorize()[0]
     15 
     16 #[['tag', 'category_id']].sort_values('category_id')

~\Anaconda3\lib\site-packages\pandas\core\base.py in factorize(self, sort, na_sentinel)
   1155     @Appender(algorithms._shared_docs['factorize'])
   1156     def factorize(self, sort=False, na_sentinel=-1):
-> 1157         return algorithms.factorize(self, sort=sort, na_sentinel=na_sentinel)
   1158 
   1159     _shared_docs['searchsorted'] = (

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    175                 else:
    176                     kwargs[new_arg_name] = new_arg_value
--> 177             return func(*args, **kwargs)
    178         return wrapper
    179     return _deprecate_kwarg

~\Anaconda3\lib\site-packages\pandas\core\algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
    628                                            na_sentinel=na_sentinel,
    629                                            size_hint=size_hint,
--> 630                                            na_value=na_value)
    631 
    632     if sort and len(uniques) > 0:

~\Anaconda3\lib\site-packages\pandas\core\algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
    474     uniques = vec_klass()
    475     labels = table.get_labels(values, uniques, 0, na_sentinel,
--> 476                               na_value=na_value)
    477 
    478     labels = _ensure_platform_int(labels)

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_labels()

TypeError: unhashable type: 'list'

Answer 1

If you call pd.factorize(s), where s is a pandas Series, every element of the Series needs to be hashable.

For example:

>>> s = pd.Series([1, 2, [3, 4, 5]])
>>> s
0            1
1            2
2    [3, 4, 5]
dtype: object
>>> pd.factorize(s)  # this will raise TypeError: unhashable type: 'list'

>>> pd.factorize(s.drop(2))  # this is okay
(array([0, 1]), Int64Index([1, 2], dtype='int64'))

One way to work around this (I'm not sure what your end goal is) is to convert the list elements to tuples, which are hashable:

>>> s.apply(lambda x: tuple(x) if isinstance(x, list) else x)
0            1
1            2
2    (3, 4, 5)
dtype: object

>>> pd.factorize(s.apply(lambda x: tuple(x) if isinstance(x, list) else x))
(array([0, 1, 2]), Index([1, 2, (3, 4, 5)], dtype='object'))
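
Applied to a DataFrame shaped like the one in the question, the same idea might look like the sketch below. The sample rows are made up for illustration (they are not from tagged-sample.json), and the assumption is that every 'tag' cell holds a list of strings.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: each 'tag' cell is a list.
df = pd.DataFrame({
    "summary": ["A rich cheesecake.", "A quick salad.", "Another cheesecake."],
    "tag": [["dessert", "cake"], ["dinner", "salad"], ["dessert", "cake"]],
})

# Lists are unhashable, so convert each one to a tuple before factorizing.
tag_tuples = df["tag"].apply(lambda x: tuple(x) if isinstance(x, list) else x)

# factorize returns (codes, uniques); the codes become the category ids.
df["category_id"] = tag_tuples.factorize()[0]

print(df[["tag", "category_id"]])
```

Note that this gives one id per distinct combination of tags, in order of first appearance, so two recipes share a category_id only if their tag lists are identical and in the same order. Whether that is the right label for the classifier depends on the end goal.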

Solution 1
3 2018-09-07 17:06:22