KeyError when creating a new column in Python pandas

Question

I am trying to create a new column in pandas, and I keep getting an intermittent, recurring KeyError. This section of the script is very straightforward, so I am not sure what could be causing the error, since no two columns in the dataset have the same name.

My goal is to create a new column, appended to the dataframe, that contains the translations of the content of the column ticket_contents. Here is a sample of the data:

25483   0   outstanding 0   Los-Angeles e-payment   delayed Ticket  1/7/19 7:54
39363   0   outstanding 0   Los-Angeles e-payment   delayed Ticket  1/7/19 7:54
83584   0   outstanding 6   Los-Angeles e-payment   delayed Ticket  1/7/19 7:54
34537   0   outstanding 7   Los-Angeles e-payment   lost    Ticket  1/7/19 7:53



colnames = ['id', 'ln_id', 'status', 'number_outstanding', 'country',
            'subject', 'ticket_contents', 'subtopic', 'date']
test_data = pandas.read_csv(test_data, names=colnames, encoding='utf-8')
test_data = pandas.DataFrame(test_data)

translated_description = []

from_lang = 'tl'
to_lang = 'en-us'

def test_translation(contents):
    translator = Translator(from_lang = from_lang, to_lang = to_lang)
    translation = translator.translate(contents)
    translated_description.append(translation)
    #print(translated_description)


for contents, row in test_data.iterrows():
    contents = test_data.ticket_contents.iloc[contents -1]
    test_translation(contents)

test_data['translated_descriptions'].copy = translated_description

Here is the error output:

KeyError Traceback (most recent call last)
<ipython-input-70-55e39cf5e328> in <module>()
     16     test_translation(contents)
     17 
---> 18 test_data['translated_descriptions'].copy = translated_description
     19 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1962             return self._getitem_multilevel(key)
   1963         else:
-> 1964             return self._getitem_column(key)
   1965 
   1966     def _getitem_column(self, key):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   1969         # get column
   1970         if self.columns.is_unique:
-> 1971             return self._get_item_cache(key)
   1972 
   1973         # duplicate columns & possible reduce dimensionality

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1643         res = cache.get(item)
   1644         if res is None:
-> 1645             values = self._data.get(item)
   1646             res = self._box_item_values(item, values)
   1647             cache[item] = res

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_loc(self, key, method, tolerance)
   2442                 return self._engine.get_loc(key)
   2443             except KeyError:
-> 2444                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2445 
   2446         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()

KeyError: u'translated_descriptions'

Answer 1

I agree with the comments that you shouldn't be iterating through the dataframe. You should compute all of the values into a list, array, or Series, and assign them all at once.
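As a rough sketch of "compute everything, then assign once": the snippet below builds the full list with a comprehension and attaches it in a single step. The fake_translate function and the three sample rows are placeholders I made up so the example runs without a translation service; they are not from the question.

```python
import pandas as pd


def fake_translate(text):
    # Hypothetical placeholder for a real translation call.
    return "[en] " + text


test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost", "delayed"]})

# Build every value first, in row order...
translated_description = [
    fake_translate(text) for text in test_data["ticket_contents"]
]

# ...then attach the whole list as a new column in one assignment.
test_data["translated_descriptions"] = translated_description

print(test_data)
```

Iterating over the column itself avoids any index arithmetic, so each translation lines up with the row it came from.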

However, your error comes from this line:

test_data['translated_descriptions'].copy = translated_description

This line tries to overwrite the copy attribute/method of the test_data['translated_descriptions'] Series. To do that, it first has to look that Series up, and since the column doesn't exist yet, the lookup raises a KeyError.
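A small reproduction may make the order of operations clearer. This is only a sketch with made-up two-row data (df and values are my own names); it shows that the column lookup fails before anything is assigned.

```python
import pandas as pd

# Hypothetical stand-in data; only the column names follow the question.
df = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})
values = ["[en] delayed", "[en] lost"]

raised = False
message = ""
try:
    # Python evaluates df["translated_descriptions"] first (a read),
    # so the KeyError is raised before the attribute assignment happens.
    df["translated_descriptions"].copy = values
except KeyError as exc:
    raised = True
    message = str(exc)

print(raised, message)
print("translated_descriptions" in df.columns)
```

The DataFrame is left unchanged: no column is created by the failed statement.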

To create a new column with your sequence of values, I would do the following:

test_data = test_data.assign(translated_descriptions=translated_description)
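For what it's worth, assign returns a new DataFrame rather than changing the one it is called on, which is why the result is bound back to a name. A minimal sketch, with placeholder data and the names original and with_new chosen just for illustration:

```python
import pandas as pd

original = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})
translated_description = ["[en] delayed", "[en] lost"]  # placeholder values

# assign() builds and returns a new DataFrame that has the extra column.
with_new = original.assign(translated_descriptions=translated_description)

print(original.columns.tolist())
print(with_new.columns.tolist())
```

The list must have one entry per row, otherwise pandas rejects the assignment.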

Answer 2

The error occurs at:

test_data['translated_descriptions'].copy = translated_description

What this line actually contains:

test_data['translated_descriptions'].copy - a reference to the copy method of a column that does not exist yet.
... = translated_description - an attempt to assign a list to this reference.

If you want to create a new column, just write:

test_data['translated_descriptions'] = translated_description
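To illustrate, plain item assignment adds the column to the existing DataFrame in place. The sketch below uses placeholder data and a second name, same_object, that I introduced only to show that no new DataFrame is created.

```python
import pandas as pd

test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})
translated_description = ["[en] delayed", "[en] lost"]  # placeholder values

same_object = test_data  # a second name bound to the same DataFrame

# Item assignment with a new key creates the column on the existing object.
test_data["translated_descriptions"] = translated_description

print(same_object.columns.tolist())
```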

Edit

If you want to get rid of the error mentioned in the comments, then:

Start by copying the DataFrame: df2 = test_data.copy() (invoke the copy method of the whole DataFrame, not of its column).
Then use df2, the new DataFrame.
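The two steps above could look roughly like this. The data is made up for the example; the point is only that df2 is independent, so adding a column to it leaves test_data as it was.

```python
import pandas as pd

test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})

# copy() on the whole DataFrame makes an independent (deep) copy by default.
df2 = test_data.copy()
df2["translated_descriptions"] = ["[en] delayed", "[en] lost"]  # placeholders

print("translated_descriptions" in test_data.columns)
print("translated_descriptions" in df2.columns)
```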

And a couple of hints on how to improve your program:

Define translator outside of the translating function:

translator = Translator(from_lang = from_lang, to_lang = to_lang)

Then define the translating function as:

def test_translation(contents):
    return translator.translate(contents)

And then the new column can be created as simply as:

test_data['translated_descriptions'] = \
    test_data.ticket_contents.apply(test_translation)

without any intermediate list.
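Here is a self-contained sketch of that pattern. Since I can't assume which translation library is installed, FakeTranslator is a hypothetical stand-in that just upper-cases the text, and translate_text is my own name for the helper; swap in the real translator object in your script.

```python
import pandas as pd


class FakeTranslator:
    """Hypothetical stand-in for a real translator; it only upper-cases text."""

    def translate(self, text):
        return text.upper()


# Created once, outside the function, and reused for every row.
translator = FakeTranslator()


def translate_text(contents):
    return translator.translate(contents)


test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})

# apply() calls translate_text on each value and returns an aligned Series.
test_data["translated_descriptions"] = test_data["ticket_contents"].apply(
    translate_text
)

print(test_data)
```

Because apply returns a Series with the same index as the source column, the results stay aligned with their rows without any manual bookkeeping.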

Also look at the following fragment of your program:

test_data = pandas.read_csv(test_data, names = colnames,
    encoding = 'utf-8')
test_data = pandas.DataFrame(test_data)

Note that:

The first instruction reads the DataFrame from the CSV file and saves it under the test_data variable.
Then you create another DataFrame (actually a view of the existing DataFrame) and assign it to the same variable.

The result is that:

The previous DataFrame still exists somewhere, but is now unreachable.
You only have access to the view created by the second instruction.
And this is why you get the error mentioned in the comments.

Conclusion: drop the second instruction. One DataFrame is enough.
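A quick check that read_csv alone is sufficient: it already returns a DataFrame with the given column names. The sketch reads from an in-memory string instead of a file, and the shortened three-column layout is an assumption made to keep the example small.

```python
import io

import pandas as pd

# Hypothetical two-row CSV held in memory; the real script reads a file.
csv_text = "25483,0,outstanding\n39363,0,outstanding\n"
colnames = ["id", "ln_id", "status"]

# read_csv returns a DataFrame directly; no pandas.DataFrame(...) wrapper needed.
test_data = pd.read_csv(io.StringIO(csv_text), names=colnames)

print(type(test_data).__name__)
print(test_data.columns.tolist())
```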
