
KeyError when creating a new column in Python pandas

I am trying to create a new column in Python pandas, and I keep getting an intermittent, recurring KeyError. This section of the script is very straightforward, so I am not sure what could be causing the error, since none of the existing columns in the dataset has the same name as the new one.

My goal is to create a new column containing the translations of the ticket_contents column and append it to the dataframe. Here is a sample of the data:

id     ln_id  status       number_outstanding  country      subject    ticket_contents  subtopic  date
25483  0      outstanding  0                   Los-Angeles  e-payment  delayed          Ticket    1/7/19 7:54
39363  0      outstanding  0                   Los-Angeles  e-payment  delayed          Ticket    1/7/19 7:54
83584  0      outstanding  6                   Los-Angeles  e-payment  delayed          Ticket    1/7/19 7:54
34537  0      outstanding  7                   Los-Angeles  e-payment  lost             Ticket    1/7/19 7:53



colnames = ['id', 'ln_id', 'status', 'number_outstanding', 'country',
            'subject', 'ticket_contents', 'subtopic', 'date']
test_data = pandas.read_csv(test_data, names = colnames, encoding = 'utf-8')
test_data = pandas.DataFrame(test_data)

translated_description = []

from_lang = 'tl'
to_lang = 'en-us'

def test_translation(contents):
    translator = Translator(from_lang = from_lang, to_lang = to_lang)
    translation = translator.translate(contents)
    translated_description.append(translation)
    #print(translated_description)


for contents, row in test_data.iterrows():
    contents = test_data.ticket_contents.iloc[contents -1]
    test_translation(contents)

test_data['translated_descriptions'].copy = translated_description

Here is the error output:

KeyError Traceback (most recent call last)
<ipython-input-70-55e39cf5e328> in <module>()
     16     test_translation(contents)
     17 
---> 18 test_data['translated_descriptions'].copy = translated_description
     19 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1962             return self._getitem_multilevel(key)
   1963         else:
-> 1964             return self._getitem_column(key)
   1965 
   1966     def _getitem_column(self, key):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   1969         # get column
   1970         if self.columns.is_unique:
-> 1971             return self._get_item_cache(key)
   1972 
   1973         # duplicate columns & possible reduce dimensionality

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1643         res = cache.get(item)
   1644         if res is None:
-> 1645             values = self._data.get(item)
   1646             res = self._box_item_values(item, values)
   1647             cache[item] = res

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_loc(self, key, method, tolerance)
   2442                 return self._engine.get_loc(key)
   2443             except KeyError:
-> 2444                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2445 
   2446         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()

KeyError: u'translated_descriptions'

I agree with the comments that you shouldn't be iterating through the dataframe. You should compute all of the values into a list, array, or Series, and assign them all at once.

However, your error comes from this line:

test_data['translated_descriptions'].copy = translated_description

That line tries to overwrite the copy attribute/method of the test_data['translated_descriptions'] Series. Since that column doesn't exist yet, looking it up raises the KeyError before any assignment happens.
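A minimal sketch reproducing this with a small, made-up DataFrame (the column names and values are illustrative assumptions, and the code targets Python 3): the KeyError is raised while looking up the missing column, so the attribute assignment never runs and no column is created.

```python
import pandas as pd

# Hypothetical stand-in for test_data with only one existing column.
df = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})
values = ["delayed (translated)", "lost (translated)"]

try:
    # df["translated_descriptions"] is evaluated first; the column is missing,
    # so __getitem__ raises KeyError and `.copy = ...` is never executed.
    df["translated_descriptions"].copy = values
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

print("translated_descriptions" in df.columns)  # False: nothing was created
```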

To create a new column with your sequence of values, I would do the following:

test_data = test_data.assign(translated_descriptions=translated_description)
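A small sketch of how assign() behaves, using a hypothetical two-row DataFrame: it returns a new DataFrame with the extra column and leaves the original untouched, which is why the result has to be assigned back to a name.

```python
import pandas as pd

df = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})
translated_description = ["delayed (translated)", "lost (translated)"]

# assign() builds and returns a new DataFrame; df itself is not modified.
df2 = df.assign(translated_descriptions=translated_description)

print(df2.columns.tolist())  # ['ticket_contents', 'translated_descriptions']
print("translated_descriptions" in df.columns)  # False
```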

The error occurs at:

test_data['translated_descriptions'].copy = translated_description

What this line actually does:

  • test_data['translated_descriptions'].copy is a reference to the copy method of a column that does not exist yet.
  • ... = translated_description attempts to assign a list to this reference.

If you want to create a new column, just write:

test_data['translated_descriptions'] = translated_description
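A minimal sketch of plain item assignment on a hypothetical three-row DataFrame. It also shows one thing worth checking when the list is built in a loop: pandas rejects a list whose length differs from the number of rows with a ValueError.

```python
import pandas as pd

df = pd.DataFrame({"ticket_contents": ["delayed", "lost", "delayed"]})

# Item assignment creates the column in place; no .copy is involved.
df["translated_descriptions"] = ["t1", "t2", "t3"]

# A list with the wrong length (2 values for 3 rows) is rejected.
try:
    df["too_short"] = ["t1", "t2"]
    length_error = False
except ValueError:
    length_error = True
```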

Edit

If you want to get rid of the error mentioned in the comment, then:

  • Start by copying the DataFrame: df2 = test_data.copy() (call the copy method of the whole DataFrame, not of one of its columns).
  • Then work with df2, the new DataFrame.
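A short sketch of those two steps on a hypothetical DataFrame: DataFrame.copy() makes a deep copy by default, so adding a column to df2 leaves test_data unchanged.

```python
import pandas as pd

test_data = pd.DataFrame({"ticket_contents": ["delayed", "lost"]})

# Copy the whole DataFrame (deep=True is the default), then work only with the copy.
df2 = test_data.copy()
df2["translated_descriptions"] = ["t1", "t2"]

print(df2.columns.tolist())        # ['ticket_contents', 'translated_descriptions']
print(test_data.columns.tolist())  # ['ticket_contents']
```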

And a couple of hints on how to improve your program:

Define translator outside of the translating function, so it is created once rather than on every call:

translator = Translator(from_lang = from_lang, to_lang = to_lang)

Then define the translating function as:

def test_translation(contents):
    return translator.translate(contents)

And then the new column can be created as simply as:

test_data['translated_descriptions'] = \
    test_data.ticket_contents.apply(test_translation)

without any intermediate list.

Also look at the following fragment of your program:

test_data = pandas.read_csv(test_data, names = colnames,
    encoding = 'utf-8')
test_data = pandas.DataFrame(test_data)

Note that:

  • The first instruction reads the DataFrame from the CSV file and assigns it to the test_data variable.
  • Then you create another DataFrame (actually a view of the existing DataFrame) and assign it to the same variable.

The result is that:

  • The previous DataFrame still exists somewhere, but is now unreachable.
  • You have access only to the view created by the second instruction.
  • This is most likely why you get the mentioned error.

Conclusion: drop the second instruction. One DataFrame is enough.
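A quick check that the wrapper is redundant: pandas.read_csv already returns a DataFrame. This sketch assumes a comma-separated version of two rows from the question's sample and reads them from an in-memory buffer instead of a file; the encoding argument is left out because it matters only when reading from a file path.

```python
import io

import pandas as pd

colnames = ['id', 'ln_id', 'status', 'number_outstanding', 'country',
            'subject', 'ticket_contents', 'subtopic', 'date']

# Assumed comma-separated form of two rows from the question's sample data.
csv_text = (
    "25483,0,outstanding,0,Los-Angeles,e-payment,delayed,Ticket,1/7/19 7:54\n"
    "34537,0,outstanding,7,Los-Angeles,e-payment,lost,Ticket,1/7/19 7:53\n"
)

# read_csv returns a DataFrame, so no pandas.DataFrame(...) call is needed afterwards.
test_data = pd.read_csv(io.StringIO(csv_text), names=colnames)
```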
