Pandas dataframe index causing problems when indexing subset of dataframe. How do I remove the indexes, or prevent the error from occurring?

I have a dataframe x1. I made a subset of the dataframe, x1_sub, where I need to use a for loop to index its items. But because the subset retains the index of the original pandas dataframe, its rows look like this:

x1_sub['word']

1         investment
2               fund
4            company
7              claim
9           customer
20              easy
...              ...

So, when I do something like this to index the rows of x1_sub serially:

for i in range(len(x1)):
    for j in range(len(x1_sub)):
        if (x1['word'][i]==x1_sub['word'][j]):
            print(i, j)

it gives the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-48-e3c9806732a6> in <module>()
      3 for i in range(len(x1)):
      4     for j in range(len(x1_sub)):
----> 5         if (x1['word'][i]==x1_sub['word'][j]):
      6             print(i, j)
      7 

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    621         key = com._apply_if_callable(key, self)
    622         try:
--> 623             result = self.index.get_value(self, key)
    624 
    625             if not is_scalar(result):

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   2558         try:
   2559             return self._engine.get_value(s, k,
-> 2560                                           tz=getattr(series.dtype, 'tz', None))
   2561         except KeyError as e1:
   2562             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

EDIT: Some example data:

The following data is saved in a csv file named example.csv:

word    score
service 1
customer    4
agent   3
product 6
easy    2
claim   2
fast    1
financial   5
information 1
benefit 4
company 3
helpful 6
time    2
future  2
policy  1
health  5
life    1
fund    4
complicated 3
investment  6
join    2
payment 2
premium 1
excellent   5
experience  1
family  4
nice    3
proces  6
satisfactory    2

And the code is this:

import pandas as pd

x1 = pd.read_csv(r'C:\Users\h473\Documents\Indonesia_verbatims W1 2018\Insurance Data X3\example.csv')

x1_sub = x1[x1['score']<=2]

for i in range(len(x1)):
    for j in range(len(x1_sub)):
        if (x1['word'][i]==x1_sub['word'][j]):
            print(i, j)

And this is the output:

0 0
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-63-08d55a712c99> in <module>()
      7 for i in range(len(x1)):
      8     for j in range(len(x1_sub)):
----> 9         if (x1['word'][i]==x1_sub['word'][j]):
     10             print(i, j)

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    621         key = com._apply_if_callable(key, self)
    622         try:
--> 623             result = self.index.get_value(self, key)
    624 
    625             if not is_scalar(result):

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   2558         try:
   2559             return self._engine.get_value(s, k,
-> 2560                                           tz=getattr(series.dtype, 'tz', None))
   2561         except KeyError as e1:
   2562             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1

EDIT 2: Also, if x1_sub is a list, then the error is different:

import pandas as pd

x1 = pd.read_csv(r'C:\Users\h473\Documents\Indonesia_verbatims W1 2018\Insurance Data X3\example.csv')

#x1_sub = x1[x1['score']<=2]
x1_sub = ['service', 'claim', 'health', 'fund', 'premium', 'nice', 'process']

for i in range(len(x1)):
    for j in range(len(x1_sub)):
        if (x1['word'][i]==x1_sub['word'][j]):
            print(i, j)

Produces the following output:

TypeError                                 Traceback (most recent call last)
<ipython-input-68-dec8c7e33757> in <module>()
      8 for i in range(len(x1)):
      9     for j in range(len(x1_sub)):
---> 10         if (x1['word'][i]==x1_sub['word'][j]):
     11             print(i, j)

TypeError: list indices must be integers or slices, not str

I think looping is best avoided in pandas, because it is very slow when a vectorized solution exists:

x1_sub  = ['service', 'claim', 'health', 'fund', 'premium', 'nice', 'process']

x2 = x1[x1['word'].isin(x1_sub)]
print (x2)
       word  score
0   service      1
5     claim      2
15   health      5
17     fund      4
22  premium      1
26     nice      3
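
If the positional (i, j) pairs from the original loop are still needed, the following is a minimal sketch of two ways to avoid the KeyError. It assumes a small stand-in DataFrame built from the first six rows of the sample data instead of reading example.csv; the names x1_reset and pairs are introduced only for illustration.

```python
import pandas as pd

# Stand-in for example.csv: the first six rows of the sample data.
x1 = pd.DataFrame({
    "word": ["service", "customer", "agent", "product", "easy", "claim"],
    "score": [1, 4, 3, 6, 2, 2],
})

# Boolean filtering keeps the original row labels (here 0, 4 and 5),
# so x1_sub["word"][1] looks for the label 1, which does not exist.
x1_sub = x1[x1["score"] <= 2]

# Option 1: discard the old labels so the subset is numbered 0..n-1 again.
x1_reset = x1_sub.reset_index(drop=True)

# Option 2: keep the labels and look rows up by position with .iloc.
pairs = [
    (i, j)
    for i in range(len(x1))
    for j in range(len(x1_sub))
    if x1["word"].iloc[i] == x1_sub["word"].iloc[j]
]

print(list(x1_sub.index))
print(list(x1_reset.index))
print(pairs)
```

With reset_index(drop=True) the original loop should work unchanged on x1_reset, while .iloc leaves the subset's labels intact, which can be useful if rows need to be traced back to x1 later.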

Try assigning a separate index with DataFrame.set_index(keys, inplace=True); see the documentation: https://pandas.pydata.org/pandas-docs/stable/generation/pandas.DataFrame.set_index.html
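
The set_index suggestion can be sketched roughly as follows. This assumes the word column is used as the key and that its values are unique, neither of which the answer states; the stand-in data and the names by_word and score_of_claim are hypothetical. The sketch returns a new frame rather than passing inplace=True, so the original subset is left untouched.

```python
import pandas as pd

# Stand-in for example.csv: the first six rows of the sample data.
x1 = pd.DataFrame({
    "word": ["service", "customer", "agent", "product", "easy", "claim"],
    "score": [1, 4, 3, 6, 2, 2],
})
x1_sub = x1[x1["score"] <= 2]

# Assumption: 'word' is unique, so it can serve as the row index.
by_word = x1_sub.set_index("word")

# Rows are now looked up by word instead of by integer label.
score_of_claim = by_word.loc["claim", "score"]
print(by_word)
print(score_of_claim)
```

Note that this changes what the index means rather than renumbering it, so a loop over range(len(x1_sub)) would still need .iloc to access rows by position.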
