Pandas dataframe index causing problems when indexing subset of dataframe. How do I remove the indexes, or prevent the error from occurring?
I have a dataframe x1. I made a subset of the dataframe, x1_sub, where I need to use a for loop to index its items. But because the subset retains the indexing of the original pandas dataframe, it has its rows like so:
x1_sub['words']
1 investment
2 fund
4 company
7 claim
9 customer
20 easy
... ...
So, when I do something like this to index the rows of x1_sub serially:
for i in range(len(x1)):
    for j in range(len(x1_sub)):
        if (x1['word'][i]==x1_sub['word'][j]):
            print(i, j)
it gives the following error:
KeyError Traceback (most recent call last)
<ipython-input-48-e3c9806732a6> in <module>()
3 for i in range(len(x1)):
4 for j in range(len(x1_sub)):
----> 5 if (x1['word'][i]==x1_sub['word'][j]):
6 print(i, j)
7
c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 0
EDIT: Some example data:
The following data is saved in a csv file named example.csv:
word score
service 1
customer 4
agent 3
product 6
easy 2
claim 2
fast 1
financial 5
information 1
benefit 4
company 3
helpful 6
time 2
future 2
policy 1
health 5
life 1
fund 4
complicated 3
investment 6
join 2
payment 2
premium 1
excellent 5
experience 1
family 4
nice 3
proces 6
satisfactory 2
And the code is this:
import pandas as pd
x1 = pd.read_csv(r'C:\Users\h473\Documents\Indonesia_verbatims W1 2018\Insurance Data X3\example.csv')
x1_sub = x1[x1['score']<=2]
for i in range(len(x1)):
    for j in range(len(x1_sub)):
        if (x1['word'][i]==x1_sub['word'][j]):
            print(i, j)
And this is the output:
0 0
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-63-08d55a712c99> in <module>()
7 for i in range(len(x1)):
8 for j in range(len(x1_sub)):
----> 9 if (x1['word'][i]==x1_sub['word'][j]):
10 print(i, j)
c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1
EDIT 2: Also, if x1_sub is a list, then the error is different:
import pandas as pd
x1 = pd.read_csv(r'C:\Users\h473\Documents\Indonesia_verbatims W1 2018\Insurance Data X3\example.csv')
#x1_sub = x1[x1['score']<=2]
x1_sub = ['service', 'claim', 'health', 'fund', 'premium', 'nice', 'process']
for i in range(len(x1)):
    for j in range(len(x1_sub)):
        if (x1['word'][i]==x1_sub['word'][j]):
            print(i, j)
This produces the following output:
TypeError Traceback (most recent call last)
<ipython-input-68-dec8c7e33757> in <module>()
8 for i in range(len(x1)):
9 for j in range(len(x1_sub)):
---> 10 if (x1['word'][i]==x1_sub['word'][j]):
11 print(i, j)
TypeError: list indices must be integers or slices, not str
I think looping is best avoided in pandas, because it is very slow when a vectorized solution exists:
x1_sub = ['service', 'claim', 'health', 'fund', 'premium', 'nice', 'process']
x2 = x1[x1['word'].isin(x1_sub)]
print(x2)
word score
0 service 1
5 claim 2
15 health 5
17 fund 4
22 premium 1
26 nice 3
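If the nested loop is really needed (for example, to get the (i, j) position pairs), the KeyError can be avoided by indexing by position with .iloc instead of by label. The following is only a minimal sketch: it uses a small hand-built dataframe as a stand-in for example.csv, and the pairs list is a name introduced here for illustration.

```python
import pandas as pd

# Small stand-in for example.csv (a few rows taken from the question's data).
x1 = pd.DataFrame({
    "word": ["service", "customer", "agent", "product", "easy", "claim"],
    "score": [1, 4, 3, 6, 2, 2],
})

# The subset keeps the original labels (0, 4, 5), which is why
# x1_sub['word'][1] raises KeyError: there is no label 1.
x1_sub = x1[x1["score"] <= 2]

# .iloc indexes by position, so j = 0 .. len(x1_sub) - 1 always works,
# whatever the labels are.
pairs = []
for i in range(len(x1)):
    for j in range(len(x1_sub)):
        if x1["word"].iloc[i] == x1_sub["word"].iloc[j]:
            pairs.append((i, j))

print(pairs)  # [(0, 0), (4, 1), (5, 2)]
```

Here i is the position in x1 and j is the position in x1_sub, which is what the original loop was trying to print.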
Try assigning a separate index with DataFrame.set_index(keys, inplace=True); see the documentation at https://pandas.pydata.org/pandas-docs/stable/generation/pandas.DataFrame.set_index.html
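For the question as asked (getting rid of the labels inherited from x1), reset_index(drop=True) may be a closer fit than set_index, since it simply renumbers the subset from 0. This is a sketch under that assumption, again using a small hand-built dataframe in place of example.csv.

```python
import pandas as pd

# Small stand-in for example.csv (a few rows taken from the question's data).
x1 = pd.DataFrame({
    "word": ["service", "customer", "easy", "claim"],
    "score": [1, 4, 2, 2],
})

# drop=True discards the old labels instead of keeping them as a new column,
# so the subset is relabelled 0, 1, 2, ...
x1_sub = x1[x1["score"] <= 2].reset_index(drop=True)

print(list(x1_sub.index))  # [0, 1, 2]
print(x1_sub["word"][1])   # easy
```

After this, the loop from the question can use x1_sub['word'][j] with j in range(len(x1_sub)) without a KeyError, because every j is now a valid label.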