Python: pandas DataFrame and meaning of {Series}0 in debugger
I am using pandas in Python 2.7 and read a CSV file like this:
import pandas as pd
df = pd.read_csv("test_file.csv")
df has a column titled 'rating' and a column titled 'review'. I do some manipulations on df, for example:
df3 = df[df['rating'] != 3]
Now if I look at df['review'] and df3['review'] in a debugger, I see this information:
df['review'] = {Series}0
df3['review'] = {Series}1
Also, if I want to see the first element of df['review'], I use:
df['review'][0]
which is fine, but if I do the same for df3, I get this error:
df3['review'][0]
{KeyError}0L
However, it looks like I can do this:
df3['review'][1]
Can someone please explain the difference?
Indexing a Series with an integer doesn't work like indexing a list. In particular, df['review'][0] doesn't get the first element of the "review" column; it gets the element whose index label is 0:
In [4]: s = pd.Series(['a', 'b', 'c', 'd'], index=[1, 0, 2, 3])
In [5]: s
Out[5]:
1    a
0    b
2    c
3    d
dtype: object
In [6]: s[0]
Out[6]: 'b'
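To make the label-based behaviour explicit, here is a minimal sketch that reuses the same Series. The names s2, by_label and missing_label_raised are introduced only for illustration. It shows that .loc performs the same label lookup, and that asking for a label that is no longer in the index raises a KeyError, which is consistent with the error seen for df3:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'], index=[1, 0, 2, 3])

# .loc always looks up by index label, so this returns the row labelled 0.
by_label = s.loc[0]

# Drop the row labelled 0; the remaining labels are 1, 2, 3.
s2 = s.drop(0)

# With an integer index, s2[0] is a label lookup, and label 0 is gone.
try:
    s2[0]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(by_label)
print(list(s2.index))
print(missing_label_raised)
```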
Presumably, in generating df3 you dropped the row with index 0. If you actually want to get the first element regardless of the index, use iloc:
In [7]: s.iloc[0]
Out[7]: 'a'
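The situation from the question can be reproduced with a small sketch. The data below is a made-up stand-in for test_file.csv (its real contents are not shown in the question), and the variable names first_by_position, df3_reset and first_after_reset are assumptions for illustration. Filtering keeps the original index labels, so iloc is one way to get the first row by position, and reset_index(drop=True) is another option if you would rather renumber the rows from 0:

```python
import pandas as pd

# Hypothetical stand-in for test_file.csv; the first row has rating 3.
df = pd.DataFrame({
    'rating': [3, 5, 1, 3, 4],
    'review': ['ok', 'great', 'bad', 'meh', 'good'],
})

# Boolean filtering keeps the original index labels of the surviving rows.
df3 = df[df['rating'] != 3]
print(list(df3.index))

# Position-based access: first remaining row, whatever its label is.
first_by_position = df3['review'].iloc[0]

# Alternatively, renumber the rows so that label 0 exists again.
df3_reset = df3.reset_index(drop=True)
first_after_reset = df3_reset['review'][0]

print(first_by_position)
print(first_after_reset)
```

Whether to use iloc or reset_index depends on whether the original row labels still matter later; resetting the index discards them.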