
pandas dropna on series

I have a pandas DataFrame, df, which looks like this:

Item    | Category | Price
SKU123  | CatA     | 4.5
SKU124  | CatB     | 4.7
SKU124  | CatB     | 4.7
SKU125  | CatA     | NaN
SKU126  | CatB     | NaN
SKU127  | CatC     | 4.5

Here is the code that generates it:

import pandas as pd
df = pd.DataFrame({'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'), 'Cat':('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'), 'Price':(4.5, 4.7, 4.7, '', '', 4.5)})

I am trying to drop every row whose Price is NaN.

So I entered:

filtered_df = df.drop_duplicates
filtered_df['Price'].dropna(inplace=True)

I get this error:

TypeError: 'instancemethod' object has no attribute '__getitem__'

The result I want is:

Item    | Category | Price
SKU123  | CatA     | 4.5
SKU124  | CatB     | 4.7
SKU127  | CatC     | 4.5

The basic issue with your code is in this line:

filtered_df = df.drop_duplicates

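DataFrame.drop_duplicates is a method, so you need to call it with parentheses: df.drop_duplicates(). Without them, filtered_df is bound to the method object itself rather than to a DataFrame, and subscripting it with ['Price'] raises the TypeError you saw.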
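To make the difference concrete, here is a minimal sketch on a small made-up frame (the data and variable names are illustrative, not from the question). On Python 3 the error message reads "'method' object is not subscriptable", but it is the same TypeError.

```python
import pandas as pd

# Small illustrative frame with one duplicated row.
df = pd.DataFrame({"sku": ["SKU123", "SKU124", "SKU124"],
                   "Price": [4.5, 4.7, 4.7]})

not_called = df.drop_duplicates      # the bound method itself; nothing is dropped
called = df.drop_duplicates()        # a new DataFrame without the duplicate row

is_method = callable(not_called)     # the name refers to something still waiting to be called

try:
    not_called["Price"]              # subscripting a method object
    subscript_failed = False
except TypeError:
    subscript_failed = True

row_count = len(called)              # rows left after de-duplication
```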

Another issue is that filtered_df['Price'].dropna(inplace=True) would not do what you want even once the first problem is fixed. It only affects the Series object returned by filtered_df['Price'], not the rows of the DataFrame. Those row labels still exist in the DataFrame's index, so the NaN values would show up again the next time you select the column.
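The following sketch illustrates this with the non-inplace form of dropna, which avoids version-dependent behaviour around inplace; the small frame is made up for illustration and uses real NaN values. Dropping NaN from the column shortens the extracted Series but leaves the DataFrame untouched.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing price.
df = pd.DataFrame({"sku": ["SKU123", "SKU125", "SKU127"],
                   "Price": [4.5, np.nan, 4.5]})

prices = df["Price"].dropna()             # a new Series without the NaN entry

series_len = len(prices)                  # length of the cleaned Series
frame_len = len(df)                       # the DataFrame still has every row
nan_left = int(df["Price"].isna().sum())  # NaN entries still present in the column
```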

Instead, you can use boolean indexing based on the non-null values of the filtered_df['Price'] Series. Example:

filtered_df = df.drop_duplicates()
filtered_df = filtered_df[filtered_df['Price'].notnull()]
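As an aside, DataFrame.dropna accepts a subset argument that should give the same result as the boolean mask. The sketch below assumes the missing prices are real NaN values (unlike the generator code in the question, which uses empty strings) and compares the two approaches.

```python
import numpy as np
import pandas as pd

# Same rows as the question, but with real NaN for the missing prices (an assumption).
df = pd.DataFrame({
    "sku": ["SKU123", "SKU124", "SKU124", "SKU125", "SKU126", "SKU127"],
    "Cat": ["CatA", "CatB", "CatB", "CatA", "CatB", "CatC"],
    "Price": [4.5, 4.7, 4.7, np.nan, np.nan, 4.5],
})

deduped = df.drop_duplicates()

by_mask = deduped[deduped["Price"].notnull()]   # boolean indexing, as in the answer
by_dropna = deduped.dropna(subset=["Price"])    # drop rows where Price is missing

same = by_mask.equals(by_dropna)
remaining_skus = list(by_dropna["sku"])
```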

But please note that in the example you gave to create the DataFrame, the missing values are empty strings ('') rather than NaN, so notnull() will not filter them out. If you control how the DataFrame is created, you should consider using None instead of ''.
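A quick way to see why this matters is to compare the two constructions on a short made-up Series. With an empty string the column becomes object dtype and nothing counts as missing, whereas with None pandas stores NaN in a float column.

```python
import pandas as pd

with_strings = pd.Series([4.5, 4.7, "", 4.5])    # empty string marks the gap
with_none = pd.Series([4.5, 4.7, None, 4.5])     # None marks the gap

strings_dtype = str(with_strings.dtype)          # mixed floats and str
none_dtype = str(with_none.dtype)                # None is stored as NaN

missing_in_strings = int(with_strings.isnull().sum())  # '' is not treated as missing
missing_in_none = int(with_none.isnull().sum())        # the None entry is missing
```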

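But if the empty strings come from somewhere else, you can use the Series.convert_objects method to convert them to NaN while indexing. Note that convert_objects was deprecated in pandas 0.17 and is no longer available in current releases, where pd.to_numeric with errors='coerce' serves the same purpose. Example: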

filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]
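On recent pandas versions, where convert_objects is no longer available, the same filter can be sketched with pd.to_numeric and errors='coerce', which turns values that cannot be parsed as numbers (including the empty strings here) into NaN. This is a substitute for the call above, not the original answer's code; numeric_price is an illustrative name.

```python
import pandas as pd

# The DataFrame from the question, with empty strings for missing prices.
df = pd.DataFrame({
    "sku": ("SKU123", "SKU124", "SKU124", "SKU125", "SKU126", "SKU127"),
    "Cat": ("CatA", "CatB", "CatB", "CatA", "CatB", "CatC"),
    "Price": (4.5, 4.7, 4.7, "", "", 4.5),
})

filtered_df = df.drop_duplicates()

# Coerce non-numeric entries to NaN, then keep only the rows with a real price.
numeric_price = pd.to_numeric(filtered_df["Price"], errors="coerce")
filtered_df = filtered_df[numeric_price.notnull()]

kept_index = list(filtered_df.index)
kept_skus = list(filtered_df["sku"])
```

Note that the Price column of the result keeps its original object dtype; assign numeric_price back to the column if a float column is wanted.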

Demo:

In [42]: df = pd.DataFrame({'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'), 'Cat':('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'), 'Price':(4.5, 4.7, 4.7, '', '', 4.5)})

In [43]: filtered_df = df.drop_duplicates()

In [44]: filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]

In [45]: filtered_df
Out[45]:
    Cat Price     sku
0  CatA   4.5  SKU123
1  CatB   4.7  SKU124
5  CatC   4.5  SKU127
