Delete rows from a pandas DataFrame based on index (multiple criteria) (Python 3.5.1)
Suppose I have a pandas DataFrame with a MultiIndex on the rows. How can I delete rows based on the value of one level of the index, using multiple criteria?
For example, suppose I have
import pandas as pd
df = {'population': [100, 200, 300, 400, 500, 600, 700, 800]}
arrays = [['NJ', 'NJ', 'NY', 'NY', 'CA', 'CA', 'NV', 'NV'],
['A', 'B', None, 'D', 'E', 'F', None, 'G']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['state', 'county'])
df = pd.DataFrame(df, index=index)
              population
state county
NJ    A              100
      B              200
NY    NaN            300
      D              400
CA    E              500
      F              600
NV    NaN            700
      G              800
I want to delete all rows where the county level of the index is NaN, and also the rows where it is equal to 'D' or 'G'. In other words, I want to end up with this DataFrame:
              population
state county
NJ    A              100
      B              200
CA    E              500
      F              600
So the following sort of works:
df = df.iloc[df.index.get_level_values('county') != 'D']
df = df.iloc[df.index.get_level_values('county') != 'G']
But the problem is that in my real use case there are several of these criteria. Also, I can't seem to find a way to delete NaNs using this method.
Thanks!
You could try using the inversion operator (~) with boolean indexing. For example:
import numpy as np
df[~(df.index.get_level_values('county').isin(['A', 'B', np.nan]))]
This line of code says "select from df where county is NOT in some list". Put whichever county values you want to exclude in the list.
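Applied to the exact criteria in the question (drop 'D', 'G', and NaN), the same idea might look like the sketch below. It rebuilds the question's example frame so it runs on its own. The names `data`, `county`, `unwanted`, `drop_mask`, and `result` are illustrative and not from the original posts. The NaN test uses `Index.isna()` rather than putting `np.nan` in the `isin` list; this is a design choice in this sketch that makes the missing-value check explicit, and it assumes a pandas version that provides `Index.isna()`.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
data = {'population': [100, 200, 300, 400, 500, 600, 700, 800]}
arrays = [['NJ', 'NJ', 'NY', 'NY', 'CA', 'CA', 'NV', 'NV'],
          ['A', 'B', None, 'D', 'E', 'F', None, 'G']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['state', 'county'])
df = pd.DataFrame(data, index=index)

# Values of the 'county' level, one per row.
county = df.index.get_level_values('county')

# Hypothetical list of unwanted values; extend it with as many criteria as needed.
unwanted = ['D', 'G']

# True for rows to drop: county is in the list OR county is missing.
drop_mask = county.isin(unwanted) | county.isna()

# Keep everything that is NOT marked for dropping.
result = df[~drop_mask]
print(result)
```

Keeping the unwanted values in a single list means a new criterion is one more list entry rather than another chained filtering line.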