Index a Python Pandas DataFrame with multiple conditions, like a SQL "where" statement
I am experienced in R and new to Python Pandas. I am trying to index a DataFrame to retrieve rows that meet a set of several logical conditions, much like the "where" statement of SQL.
I know how to do this in R with dataframes (and with R's data.table package, which is more like a Pandas DataFrame than R's native dataframe).
Here's some sample code that constructs a DataFrame and a description of how I would like to index it. Is there an easy way to do this?
import pandas as pd
import numpy as np
# generate some data
mult = 10000
fruits = ['Apple', 'Banana', 'Kiwi', 'Grape', 'Orange', 'Strawberry']*mult
vegetables = ['Asparagus', 'Broccoli', 'Carrot', 'Lettuce', 'Rutabaga', 'Spinach']*mult
animals = ['Dog', 'Cat', 'Bird', 'Fish', 'Lion', 'Mouse']*mult
xValues = np.random.normal(loc=80, scale=2, size=6*mult)
yValues = np.random.normal(loc=79, scale=2, size=6*mult)
data = {'Fruit': fruits,
'Vegetable': vegetables,
'Animal': animals,
'xValue': xValues,
'yValue': yValues,}
df = pd.DataFrame(data)
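# shuffle the values within each column independently to break the
# repeating structure of fruits, vegetables, animals
# (assigning a permuted copy avoids in-place shuffling of a DataFrame column)
df['Fruit'] = np.random.permutation(df['Fruit'].values)
df['Vegetable'] = np.random.permutation(df['Vegetable'].values)
df['Animal'] = np.random.permutation(df['Animal'].values)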
df.head(30)
# filter sets
fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']
# subset1: All rows and columns where:
# (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
# subset2: All rows and columns where:
# (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
# subset3: All rows and specific columns where above logical conditions are true.
All help and input is welcome and highly appreciated!
Thanks, Randall
# Note: .ix was removed in pandas 1.0; .loc accepts the same boolean masks.
# subset1: All rows and columns where:
# (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude)]
# subset2: All rows and columns where:
# (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
df.loc[df['Fruit'].isin(fruitsInclude) & (~df['Vegetable'].isin(vegetablesExclude) | (df['Animal']=='Dog'))]
# subset3: All rows and specific columns where above logical conditions are true.
# (to keep only some columns, pass a list of column names as a second argument to .loc)
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude) & (df['Animal']=='Dog')]
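To make the pattern easier to check by hand, here is a minimal sketch on a tiny, made-up frame (the six rows and the mask names fruit_ok, veg_ok and is_dog are my own illustration, not the asker's random data). It names each condition, combines them with & and |, and shows the column-restricted form for subset3. The choice of ['Fruit', 'xValue'] as the "specific columns" is an assumption, as is the query() spelling at the end, which is just one alternative that reads closer to SQL.

```python
import pandas as pd

# Hypothetical six-row frame so the results can be verified by eye.
df = pd.DataFrame({
    'Fruit':     ['Apple', 'Banana', 'Kiwi', 'Grape', 'Apple', 'Orange'],
    'Vegetable': ['Carrot', 'Asparagus', 'Carrot', 'Broccoli', 'Spinach', 'Lettuce'],
    'Animal':    ['Cat', 'Dog', 'Dog', 'Bird', 'Dog', 'Dog'],
    'xValue':    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

# One boolean Series per condition; ~ negates, & is AND, | is OR.
fruit_ok = df['Fruit'].isin(fruitsInclude)
veg_ok = ~df['Vegetable'].isin(vegetablesExclude)
is_dog = df['Animal'] == 'Dog'

# subset1: fruit included AND vegetable not excluded
subset1 = df.loc[fruit_ok & veg_ok]

# subset2: fruit included AND (vegetable not excluded OR animal is a dog)
subset2 = df.loc[fruit_ok & (veg_ok | is_dog)]

# subset3: same rows as subset2, but only selected columns
# (the column list here is an arbitrary choice for illustration)
subset3 = df.loc[fruit_ok & (veg_ok | is_dog), ['Fruit', 'xValue']]

# The same filter as subset2 written with DataFrame.query;
# @name refers to a Python variable in the calling scope.
subset2_query = df.query(
    "Fruit in @fruitsInclude and "
    "(Vegetable not in @vegetablesExclude or Animal == 'Dog')"
)
```

Each comparison needs its own parentheses when written inline, because & and | bind more tightly than == in Python. Naming the masks first sidesteps that and keeps the final .loc call short.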