
How do you slice a pandas dataframe as an argument in a function?

What I am looking to do is put the rules for slicing a pandas dataframe in a function.

For example:

import pandas as pd

row1 = {'a':5,'b':6,'c':7,'d':'A'}
row2 = {'a':8,'b':9,'c':10,'d':'B'}
row3 = {'a':11,'b':12,'c':13,'d':'C'}
df = pd.DataFrame([row1,row2,row3])

I am slicing the dataframe this way:

print(df.loc[df['a']==5])
print(df.loc[df['b']==12])
print(df.loc[(df['b']==12) | df['d'].isin(['A','C']),'d'])

For my purposes, I need to slice the same dataframe in different ways as part of a function. For example:

def slicing(locationargument):
    df.loc(locationargument)
    do some stuff..
    return something

Alternatively, I was expecting getattr() to work, but that tells me DataFrames do not have a .loc[...] attribute. For example:

getattr(df,"loc[df['a']==5]")

Returns:

AttributeError: 'DataFrame' object has no attribute 'loc[df['a']==5]'

Am I missing something here? Any thoughts or alternatives would be greatly appreciated!

In pandas, I believe it's not quite right to think of .loc as a function (or method) on a DataFrame. It is an indexer attribute that you subscript, so the syntax df.loc(...) is not right here. Instead, you need to write df.loc[...] (square brackets, not parentheses).

So how about simply:

def slicing(locationargument):
    result = df.loc[locationargument]
    # do some stuff..
    return result
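To make that concrete, here is a minimal runnable sketch using the data from the question. Passing the frame in as a parameter (named frame here) is my own variation rather than something from the question, and the function simply returns the slice because the "do some stuff" step is unspecified. The second call relies on the fact that df.loc[rows, cols] is just df.loc[(rows, cols)], so a tuple can carry both selectors.

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])

def slicing(frame, locationargument):
    # Whatever is valid between the brackets of .loc can be passed in as one object.
    return frame.loc[locationargument]

# A boolean mask selects rows.
rows_a5 = slicing(df, df['a'] == 5)

# A tuple of (row selector, column selector) behaves like df.loc[rows, cols].
d_values = slicing(df, ((df['b'] == 12) | df['d'].isin(['A', 'C']), 'd'))

print(rows_a5)
print(d_values)
```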

But then the question becomes: what type of object should locationargument be? If it's a boolean iterable whose length is equal to the number of rows in your data frame (for example, the Series produced by df['a']==5), you're all set. An alternative could be to make it a string and then write:

def slicing(locationargumentstring):
    # eval runs the string as Python code, so only pass strings you trust
    result = df.loc[eval(locationargumentstring)]
    # do some stuff..
    return result
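If the goal of the string version is only to defer the condition, two other options may be worth considering instead of eval. This is a sketch, not part of the original answer: .loc also accepts a callable that receives the frame and returns an indexer, and DataFrame.query takes a string expression written in terms of column names (it selects rows only, not columns). The function names slicing_by_callable and slicing_by_query are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])

def slicing_by_callable(frame, make_mask):
    # .loc calls make_mask(frame) and uses the result as the indexer.
    return frame.loc[make_mask]

def slicing_by_query(frame, expression):
    # query evaluates the expression against the frame's columns.
    return frame.query(expression)

by_callable = slicing_by_callable(df, lambda f: f['b'] == 12)
by_query = slicing_by_query(df, "b >= 9 and c < 13")

print(by_callable)
print(by_query)
```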

If you go the getattr route, remember that the attribute name doesn't include the subscript. So the following is bad:

getattr(df, "loc[df['a']==5]")

but the following would work:

getattr(df, "loc")[eval("df['a']==5")]

and, more directly, so would

getattr(df, "loc")[df['a']==5]
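As a quick check of the getattr point, the sketch below compares the getattr form with plain df.loc on the question's data. The second half assumes a case where the indexer name is only known at runtime (the variable indexer_name is hypothetical), which is about the only situation where getattr adds anything over writing df.loc directly.

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])

mask = df['a'] == 5

# getattr looks up the attribute by name; the subscript is applied afterwards.
direct = df.loc[mask]
via_getattr = getattr(df, "loc")[mask]
print(direct.equals(via_getattr))

# Hypothetical case: the indexer name is chosen at runtime.
indexer_name = "iloc"
first_row = getattr(df, indexer_name)[0]
print(first_row)
```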
