
Pandas - filter and regex search the index of DataFrame

I have a DataFrame in which the columns are a MultiIndex and the index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin',...].

I would like to create a function that returns all rows of the DataFrame whose index label is the name 'Bob', or perhaps starts with the letter 'A', or starts with a lowercase letter. How can this be done?

I looked into df.filter() with the regex argument, but it fails and I get:

df.filter(regex='a')
TypeError: expected string or buffer

or:

df.filter(regex=('a',1))
TypeError: first argument must be string or compiled pattern

I've tried other things, such as passing re.compile('a'), to no avail.

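Maybe try a different approach by using a list comprehension and .loc (the older .ix indexer has been removed from pandas):

import pandas as pd

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

# rows whose label is exactly 'Bob'
df.loc[[x for x in df.index if x == 'Bob']]

# rows whose label starts with 'A'
df.loc[[x for x in df.index if x[0] == 'A']]

# rows whose label starts with a lowercase letter
df.loc[[x for x in df.index if x[0].islower()]]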

So it looks like part of my problem with filter was that I was using an outdated version of pandas. After updating I no longer get the TypeError. After some playing around, it looks like I can use filter to fit my needs. Here is what I found out.

Simply calling df.filter(regex='string') returns the columns whose labels match the regex. This appears to do the same as df.filter(regex='string', axis=1).

To search the index, I simply need to call df.filter(regex='string', axis=0).
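A minimal sketch of the three cases from the question, using the axis=0 form described above. The sample frame, the "value" column name, and the helper rows_matching are illustrative assumptions, not part of the original post. filter checks each label with a regex search, so anchors such as ^ and $ are needed for prefix or exact matches.

```python
import pandas as pd

# Sample frame invented for illustration.
df = pd.DataFrame({"value": [0, 1, 2, 3]},
                  index=["Andrew", "Bob", "Calvin", "yosef"])


def rows_matching(frame, pattern):
    """Hypothetical helper: return the rows whose index label matches pattern."""
    return frame.filter(regex=pattern, axis=0)


exact_bob = rows_matching(df, r"^Bob$")      # label is exactly 'Bob'
starts_with_a = rows_matching(df, r"^A")     # label starts with 'A'
starts_lower = rows_matching(df, r"^[a-z]")  # label starts with a lowercase ASCII letter
```

Without the anchors, a pattern such as 'a' would match any label containing that letter anywhere, which here would include 'Calvin'.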

How about using pandas.Series.str.contains()? The method works on both a Series and an Index, as long as the index holds strings. For non-string entries the result is NaN instead of a Boolean.

import pandas as pd
df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
mask = df.index.str.contains(r"^A")  # Boolean mask over the index labels
matches = df.index[mask]  # the matching labels: only 'Andrew' here
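If the index may hold non-string labels, the NaN entries in the mask have to be dealt with before the mask can select rows. A hedged sketch, assuming a mixed index invented for illustration: the na argument of str.contains sets the value used for labels that are not strings, so na=False treats them as non-matches.

```python
import pandas as pd

# Mixed index invented for illustration: 42 is not a string.
df = pd.DataFrame({"value": [0, 1, 2, 3]},
                  index=["Andrew", "Bob", 42, "yosef"])

# na=False makes non-string labels count as "no match" instead of NaN.
mask = df.index.str.contains(r"^A", na=False)

# Boolean indexing with the mask returns the matching rows themselves.
rows = df[mask]
```

Selecting with df[mask] returns the rows rather than just the labels, which is closer to what the question asks for.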

