Pandas - filter and regex search the index of DataFrame

Question

I have a DataFrame in which the columns are a MultiIndex and the index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin', ...].

I would like to create a function that returns all rows of the DataFrame whose name is 'Bob', or perhaps starts with the letter 'A', or starts with a lowercase letter. How can this be done?

I looked into df.filter() with the regex argument, but it fails and I get:

df.filter(regex='a')
TypeError: expected string or buffer

or:

df.filter(regex=('a',1))
TypeError: first argument must be string or compiled pattern

I've tried other things, such as passing re.compile('a'), to no avail.

Answer 1

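Maybe try a different approach, using a list comprehension with .loc (the older .ix indexer has since been removed from pandas):

import pandas as pd

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

# rows whose label is exactly 'Bob'
df.loc[[x for x in df.index if x == 'Bob']]

# rows whose label starts with 'A'
df.loc[[x for x in df.index if x[0] == 'A']]

# rows whose label starts with a lowercase letter
df.loc[[x for x in df.index if x[0].islower()]]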

Answer 2

So it looks like part of my problem with filter was that I was using an outdated version of pandas. After updating, I no longer get the TypeError. After some playing around, it looks like I can use filter to fit my needs. Here is what I found out.

Simply calling df.filter(regex='string') returns the columns whose labels match the regex. This appears to be the same as df.filter(regex='string', axis=1), since a DataFrame is filtered on its columns by default.

To search the index, I simply need to do df.filter(regex='string', axis=0).
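As a minimal sketch of how the three cases from the question could look with axis=0: the frame below is a made-up example with a single "value" column (an assumption, not the asker's MultiIndex data). filter matches with re.search, so the patterns are anchored with ^ to avoid matching anywhere inside the name.

```python
import pandas as pd

# Hypothetical example frame; the column name "value" is only for illustration.
df = pd.DataFrame({"value": range(4)},
                  index=["Andrew", "Bob", "Calvin", "yosef"])

# Rows whose label is exactly 'Bob' (anchored at both ends).
exact_bob = df.filter(regex=r"^Bob$", axis=0)

# Rows whose label starts with 'A'.
starts_with_a = df.filter(regex=r"^A", axis=0)

# Rows whose label starts with a lowercase ASCII letter.
starts_lowercase = df.filter(regex=r"^[a-z]", axis=0)

print(exact_bob)
print(starts_with_a)
print(starts_lowercase)
```

Without the anchors, regex='a' would also keep 'Calvin', because the pattern is searched for anywhere in the label.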

Answer 3

How about using pandas.Series.str.contains()? The same method is available on an Index through df.index.str, provided the index holds strings. For non-string labels the result is NaN rather than a boolean.

import pandas as pd

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
mask = df.index.str.contains(r"^A")  # True where the label starts with 'A'
names = df.index[mask]  # Index containing only 'Andrew'
rows = df.loc[mask]     # the matching rows of the DataFrame
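The question asked for a function, so here is one way the mask approach might be wrapped up. The name rows_matching is hypothetical, and passing na=False is an assumption about the desired behaviour: it makes labels that cannot be matched count as non-matches instead of NaN.

```python
import pandas as pd


def rows_matching(df, pattern):
    """Return the rows of df whose index label matches the regex pattern.

    Hypothetical helper: na=False treats labels that yield no boolean
    result as non-matches, so the mask is always usable with .loc.
    """
    mask = df.index.str.contains(pattern, regex=True, na=False)
    return df.loc[mask]


df = pd.DataFrame({"value": range(4)},
                  index=["Andrew", "Bob", "Calvin", "yosef"])

bob = rows_matching(df, r"^Bob$")        # exact name
a_names = rows_matching(df, r"^A")       # starts with 'A'
lower = rows_matching(df, r"^[a-z]")     # starts with a lowercase letter
print(bob, a_names, lower, sep="\n\n")
```

Selecting with a boolean mask keeps the original row order and leaves the columns untouched, so it should behave the same whether the columns are a flat Index or a MultiIndex.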

3 answers

solution1
4 ACCEPTED 2016-02-25 22:07:19

solution2
2 2016-03-01 15:42:30

solution3
2 2022-03-21 01:48:12