I have a DataFrame whose columns are a MultiIndex and whose index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin', ...].
I would like to create a function that returns all rows of the DataFrame whose name is 'Bob', or perhaps whose name starts with the letter 'A' or with a lowercase letter. How can this be done?
I looked into df.filter() with the regex argument, but it fails and I get:
df.filter(regex='a')
TypeError: expected string or buffer
or:
df.filter(regex=('a',1))
TypeError: first argument must be string or compiled pattern
I've tried other things, such as passing re.compile('a'), to no avail.
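Maybe try a different approach using a list comprehension and .loc (.ix was removed in pandas 1.0):
import pandas as pd
df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
df.loc[[x for x in df.index if x == 'Bob']]  # rows named exactly 'Bob'
df.loc[[x for x in df.index if x[0] == 'A']]  # rows whose name starts with 'A'
df.loc[[x for x in df.index if x[0].islower()]]  # rows whose name starts with a lowercase letter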
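Since the question asks for a function, the list-comprehension idea can be wrapped up roughly as follows. This is only a sketch: the helper name select_rows and its predicate argument are made up for illustration, and it builds a boolean mask instead of a list of labels so that an index with repeated names would not return the same rows more than once.

```python
import pandas as pd

def select_rows(df, predicate):
    """Return the rows of df whose index label satisfies predicate (hypothetical helper)."""
    # One True/False per row, in index order
    mask = [bool(predicate(name)) for name in df.index]
    return df.loc[mask]

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

bob = select_rows(df, lambda name: name == 'Bob')
starts_with_a = select_rows(df, lambda name: name.startswith('A'))
# name[:1] avoids an IndexError if a label happens to be an empty string
starts_lower = select_rows(df, lambda name: name[:1].islower())
```

Any callable that takes a label and returns something truthy can be passed as the predicate.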
So it looks like part of my problem with filter was that I was using an outdated version of pandas. After updating I no longer get the TypeError. After some playing around, it looks like I can use filter to fit my needs. Here is what I found out.
Simply calling df.filter(regex='string') returns the columns whose labels match the regex. This appears to do the same as df.filter(regex='string', axis=1).
To search the index instead, I simply need to do df.filter(regex='string', axis=0).
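Applied to the three cases from the question, that might look like the sketch below. The MultiIndex column labels and the cell values are invented just to have something to run. Note that filter matches the regex anywhere in the label (it uses re.search), so the patterns are anchored with ^ and $ where an exact or leading match is wanted.

```python
import pandas as pd

# Made-up MultiIndex columns and data, only to mirror the shape described in the question
columns = pd.MultiIndex.from_tuples([('x', 'a'), ('x', 'b')])
df = pd.DataFrame([[0, 1], [2, 3], [4, 5], [6, 7]],
                  index=['Andrew', 'Bob', 'Calvin', 'yosef'],
                  columns=columns)

bob = df.filter(regex=r'^Bob$', axis=0)            # name is exactly 'Bob'
starts_with_a = df.filter(regex=r'^A', axis=0)     # name starts with 'A'
starts_lower = df.filter(regex=r'^[a-z]', axis=0)  # name starts with an ASCII lowercase letter
anywhere_a = df.filter(regex='a', axis=0)          # unanchored: 'a' anywhere in the name
```

The last line shows why the anchors matter: an unanchored 'a' matches 'Calvin' as well, because the search is not restricted to the start of the name.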
How about using pandas.Series.str.contains()? The same method is available on an Index through df.index.str, provided the index holds strings. For non-string entries the result is NaN instead of a boolean.
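import pandas as pd
df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
mask = df.index.str.contains(r"^A")  # boolean array, True where the name starts with 'A'
names = df.index[mask]  # names = Index(['Andrew'], dtype='object')
rows = df.loc[mask]  # the matching rows themselves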
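To turn this into the function the question asks for, something along these lines should work. The name rows_matching is hypothetical, and passing na=False is my own choice so that any non-string label counts as "no match" instead of producing NaN in the mask.

```python
import pandas as pd

def rows_matching(df, pattern):
    """Return the rows of df whose index label matches the regex pattern (hypothetical helper)."""
    # na=False: treat labels that are not strings as non-matching
    mask = df.index.str.contains(pattern, na=False)
    return df.loc[mask]

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

bob = rows_matching(df, r'^Bob$')
starts_with_a = rows_matching(df, r'^A')
starts_lower = rows_matching(df, r'^[a-z]')
```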