How to efficiently get the column and row of cells matching a partial string in a pandas DataFrame

I have a pandas DataFrame with about 150 rows and 8 columns. What I want is to efficiently get the index and column of every cell that contains a partial string. What I came up with is the following:

import pandas as pd

df = pd.DataFrame([["foo", "foo", "foo", "foo"], ["foo", "bar", "foo", "foo"], ["bar", "foo", "foo", "bar"],
                   ["foo", "foo", "foo", "bar"]])

Output:

     0    1    2    3
0  foo  foo  foo  foo
1  foo  bar  foo  foo
2  bar  foo  foo  bar
3  foo  foo  foo  bar

Here, if I'm looking for just the entries that contain the substring "ar", I use:

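# True where the cell's text contains "ar" (applymap is named DataFrame.map in pandas >= 2.1)
setup_mask = df.applymap(lambda x: "ar" in str(x))
values_hold = []
# Visit every cell and collect the [index, column] pairs where the mask is True
for x in df.index:
    for y in df.columns:
        if setup_mask.loc[x, y].any() == bool(True):
            if [x, y] not in values_hold:
                values_hold.append([x, y])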

This works and returns a list of [index, column] pairs: [[1, 1], [2, 0], [2, 3], [3, 3]].

This feels unpythonic and just plain messy. Is there a more Pythonic way to do this?

P.S. I know I could drop the mask, but I suspect that a more Pythonic approach would rely on one.

Pandas supports vectorized string operations (the .str accessor), but only on one column (Series) at a time, so apply them column by column:

df.apply(lambda ser: ser.str.contains('ar'))

This gives you:

       0      1      2      3
0  False  False  False  False
1  False   True  False  False
2   True  False  False   True
3  False  False  False   True

This is efficient as long as you have far fewer columns than rows (which you do), because apply only loops over the columns; each column is then handled by a single .str.contains call.

If you store the result above in mask, then:

import numpy as np

np.transpose(np.where(mask))

This gives you your answer:

array([[1, 1],
       [2, 0],
       [2, 3],
       [3, 3]])
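A minimal end-to-end sketch of this approach follows. Note that np.where (like np.argwhere, which is documented as equivalent to np.transpose(np.nonzero(a))) returns integer positions, not labels. With the question's default RangeIndex the two coincide, but with a labelled index they do not. The row labels r0 to r3 and column labels a to d below are hypothetical, added only to show how positions map back to labels.

```python
import numpy as np
import pandas as pd

# The question's data, with hypothetical string labels so that
# positions and labels differ.
df = pd.DataFrame(
    [["foo", "foo", "foo", "foo"],
     ["foo", "bar", "foo", "foo"],
     ["bar", "foo", "foo", "bar"],
     ["foo", "foo", "foo", "bar"]],
    index=["r0", "r1", "r2", "r3"],
    columns=["a", "b", "c", "d"],
)

# Boolean mask: one .str.contains call per column.
mask = df.apply(lambda ser: ser.str.contains("ar"))

# np.argwhere gives the same (row, column) positions as
# np.transpose(np.where(mask)), in row-major order.
positions = np.argwhere(mask.to_numpy())

# Translate integer positions back to index and column labels.
labels = [[df.index[r], df.columns[c]] for r, c in positions]

print(positions.tolist())  # [[1, 1], [2, 0], [2, 3], [3, 3]]
print(labels)              # [['r1', 'b'], ['r2', 'a'], ['r2', 'd'], ['r3', 'd']]
```

If you only ever use a default RangeIndex, the positions are already the answer; the label lookup matters once the index or columns carry real names.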

You can use transform with str.contains and stack:

In [5352]: s = df.transform(lambda x: x.str.contains('ar')).stack()

In [5353]: s.index[s].tolist()
Out[5353]: [(1L, 1L), (2L, 0L), (2L, 3L), (3L, 3L)]

Or, as a list of lists of plain ints (the L suffix above is Python 2's repr for long integers):

In [5366]: [list(map(int, x)) for x in s.index[s]]
Out[5366]: [[1, 1], [2, 0], [2, 3], [3, 3]]
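The str.contains approaches assume every column holds strings: the .str accessor raises on a numeric column, whereas the question's version sidesteps this with str(x). Below is a sketch of the stack-based variant that converts cells with astype(str) first. The helper name find_substring and the example frame are hypothetical, and passing regex=False is an assumption that the search term is a literal substring (str.contains treats the pattern as a regular expression by default).

```python
import pandas as pd

# Hypothetical frame: two string columns plus a float column.
df = pd.DataFrame(
    [["foo", "foo", 1.5], ["bar", "foo", 2.0], ["foo", "a.r", 3.0]],
    columns=["x", "y", "z"],
)


def find_substring(frame, sub):
    """Return [row_label, column_label] pairs for cells whose text contains sub."""
    # astype(str) mirrors the str(x) call in the question, so numeric cells
    # become searchable text instead of breaking the .str accessor.
    # stack() then reshapes the frame into one Series indexed by
    # (row, column), so a single .str.contains call covers every cell.
    s = frame.astype(str).stack()
    # regex=False: treat sub as a literal substring, so "." is not a wildcard.
    hits = s.str.contains(sub, regex=False)
    return [list(pair) for pair in s.index[hits.to_numpy()]]


print(find_substring(df, "ar"))
print(find_substring(df, ".r"))
```

With regex=False, ".r" matches only the literal text in "a.r"; with the default regex=True it would also match "bar". Depending on the pandas version, astype(str) may also turn missing values into text such as "nan", which a pattern like "an" would then match, so fill or drop them first if that matters.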
