How to efficiently get the column and row from a partial string in a pandas DataFrame

Question

How to efficiently get the column, row, and value from a partial string with Pandas

I have a pandas DataFrame with about 150 index entries and 8 columns. What I want to do is efficiently get the column and index of cells based on a partial string. What I came up with is as follows:

import pandas as pd

df = pd.DataFrame([["foo", "foo", "foo", "foo"], ["foo", "bar", "foo", "foo"], ["bar", "foo", "foo", "bar"],
                   ["foo", "foo", "foo", "bar"]])

Output:

     0    1    2    3
0  foo  foo  foo  foo
1  foo  bar  foo  foo
2  bar  foo  foo  bar
3  foo  foo  foo  bar

Here, if I'm looking for just the entries that contain the substring "ar", I use:

setup_mask = df.applymap(lambda x: "ar" in str(x))
values_hold = []
for x in df.index:
    for y in df.columns:
        if setup_mask.loc[x, y].any() == bool(True):
            if [x, y] not in values_hold:
                values_hold.append([x, y])

This works well and returns a list of [index, column] pairs: [[1, 1], [2, 0], [2, 3], [3, 3]].

This feels unpythonic and really just plain messy. Is there a way to do something like this in a more Pythonic way?

PS: I know I could cut out the mask, but I felt that if there is a more Pythonic way, it would rely on a mask.

Answer 1

Pandas supports vectorized string operations, but only on one column at a time. So:

df.apply(lambda ser: ser.str.contains('ar'))

This gives you:

       0      1      2      3
0  False  False  False  False
1  False   True  False  False
2   True  False  False   True
3  False  False  False   True

And it's pretty efficient so long as you have fewer columns than rows (which you do).

If you store the above in mask, then (with NumPy imported as np):

np.transpose(np.where(mask))

Gives you your answer:

array([[1, 1],
       [2, 0],
       [2, 3],
       [3, 3]])
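
One thing worth keeping in mind is that np.where returns positions, not labels. In the question the two coincide because the DataFrame uses the default 0..n-1 index and columns. The following is a minimal end-to-end sketch of the approach above, assuming a DataFrame with custom labels; the labels "a".."d" and "w".."z" are hypothetical and only there to show the mapping from positions back to labels.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with hypothetical string labels
# so that positions and labels are visibly different.
df = pd.DataFrame(
    [["foo", "foo", "foo", "foo"],
     ["foo", "bar", "foo", "foo"],
     ["bar", "foo", "foo", "bar"],
     ["foo", "foo", "foo", "bar"]],
    index=["a", "b", "c", "d"],
    columns=["w", "x", "y", "z"],
)

# Column-wise vectorized substring test. astype(str) guards against
# non-string cells; regex=False treats "ar" as a literal substring.
mask = df.apply(lambda ser: ser.astype(str).str.contains("ar", regex=False))

# Positional (row, column) coordinates of every True cell.
rows, cols = np.where(mask.to_numpy())
positions = np.column_stack((rows, cols)).tolist()

# Translate positions back into index and column labels.
labels = list(zip(df.index[rows], df.columns[cols]))

print(positions)
print(labels)
```

If the DataFrame keeps its default integer labels, positions and labels are the same pairs and the last step can be skipped.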

Answer 2

You can use transform with str.contains and stack:

In [5352]: s = df.transform(lambda x: x.str.contains('ar')).stack()

In [5353]: s.index[s].tolist()
Out[5353]: [(1L, 1L), (2L, 0L), (2L, 3L), (3L, 3L)]

Or, as a list of lists:

In [5366]: [list(map(int, x)) for x in s.index[s]]
Out[5366]: [[1, 1], [2, 0], [2, 3], [3, 3]]
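
The session above appears to come from Python 2 (note the 1L long-integer output). As a rough sketch, the same idea could be written as a standalone Python 3 script like the one below. It swaps transform for apply and selects with s[s].index rather than s.index[s]; those are substitutions made here for illustration, not what the answer itself used.

```python
import pandas as pd

df = pd.DataFrame([["foo", "foo", "foo", "foo"],
                   ["foo", "bar", "foo", "foo"],
                   ["bar", "foo", "foo", "bar"],
                   ["foo", "foo", "foo", "bar"]])

# Build the boolean mask column by column, then stack it into a Series
# whose MultiIndex holds (row label, column label) pairs.
s = df.apply(lambda col: col.astype(str).str.contains("ar", regex=False)).stack()

# Keep only the True entries and read the pairs off the index.
hits = [[int(r), int(c)] for r, c in s[s].index]

print(hits)
```

Because stack works on labels, this variant returns index and column labels directly, so the int conversion only makes sense when the labels are integers, as they are here.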


Answer 1
1 accepted 2017-09-30 14:12:18

Answer 2
1 2017-09-30 14:24:19
