How to get column and row from partial string in pandas dataframe efficiently
How to get column, row and value from a partial string efficiently with Pandas
I have a pandas DataFrame with about 150 rows and 8 columns. What I am looking to do is efficiently get the column and index of each cell that contains a partial string. What I came up with was as follows:
import pandas as pd
df = pd.DataFrame([["foo", "foo", "foo", "foo"], ["foo", "bar", "foo", "foo"],
                   ["bar", "foo", "foo", "bar"], ["foo", "foo", "foo", "bar"]])
Output:
     0    1    2    3
0  foo  foo  foo  foo
1  foo  bar  foo  foo
2  bar  foo  foo  bar
3  foo  foo  foo  bar
Here, if I'm looking for just the entries that contain the substring "ar", I use:
# Note: applymap is deprecated since pandas 2.1; DataFrame.map is its replacement.
setup_mask = df.applymap(lambda x: "ar" in str(x))
values_hold = []
for x in df.index:
    for y in df.columns:
        if setup_mask.loc[x, y].any() == bool(True):
            if [x, y] not in values_hold:
                values_hold.append([x, y])
This works well and returns a list of [index, column] pairs: [[1, 1], [2, 0], [2, 3], [3, 3]].
This feels unpythonic and really just plain messy. Is there a way to do something like this in a more Pythonic way?
PS: I know I could cut out the mask, but I felt that if there is a more Pythonic way, it would rely on a mask.
Pandas supports vectorized string operations, but only on one column at a time. So:
df.apply(lambda ser: ser.str.contains('ar'))
Will give you this:
       0      1      2      3
0  False  False  False  False
1  False   True  False  False
2   True  False  False   True
3  False  False  False   True
And it's pretty efficient so long as you have fewer columns than rows (which you do).
If you store the above in mask, then:
np.transpose(np.where(mask))
Gives you your answer:
array([[1, 1],
       [2, 0],
       [2, 3],
       [3, 3]])
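Note that np.where returns positions, not labels. They only coincide here because the example frame uses the default 0..n-1 index and columns. Below is a minimal, self-contained sketch of the same idea that also maps the positions back to labels and cell values. The regex=False and na=False arguments and the name matches are additions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["foo", "foo", "foo", "foo"],
                   ["foo", "bar", "foo", "foo"],
                   ["bar", "foo", "foo", "bar"],
                   ["foo", "foo", "foo", "bar"]])

# Column-wise vectorized substring test.
# regex=False matches "ar" literally; na=False treats missing cells as non-matches.
mask = df.apply(lambda ser: ser.str.contains("ar", regex=False, na=False))

# Positional (row, column) indices of the True cells, in row-major order.
rows, cols = np.where(mask.to_numpy())

# Translate positions into index/column labels and fetch the matching values.
matches = [
    (df.index[r], df.columns[c], df.iat[r, c])
    for r, c in zip(rows, cols)
]
print(matches)
```

With a non-default index (for example, string row labels), matches would carry those labels, whereas np.transpose(np.where(mask)) would still report integer positions.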
You can use transform with str.contains and stack:
In [5352]: s = df.transform(lambda x: x.str.contains('ar')).stack()
In [5353]: s.index[s].tolist()
Out[5353]: [(1L, 1L), (2L, 0L), (2L, 3L), (3L, 3L)]
Or, as a list of lists:
In [5366]: [list(map(int, x)) for x in s.index[s]]
Out[5366]: [[1, 1], [2, 0], [2, 3], [3, 3]]
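The session above was run under Python 2, hence the 1L-style long integers in the output. As a rough sketch of the same stack-based approach written as a plain Python 3 script (the names stacked and pairs, and the regex=False / na=False arguments, are assumptions added here for illustration):

```python
import pandas as pd

df = pd.DataFrame([["foo", "foo", "foo", "foo"],
                   ["foo", "bar", "foo", "foo"],
                   ["bar", "foo", "foo", "bar"],
                   ["foo", "foo", "foo", "bar"]])

# Boolean frame: True where the cell contains the literal substring "ar".
mask = df.transform(lambda col: col.str.contains("ar", regex=False, na=False))

# stack() turns the frame into a Series keyed by (row label, column label).
stacked = mask.stack()

# Keep only the True entries and read the (row, column) labels off the index.
pairs = [[int(r), int(c)] for r, c in stacked[stacked].index]
print(pairs)
```

The int(...) conversion only makes sense because the labels here are integers; with string labels it would be dropped and the label tuples used directly.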