I have a dataframe which looks like this:
      0     1  2     3  4     5  6
0  a(A)     b  c     c  d     a  a
1     b     h  w     k  d  c(A)  k
2     g  e(A)  s     g  h     s  f
3     f     d  s  h(A)  c     w  n
4     e     g  s     b  c     e  w
I want to get the index of the cell that contains (A) in each column:
0 0
1 2
2 NaN
3 3
4 NaN
5 1
6 NaN
I tried this code, but the result is not what I expected:
df.apply(lambda x: (x.str.contains(r'(A)')==True).idxmax(), axis=0)
Result looks like this:
0 0
1 2
2 0
3 3
4 0
5 1
6 0
I think it returns the first index when a column contains no (A).
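A quick way to check that guess is to call idxmax on a boolean Series with no True values. This is only a minimal sketch; the two toy Series below are made up for illustration and are not part of the original dataframe.

```python
import pandas as pd

# Hypothetical mask for a column with no match: every entry is False.
no_match = pd.Series([False, False, False])

# idxmax returns the label of the first occurrence of the maximum.
# When every value is False, the maximum is False, so the first label is returned.
print(no_match.idxmax())

# Hypothetical mask with a single match: the label of the first True is returned.
one_match = pd.Series([False, False, True])
print(one_match.idxmax())
```

So a 0 in the result can mean either "the match is in row 0" or "there is no match at all", which is why the two cases have to be told apart separately.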
How should I fix it?
Use Series.where to replace the default 0 that DataFrame.idxmax returns for columns without a match with a missing value:
mask = df.apply(lambda x: x.str.contains('A'))
s1 = mask.idxmax().where(mask.any())
print (s1)
0 0.0
1 2.0
2 NaN
3 3.0
4 NaN
5 1.0
6 NaN
dtype: float64
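For anyone who wants to reproduce this, here is a self-contained sketch of the same idea. The dataframe is rebuilt by hand from the question. One deliberate difference, which is my assumption about the intent: it matches the literal text "(A)" with regex=False, whereas the snippet above matches any capital A. The last line, converting to the nullable Int64 dtype, is an optional extra that is not part of the answer above.

```python
import pandas as pd

# Dataframe from the question, rebuilt row by row.
rows = [
    ["a(A)", "b", "c", "c", "d", "a", "a"],
    ["b", "h", "w", "k", "d", "c(A)", "k"],
    ["g", "e(A)", "s", "g", "h", "s", "f"],
    ["f", "d", "s", "h(A)", "c", "w", "n"],
    ["e", "g", "s", "b", "c", "e", "w"],
]
df = pd.DataFrame(rows)

# Boolean mask: True where the cell contains the literal substring "(A)".
# regex=False avoids treating the parentheses as a regex capture group.
mask = df.apply(lambda col: col.str.contains("(A)", regex=False))

# idxmax gives the first True per column (or 0 if there is none);
# where(mask.any()) keeps it only for columns that really have a match.
result = mask.idxmax().where(mask.any())
print(result)

# Optional: nullable integers instead of floats, with <NA> for "no match".
result_int = result.astype("Int64")
print(result_int)
```

The float dtype in the output comes from NaN, which cannot be stored in a plain integer column; the Int64 conversion is only needed if integer row labels are wanted downstream.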
You could keep your approach but explicitly check whether each column contains any match:
In [51]: pred = df.applymap(lambda x: '(A)' in x)
In [52]: pred.idxmax() * np.where(pred.any(), 1, np.nan)
Out[52]:
0 0.0
1 2.0
2 NaN
3 3.0
4 NaN
5 1.0
6 NaN
dtype: float64
Alternatively, use DataFrame.where directly:
In [211]: pred.where(pred).idxmax()
Out[211]:
0 0.0
1 2.0
2 NaN
3 3.0
4 NaN
5 1.0
6 NaN
dtype: float64
A slightly cheatier one-liner uses DataFrame.where with the identity function as the condition:
In [78]: df.apply(lambda x: x.str.contains('A')).where(lambda x: x).idxmax()
Out[78]:
0 0.0
1 2.0
2 NaN
3 3.0
4 NaN
5 1.0
6 NaN
Add an if condition at the end of the apply:
>>> df.apply(lambda x: x.str.contains('A').idxmax() if 'A' in x[x.str.contains('A').idxmax()] else np.nan)
0 0.0
1 2.0
2 NaN
3 3.0
4 NaN
5 1.0
6 NaN
dtype: float64
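The one-liner above evaluates str.contains twice per column. A possible variant, sketched here with a hypothetical helper name (first_match_index) and a literal "(A)" match that are my own choices rather than part of the answer, computes the mask once and then branches on it:

```python
import numpy as np
import pandas as pd

# Dataframe from the question, rebuilt row by row.
df = pd.DataFrame([
    ["a(A)", "b", "c", "c", "d", "a", "a"],
    ["b", "h", "w", "k", "d", "c(A)", "k"],
    ["g", "e(A)", "s", "g", "h", "s", "f"],
    ["f", "d", "s", "h(A)", "c", "w", "n"],
    ["e", "g", "s", "b", "c", "e", "w"],
])

def first_match_index(col, needle="(A)"):
    """Return the label of the first cell containing needle, or NaN if none does."""
    hits = col.str.contains(needle, regex=False)  # literal substring match
    return hits.idxmax() if hits.any() else np.nan

# apply calls the helper once per column (axis=0 is the default).
out = df.apply(first_match_index)
print(out)
```

This keeps the "if there is a match, else NaN" logic of the answer above but makes the condition explicit with any().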