Finding the index and column of a value in a pandas DataFrame

Question

I have a pandas DataFrame:

   col1 col2 col3 col4
0     A    B    C    G
1     I    J    S    D
2     O    L    C    G
3     A    B    H    D
4     H    B    C    P

# reproducible
import pandas as pd
from string import ascii_uppercase as uc  # just for sample data
import random  # just for sample data

random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
                   'col2': [random.choice(uc) for _ in range(20)],
                   'col3': [random.choice(uc) for _ in range(20)],
                   'col4': [random.choice(uc) for _ in range(20)]})

I am looking for a function like this:

func('H')

It should return every index label and column name where 'H' occurs. Any ideas?

Answer 1

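Use np.argwhere and df.to_numpy:

import numpy as np

# positional (row, column) pairs of every cell equal to 'H'
rows, cols = np.argwhere(df.to_numpy() == 'H').T
# map the positions back to index and column labels
indices = list(zip(df.index[rows], df.columns[cols]))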

Alternatively,

indices = df.where(df.eq('H')).stack().index.tolist()

# print(indices)
[(3, 'col3'), (4, 'col1')]
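
Since the question asks for a callable like func('H'), the np.argwhere approach can be wrapped in a small function. This is only a minimal sketch: the name find_value and its signature are hypothetical (not from the thread), and the sample frame reproduces the five-row table from the question.

```python
import numpy as np
import pandas as pd


def find_value(df, value):
    """Return (index label, column name) pairs for cells equal to value.

    find_value is a hypothetical helper name used for illustration.
    """
    # Positional row/column numbers of every matching cell
    rows, cols = np.argwhere(df.to_numpy() == value).T
    # Translate positions back into index and column labels
    return list(zip(df.index[rows], df.columns[cols]))


# The five-row sample table from the question
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

print(find_value(df, 'H'))
```

A value that does not occur anywhere gives an empty list rather than raising an error.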

Timing comparison of all answers with timeit:

df.shape
(50000, 4)

%%timeit -n100 @Shubham1
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols])) 
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Scott
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c])) 
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Shubham2
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit -n100 @Roy
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
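
The %%timeit cells above only run inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library timeit module. The function names below are hypothetical, the frame is rebuilt with the question's random recipe at the (50000, 4) shape, and absolute numbers will differ by machine and library version.

```python
import random
import timeit
from string import ascii_uppercase as uc

import numpy as np
import pandas as pd

random.seed(365)
n_rows = 50_000  # matches the df.shape shown in the comparison
df = pd.DataFrame({f'col{i}': [random.choice(uc) for _ in range(n_rows)]
                   for i in range(1, 5)})


def with_to_numpy():
    # Compare on the raw NumPy array (the @Shubham1 variant)
    rows, cols = np.argwhere(df.to_numpy() == 'H').T
    return list(zip(df.index[rows], df.columns[cols]))


def with_dataframe_eq():
    # Compare on the DataFrame itself (the @Scott variant)
    r, c = np.where(df == 'H')
    return list(zip(df.index[r], df.columns[c]))


n_loops = 10  # assumption: fewer loops than -n100 to keep the run short
for fn in (with_to_numpy, with_dataframe_eq):
    total = timeit.timeit(fn, number=n_loops)
    print(f'{fn.__name__}: {total / n_loops * 1000:.2f} ms per loop')
```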

Answer 2

One solution is to use melt:

df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
print(t[t.value == "H"])

The output is:

    inx variable value
4     4     col1     H
13    3     col3     H

You can now easily extract the column and index from the "variable" and "inx" columns of the filtered frame.
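
The extraction step might look like the following sketch, which uses the five-row table from the question. One swap from the answer's code: rename_axis is used instead of assigning df.index.name, so the original frame is left unmodified. The variable names matches and pairs are illustrative.

```python
import pandas as pd

# The five-row sample table from the question
df = pd.DataFrame({'col1': list('AIOAH'),
                   'col2': list('BJLBB'),
                   'col3': list('CSCHC'),
                   'col4': list('GDGDP')})

# Name the index "inx" on a copy, then reshape to long format:
# one row per (inx, variable, value) cell
t = df.rename_axis("inx").reset_index().melt(id_vars="inx")

# Keep only the cells holding 'H'
matches = t[t["value"] == "H"]

# Pair each index label with its column name
pairs = list(zip(matches["inx"], matches["variable"]))
print(pairs)
```

Because melt walks the frame column by column, the pairs come out in column order, not in the row order that the NumPy-based answers produce.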

Answer 3

Use np.where and indexing (updated to improve performance):

r, c = np.where(df.to_numpy() == 'H')
list(zip(df.index[r], df.columns[c]))

Output:

[(3, 'col3'), (4, 'col1')]
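
np.where returns positions, not labels, which is why the result is passed through df.index and df.columns. The difference only becomes visible with a non-default index, so here is a small sketch with made-up data (the frame and the labels 'x' and 'y' are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string index labels instead of 0, 1, ...
df = pd.DataFrame({'col1': ['A', 'H'],
                   'col2': ['H', 'B']},
                  index=['x', 'y'])

# r and c hold integer positions of the matching cells
r, c = np.where(df.to_numpy() == 'H')

# Indexing df.index / df.columns converts positions to labels
labels = list(zip(df.index[r], df.columns[c]))
print(labels)
```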


3 answers

Answer 1
3 2020-06-17 16:43:02

Answer 2
2 2020-06-17 16:39:25

Answer 3
2 2020-06-17 16:46:33
