Finding Column and Index in pandas dataframe
I have a pandas DataFrame:
   col1 | col2 | col3 | col4
0  A    | B    | C    | G
1  I    | J    | S    | D
2  O    | L    | C    | G
3  A    | B    | H    | D
4  H    | B    | C    | P
# reproducible sample data
import pandas as pd
from string import ascii_uppercase as uc  # just for sample data
import random  # just for sample data

random.seed(365)
df = pd.DataFrame({'col1': [random.choice(uc) for _ in range(20)],
                   'col2': [random.choice(uc) for _ in range(20)],
                   'col3': [random.choice(uc) for _ in range(20)],
                   'col4': [random.choice(uc) for _ in range(20)]})
I am looking for a function like this:
func('H')
which would return the index label and column name of every cell that contains 'H'. Any ideas?
import numpy as np

rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))
Alternatively,
indices = df.where(df.eq('H')).stack().index.tolist()
# print(indices)
[(3, 'col3'), (4, 'col1')]
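The np.argwhere snippet above can be wrapped into the kind of function the question asks for. This is only a minimal sketch, assuming the five-row frame shown in the question; the name find_value is hypothetical and is not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

# The five-row frame shown in the question.
df = pd.DataFrame({
    'col1': list('AIOAH'),
    'col2': list('BJLBB'),
    'col3': list('CSCHC'),
    'col4': list('GDGDP'),
})


def find_value(frame, value):
    """Return (index label, column name) pairs for every cell equal to value.

    find_value is a hypothetical helper name used only for illustration.
    """
    # Boolean mask -> array of (row, col) positions, transposed so the row
    # positions and the column positions can be unpacked separately.
    rows, cols = np.argwhere(frame.to_numpy() == value).T
    # Translate integer positions back into index labels and column names.
    return list(zip(frame.index[rows], frame.columns[cols]))


indices = find_value(df, 'H')
print(indices)
```

Because the positions are mapped back through frame.index and frame.columns, the same helper should also work when the frame has a non-default index. A value that never occurs gives an empty list.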
timeit comparison of all the answers:
df.shape
(50000, 4)
@Shubham1:
%%timeit -n100
rows, cols = np.argwhere(df.to_numpy() == 'H').T
indices = list(zip(df.index[rows], df.columns[cols]))
8.87 ms ± 218 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
@Scott:
%%timeit -n100
r,c = np.where(df == 'H')
_ = list(zip(df.index[r], df.columns[c]))
17.4 ms ± 510 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
@Shubham2:
%%timeit -n100
indices = df.where(df.eq('H')).stack().index.tolist()
26.8 ms ± 165 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
@Roy:
%%timeit -n100
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
_ = t[t.value == "H"]
29 ms ± 656 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
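%%timeit is an IPython cell magic, so the cells above only run in IPython or Jupyter. Below is a rough sketch of a similar comparison using the standard-library timeit module. It assumes randomly generated data of the same shape, since the original 50,000-row frame is not shown, and the function names with_argwhere and with_where are hypothetical. Timings will differ from the figures above depending on machine and library versions.

```python
import timeit
from string import ascii_uppercase

import numpy as np
import pandas as pd

# Assumed sample data: 50,000 rows x 4 columns of random uppercase letters.
rng = np.random.default_rng(365)
letters = rng.choice(list(ascii_uppercase), size=(50_000, 4))
df = pd.DataFrame(letters, columns=['col1', 'col2', 'col3', 'col4'])


def with_argwhere(frame):
    # Compare on the underlying NumPy array, then map positions to labels.
    rows, cols = np.argwhere(frame.to_numpy() == 'H').T
    return list(zip(frame.index[rows], frame.columns[cols]))


def with_where(frame):
    # Compare on the DataFrame itself, then map positions to labels.
    r, c = np.where(frame == 'H')
    return list(zip(frame.index[r], frame.columns[c]))


# Both approaches should find exactly the same cells.
assert with_argwhere(df) == with_where(df)

for fn in (with_argwhere, with_where):
    # Best of 3 repeats, 10 calls each, reported as milliseconds per call.
    best = min(timeit.repeat(lambda: fn(df), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```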
One solution is to use melt:
df.index.name = "inx"
t = df.reset_index().melt(id_vars = "inx")
print(t[t.value == "H"])
The output is:
    inx variable value
4     4     col1     H
13    3     col3     H
You can now easily extract the column and the index.
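One way to do that extraction is sketched below, assuming the five-row frame from the question. It uses rename_axis instead of assigning df.index.name so that the original frame is left unchanged; that substitution is a choice made here, not part of the answer above.

```python
import pandas as pd

# The five-row frame shown in the question.
df = pd.DataFrame({
    'col1': list('AIOAH'),
    'col2': list('BJLBB'),
    'col3': list('CSCHC'),
    'col4': list('GDGDP'),
})

# rename_axis returns a copy with a named index, so df itself is not modified.
t = df.rename_axis("inx").reset_index().melt(id_vars="inx")
hits = t[t["value"] == "H"]

# Pair each matching index label with the column it was found in.
pairs = list(zip(hits["inx"], hits["variable"]))
print(pairs)
```

melt stacks the frame one column after another, so the pairs come out in column order (col1 before col3) rather than in row order.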
Use np.where and indexing (updated to improve performance):
r, c = np.where(df.to_numpy() == 'H')
list(zip(df.index[r], df.columns[c]))
Output:
[(3, 'col3'), (4, 'col1')]