

How to select dataframe rows based on index value and column value criteria?

I have a dataframe that is a stacked version of n=9 dataframes:

df
        f     a   config
0   0.491 0.368  old.000
1   0.369 0.333  old.000
2   0.372 0.276  old.000
3   0.346 0.300  old.000
4   0.213 0.161  old.000
..    ...   ...      ...
212 1.000 1.000  fin.111
213 1.000 1.000  fin.111
214 1.000 1.000  fin.111
215 1.000 1.000  fin.111
216 1.000 1.000  fin.111

[1953 rows x 3 columns]

Each "stacked" sub-dataframe corresponds to a different value of config, i.e.:

df['config'].unique()
array(['old.000', 'fin.000', 'fin.001', 'fin.010', 'fin.011', 'fin.100',
       'fin.101', 'fin.110', 'fin.111'], dtype=object)

I want to "filter" this dataframe by a criterion (a pairing of index and config value) given by a pd.Series:

ser_criteria
0      old.001
1      fin.101
2      fin.100
3      fin.101
4      fin.101
        ...   
212    fin.000
213    old.000
214    old.000
215    old.000
216    old.000
Length: 217, dtype: object

So, I would need my output to be:

df_filtered
        f     a   config
0   0.481 0.368  old.001
1   0.569 0.333  fin.101
2   0.672 0.276  fin.100
3   0.378 0.111  fin.101
4   0.987 0.213  fin.101
..    ...   ...   ...   
212 0.500 0.111  fin.000
213 1.000 1.000  old.000
214 0.765 0.123  old.000
215 0.000 1.000  old.000
216 0.333 0.123  old.000

[217 rows x 3 columns]

What is the most efficient way to do this? The only way I could find was to do it element by element (looping over the index)...
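For reference, the element-by-element approach described above might look like the following sketch. The toy data is hypothetical (three stacked sub-frames sharing index labels 0..2) and only stands in for the real df and ser_criteria; the loop is an assumption about what "element by element" means here.

```python
import pandas as pd

# Hypothetical toy data: three "stacked" sub-frames sharing the index 0..2
df = pd.DataFrame(
    {
        "f": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        "a": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9],
        "config": ["old.000"] * 3 + ["fin.000"] * 3 + ["fin.001"] * 3,
    },
    index=[0, 1, 2] * 3,
)
# Hypothetical criteria: one wanted config value per index label
ser_criteria = pd.Series(["fin.000", "old.000", "fin.001"], index=[0, 1, 2])

rows = []
for idx, cfg in ser_criteria.items():
    # df.loc[[idx]] returns every row carrying this index label as a DataFrame
    candidates = df.loc[[idx]]
    # keep only the candidate whose config matches the criterion for this label
    rows.append(candidates[candidates["config"] == cfg])

df_filtered = pd.concat(rows)
print(df_filtered)
```

This does one lookup per index label, so it scales poorly with the length of ser_criteria.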

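Convert the index to an index column in both objects and merge on both index and config. Use an inner join (a right join would keep every row of df, so nothing would be filtered), then restore the index:

# reset_index() names the new column 'index' because the index is unnamed here
df = (ser_criteria.rename('config')
                  .reset_index()
                  .merge(df.reset_index(), on=['index','config'], how='inner')
                  .set_index('index')
                  .rename_axis(None))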
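To check the join on a small case, here is a sketch on hypothetical toy data (not the real df). It uses an inner join, because a right join would retain every row of df, and it also shows a boolean-mask alternative that skips the merge. The names merged, expected, mask and masked are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: three "stacked" sub-frames sharing the index 0..2
df = pd.DataFrame(
    {
        "f": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        "a": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9],
        "config": ["old.000"] * 3 + ["fin.000"] * 3 + ["fin.001"] * 3,
    },
    index=[0, 1, 2] * 3,
)
ser_criteria = pd.Series(["fin.000", "old.000", "fin.001"], index=[0, 1, 2])

# Option 1: merge on (index, config); rename_axis makes the key name explicit
merged = (
    ser_criteria.rename("config")
    .rename_axis("index")
    .reset_index()
    .merge(df.rename_axis("index").reset_index(), on=["index", "config"], how="inner")
    .set_index("index")
)

# Option 2: look up the wanted config for every row label, then compare row by row
expected = df.index.map(ser_criteria)  # assumes ser_criteria has a unique index
mask = df["config"].to_numpy() == expected.to_numpy()
masked = df[mask]

print(merged)
print(masked.sort_index())
```

The mask keeps the rows in df's original order and column layout, while the merge returns them in the order of ser_criteria with config as the first column.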

If the index values in your filter criteria are not important, you can do the following:

import pandas as pd

# Assign your dataframe to df (example)
df = pd.read_csv("Documents/dataframe.tsv", sep="\t")
df

         f      a   config
0    0.491  0.368  old.000
1    0.369  0.333  old.000
2    0.372  0.276  old.000
3    0.346  0.300  old.000
4    0.213  0.161  old.000
212  1.000  1.000  fin.111
213  1.000  1.000  fin.111
214  1.000  1.000  fin.111
215  1.000  1.000  fin.111
216  1.000  1.000  fin.111


# Assign your filter series to config_filter (example).
# The name "filter" is avoided because it shadows the Python built-in.
config_filter = pd.Series(['old.001', 'fin.101', 'fin.100', 'fin.101', 'fin.101', 'fin.000', 'old.000', 'old.000', 'old.000', 'old.000'])
config_filter

0    old.001
1    fin.101
2    fin.100
3    fin.101
4    fin.101
5    fin.000
6    old.000
7    old.000
8    old.000
9    old.000
dtype: object

# Then keep the rows of df whose config value appears anywhere in config_filter
df[df['config'].isin(config_filter)]
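Note that isin only tests whether a row's config appears somewhere in the filter values; it does not pair each value with its index label. The sketch below, on hypothetical toy data, contrasts that with a match on (index, config) pairs. The MultiIndex lookup is one possible way to do the pairwise match, not part of the answer above.

```python
import pandas as pd

# Hypothetical toy data: three "stacked" sub-frames sharing the index 0..2
df = pd.DataFrame(
    {
        "f": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        "a": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9],
        "config": ["old.000"] * 3 + ["fin.000"] * 3 + ["fin.001"] * 3,
    },
    index=[0, 1, 2] * 3,
)
ser_criteria = pd.Series(["fin.000", "old.000", "fin.001"], index=[0, 1, 2])

# Membership test: every config value occurs in ser_criteria, so all rows survive
isin_result = df[df["config"].isin(ser_criteria)]

# Pairwise test: move config into the index and match (index label, config) tuples
keys = list(zip(ser_criteria.index, ser_criteria))
indexed = df.set_index("config", append=True)
pair_result = indexed[indexed.index.isin(keys)].reset_index(level="config")

print(len(isin_result), len(pair_result))
```

On this toy data the membership test keeps all nine rows, while the pairwise test keeps one row per index label.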
