Filter a dataframe using information from another dataframe with Pandas

Question

I have the dataframe below.

import pandas as pd

df = pd.DataFrame(columns=['Chromosome', 'Start','End'],
     data=[
           ['chr1', 2000, 3000],
           ['chr1', 500, 1500],
           ['chr3', 3000, 4000],
           ['chr5', 4000, 5000],
           ['chr17', 9000, 10000],
           ['chr19', 1500, 2500]
           ])

I also have a probes dataframe, shown below.

probes = pd.DataFrame(columns=['Probe', 'Chrom','Position'],
     data=[
           ['CG999', 'chr1', 2500],
           ['CG000', 'chr19', 2000],
           ])

I want to filter df for rows that are on a probe's chromosome and have the probe's position between their Start and End values, then add the probe's name to a new column in df. The desired output is below:

    Probe    Chrom    Start    End
0   CG999    chr1     2000     3000
5   CG000    chr19    1500     2500

My attempt below works, but it doesn't place the probe name into a Probe column, and it relies on looping over the probes data. There must be a more efficient way of doing this.

all_indexes = []

# fake2.tsv is the aforementioned probes dataframe
with open('fake2.tsv') as f:
    for x in f:
        probe, chrom, pos = x.rstrip("\n").split("\t")
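        pos = int(pos)
        # rows on the probe's chromosome whose interval contains the position
        row = df[(df['Chromosome'] == chrom) & (pos > df['Start']) & (pos < df['End'])]
        all_indexes.append(row.index.tolist())

# flatten the list of per-probe index lists
all_indexes = [y for x in all_indexes for y in x]
df.loc[all_indexes]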

Answer 1

You can try this:

df.merge(probes, left_on='Chromosome', right_on='Chrom').query('Start < Position < End')

Output:

  Chromosome  Start   End  Probe  Chrom  Position
0       chr1   2000  3000  CG999   chr1      2500
2      chr19   1500  2500  CG000  chr19      2000
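The merge above resets the row labels (0 and 2 rather than 0 and 5) and keeps all six columns. The sketch below is one possible way to get closer to the layout the question asks for. It carries the original index through the merge with `reset_index` and then selects the four requested columns. The name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['Chromosome', 'Start', 'End'],
    data=[
        ['chr1', 2000, 3000],
        ['chr1', 500, 1500],
        ['chr3', 3000, 4000],
        ['chr5', 4000, 5000],
        ['chr17', 9000, 10000],
        ['chr19', 1500, 2500],
    ],
)
probes = pd.DataFrame(
    columns=['Probe', 'Chrom', 'Position'],
    data=[
        ['CG999', 'chr1', 2500],
        ['CG000', 'chr19', 2000],
    ],
)

result = (
    df.reset_index()                      # keep df's row labels in an 'index' column
      .merge(probes, left_on='Chromosome', right_on='Chrom')
      .query('Start < Position < End')    # strict bounds, as in the answer above
      .set_index('index')                 # restore the original row labels
      .rename_axis(None)                  # drop the leftover index name
      [['Probe', 'Chrom', 'Start', 'End']]
)
print(result)
```

Note that the merge pairs every row of df with every probe on the same chromosome before filtering, so the intermediate frame can get large when both inputs have many rows per chromosome.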

Answer 2

I just ran into the same problem, and apparently there is no built-in solution for it in pandas. However, you may be able to use one of the solutions in the following threads:

Best way to join/merge by range in pandas
how to perform an inner or outer join of DataFrames with Pandas on non-simplistic criterion
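For illustration, here is a minimal sketch of one way to do a range join by hand, using NumPy broadcasting to compare every probe against every interval at once. It is an assumption about what such a solution could look like, not necessarily what the linked threads recommend. The names `chrom_match`, `in_range` and `matched` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=['Chromosome', 'Start', 'End'],
    data=[
        ['chr1', 2000, 3000],
        ['chr1', 500, 1500],
        ['chr3', 3000, 4000],
        ['chr5', 4000, 5000],
        ['chr17', 9000, 10000],
        ['chr19', 1500, 2500],
    ],
)
probes = pd.DataFrame(
    columns=['Probe', 'Chrom', 'Position'],
    data=[
        ['CG999', 'chr1', 2500],
        ['CG000', 'chr19', 2000],
    ],
)

# Build a (n_probes, n_rows) boolean grid: probes along axis 0, df rows along axis 1.
chrom_match = probes['Chrom'].to_numpy()[:, None] == df['Chromosome'].to_numpy()[None, :]
pos = probes['Position'].to_numpy()[:, None]
in_range = (df['Start'].to_numpy()[None, :] < pos) & (pos < df['End'].to_numpy()[None, :])

# Positions of the True cells: which probe matched which df row.
probe_idx, df_idx = np.nonzero(chrom_match & in_range)

# Take the matching df rows (original labels are kept) and attach the probe names.
matched = df.iloc[df_idx].copy()
matched.insert(0, 'Probe', probes['Probe'].to_numpy()[probe_idx])
print(matched)
```

The boolean grid has one cell per probe/row pair, so memory grows with len(probes) * len(df). For large inputs it may be worth processing one chromosome at a time.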

Answer 1
Score 3, accepted, 2017-06-15 15:40:20

Answer 2
Score 1, 2017-06-15 15:33:09