Filtering a dataframe from information in another dataframe using Pandas
I have the dataframe below:
import pandas as pd

df = pd.DataFrame(columns=['Chromosome', 'Start', 'End'],
                  data=[
                      ['chr1', 2000, 3000],
                      ['chr1', 500, 1500],
                      ['chr3', 3000, 4000],
                      ['chr5', 4000, 5000],
                      ['chr17', 9000, 10000],
                      ['chr19', 1500, 2500]
                  ])
I also have a probes dataframe:
probes = pd.DataFrame(columns=['Probe', 'Chrom', 'Position'],
                      data=[
                          ['CG999', 'chr1', 2500],
                          ['CG000', 'chr19', 2000],
                      ])
I want to filter df for the rows whose Chromosome matches a probe's chromosome and whose Start and End bracket that probe's position, then add the probe's name to a new column in df. The desired output is below:
Probe Chrom Start End
0 CG999 chr1 2000 3000
5 CG000 chr19 1500 2500
My attempt below works, but it doesn't put the probe name into a Probe column and it relies on looping over the probes data. There must be a more efficient way of doing this.
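all_indexes = []
# fake2.tsv is the aforementioned probes dataframe, one tab-separated probe per line
with open('fake2.tsv') as f:
    for x in f:
        probe, chrom, pos = x.rstrip("\n").split("\t")
        # rows on the same chromosome whose interval strictly contains pos
        row = df[(df['Chromosome'] == chrom) & (int(pos) > df['Start']) & (int(pos) < df['End'])]
        all_indexes.append(row.index.tolist())
# flatten the per-probe lists of matching row labels
all_indexes = [y for x in all_indexes for y in x]
# these are index labels, so select with .loc rather than .iloc
df.loc[all_indexes]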
You can try this:
df.merge(probes, left_on='Chromosome', right_on='Chrom').query('Start < Position < End')
Output:
Chromosome Start End Probe Chrom Position
0 chr1 2000 3000 CG999 chr1 2500
2 chr19 1500 2500 CG000 chr19 2000
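The merge above returns a fresh 0..n index (so the chr19 row is labelled 2 rather than 5) and keeps all six columns. A minimal sketch that extends the same merge-plus-query idea to match the desired output more closely, assuming you want df's original row labels and only the Probe, Chrom, Start and End columns:

```python
import pandas as pd

df = pd.DataFrame(columns=['Chromosome', 'Start', 'End'],
                  data=[['chr1', 2000, 3000], ['chr1', 500, 1500],
                        ['chr3', 3000, 4000], ['chr5', 4000, 5000],
                        ['chr17', 9000, 10000], ['chr19', 1500, 2500]])
probes = pd.DataFrame(columns=['Probe', 'Chrom', 'Position'],
                      data=[['CG999', 'chr1', 2500], ['CG000', 'chr19', 2000]])

result = (
    df.reset_index()  # keep df's original row labels in an 'index' column
      .merge(probes, left_on='Chromosome', right_on='Chrom')
      .query('Start < Position < End')  # strict containment, as in the question
      .set_index('index')               # restore the original labels
      .rename_axis(None)                # drop the leftover index name
      [['Probe', 'Chrom', 'Start', 'End']]
)
print(result)
```

With the sample data this keeps the rows labelled 0 and 5, carrying probes CG999 and CG000. Note that merging on the chromosome alone pairs every interval with every probe on that chromosome before filtering, so memory use grows with that product on large inputs.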
I just ran into the same problem, and apparently there is no built-in solution in pandas. However, you may use one of the solutions in the following threads:
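The linked threads are not reproduced here. As one alternative that avoids the per-chromosome pairing of a plain merge, here is a sketch using `pd.merge_asof`. This is a swapped-in technique, not necessarily what those threads suggest, and it assumes the intervals on each chromosome do not overlap: every probe is matched only to the interval with the largest Start below its Position, so a probe lying in several overlapping intervals would get at most one of them.

```python
import pandas as pd

df = pd.DataFrame(columns=['Chromosome', 'Start', 'End'],
                  data=[['chr1', 2000, 3000], ['chr1', 500, 1500],
                        ['chr3', 3000, 4000], ['chr5', 4000, 5000],
                        ['chr17', 9000, 10000], ['chr19', 1500, 2500]])
probes = pd.DataFrame(columns=['Probe', 'Chrom', 'Position'],
                      data=[['CG999', 'chr1', 2500], ['CG000', 'chr19', 2000]])

# merge_asof requires both frames to be sorted on their "on" keys.
left = probes.sort_values('Position')
right = df.reset_index().sort_values('Start')  # keep df's row labels in 'index'

# For each probe, take the same-chromosome interval with the largest Start
# strictly below Position (backward search, exact matches excluded).
nearest = pd.merge_asof(
    left, right,
    left_on='Position', right_on='Start',
    left_by='Chrom', right_by='Chromosome',
    direction='backward', allow_exact_matches=False,
)

# The probe must also lie before End; probes with no candidate have NaN End
# and are dropped by this comparison.
hits = nearest[nearest['Position'] < nearest['End']]

result = (
    hits.astype({'index': 'int64'})  # 'index' may be float if NaNs appeared
        .set_index('index')
        .rename_axis(None)
        .sort_index()
        [['Probe', 'Chrom', 'Start', 'End']]
)
print(result)
```

If intervals can overlap, the merge-and-query approach (or a dedicated interval library) is the safer choice.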