
How to find the elements from one column which also appear in another column of a DataFrame in Python

I have transformed the following toy chemical reactions into a DataFrame for a later bipartite network representation:

R1: A + B -> C

R2: C + D -> E

Source  Target
R1      C
A       R1
B       R1
R2      E
C       R2
D       R2
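For reference, a minimal sketch of how this toy table could be built in pandas. The question calls the frame `data` and the answers call it `df`; binding both names to one object is an assumption made here for convenience.

```python
import pandas as pd

# Each row is one edge of the bipartite graph:
# substrate -> reaction, or reaction -> product.
data = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)

# Assumption: the question's `data` and the answers' `df` are the same frame.
df = data
print(data)
```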

Now I want to create a new DataFrame from this one that represents only the relationships between reactions, based on the compounds they share. For example, in the DataFrame above, C is a Target of R1 and also a Source for R2, so the relationship should be:

R1->R2

(the only reaction-reaction relationship I can obtain from the DataFrame above)

The code I have written for this task is the following:

import re

newData = []
for i in range(len(data)):
    for j in range(len(data)):
        # Target of row i equals Source of row j, and that shared node
        # is a compound (its name does not start with "R").
        if data.iloc[i, 1] == data.iloc[j, 0] and not re.match("R.", data.iloc[i, 1]):
            newData.append(data.iloc[i, 0] + "\t" + data.iloc[j, 1])

The code works, but for big tables (thousands of rows) it gets very slow. I'm still a beginner, so I would be really glad if you could help me improve it. Thanks =D

My preference would be for a dictionary-based approach:

import pandas as pd

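# Map each Source to its Target (assumes every Source value is unique)
d = df.set_index('Source')['Target']
# Reaction labels: any node whose name contains 'R'
r = {i for i in set(df['Source']).union(df['Target']) if 'R' in i}

# Follow two hops (reaction -> compound -> reaction) and keep the hits
{k: d.get(d.get(k)) for k in r if d.get(d.get(k))}

# {'R1': 'R2'}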
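The lookup above relies on each Source value appearing only once. As a rough sketch of the same two-hop idea for tables where a reaction has several products, or a compound feeds several reactions, a plain dict of lists could be used. The names `edges`, `successors` and `is_reaction` are hypothetical, and treating every label that starts with "R" as a reaction is an assumption carried over from the toy data.

```python
from collections import defaultdict

# Edge list of the toy example; from a DataFrame this could be
# list(zip(df["Source"], df["Target"])).
edges = [("R1", "C"), ("A", "R1"), ("B", "R1"),
         ("R2", "E"), ("C", "R2"), ("D", "R2")]


def is_reaction(label):
    # Assumption: reaction labels start with "R", compound labels do not.
    return label.startswith("R")


# Adjacency list: node -> every node it points to.
successors = defaultdict(list)
for source, target in edges:
    successors[source].append(target)

# Two hops: reaction -> product compound -> consuming reaction.
reaction_pairs = sorted(
    (reaction, consumer)
    for reaction, products in successors.items()
    if is_reaction(reaction)
    for product in products
    for consumer in successors.get(product, [])
    if is_reaction(consumer)
)
print(reaction_pairs)  # [('R1', 'R2')]
```

Building the adjacency list is a single pass over the rows, and each lookup is a dict access, so this avoids the row-by-row double loop from the question.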

You could merge the DataFrame with itself:

RtoC = df.merge(df,how='inner',left_on='Source',right_on='Target')\
                .drop(['Target_y','Source_x'],axis=1)\
                .rename(columns={'Target_x':'Target','Source_y':'Source'})

Then filter out the compounds:

RtoC[RtoC.Target.str.contains(r'\d') & RtoC.Source.str.contains(r'\d')]


  Target Source
4     R2     R1
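A self-contained sketch of the self-merge idea, joined in the other direction so that the result reads Source -> Target. The `_in`/`_out` suffixes are hypothetical names, and the filter assumes reaction labels start with "R" (the answer above tests for a digit instead).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)

# Pair every edge X -> Y with every edge Y -> Z by joining on the shared node Y.
paths = df.merge(df, left_on="Target", right_on="Source", suffixes=("_in", "_out"))

# Keep only two-step paths that start and end at a reaction.
# Assumption: reaction labels start with "R".
mask = paths["Source_in"].str.startswith("R") & paths["Target_out"].str.startswith("R")

result = (
    paths.loc[mask, ["Source_in", "Target_out"]]
    .rename(columns={"Source_in": "Source", "Target_out": "Target"})
    .drop_duplicates()
    .reset_index(drop=True)
)
print(result)
```

Unlike the dictionary lookup, the merge keeps every matching pair, so it still works when a Source or Target value is repeated; `drop_duplicates` collapses reaction pairs that are linked through more than one compound.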

Or convert to a dictionary, map the values, and filter:

mapper = dict(df.values[::-1])

df.Target = df.Target.map(mapper)

df[df.Target.str.contains(r'\d', na=False) & df.Source.str.contains(r'\d')]

  Source Target
0     R1     R2
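Note that the mapping step above overwrites `df.Target` in place. A sketch of a variant that leaves the original frame untouched might look like this; `next_hop` and `mapped` are hypothetical names, and it assumes unique Source values and reaction labels that start with "R".

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)

# Lookup from each node to the node it points to.
# Assumption: every Source value is unique, otherwise the mapping is ambiguous.
next_hop = df.set_index("Source")["Target"]
assert next_hop.index.is_unique

# Replace each Target by the node that Target points to, on a copy of df.
mapped = df.assign(Target=df["Target"].map(next_hop))

# Targets with no outgoing edge become NaN; na=False treats them as non-matches.
mask = mapped["Source"].str.startswith("R") & mapped["Target"].str.startswith("R", na=False)
result = mapped[mask]
print(result)
```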


