How to find the elements from one column which also appear in another column of a DataFrame in Python
I have transformed the following toy chemical reactions into a DataFrame, to be represented later as a bipartite network:
R1: A + B -> C
R2: C + D -> E
Source  Target
R1      C
A       R1
B       R1
R2      E
C       R2
D       R2
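For reference, here is a minimal sketch of how the table above could be built in pandas. The variable name `df` is an assumption borrowed from the answers; the question's own code calls the same table `data`.

```python
import pandas as pd

# Each row is one edge of the bipartite network: Source -> Target.
# Reactions (R1, R2) point to their products; reactants point to reactions.
df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)
print(df)
```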
Now I want to create a new DataFrame from this one that represents only the relationships between reactions, based on the compounds they share. For example, in the DataFrame above, C is a Target of R1 and C is also a Source for R2, so the relationship should be:
R1->R2
(the only reaction-reaction relationship that can be obtained from the DataFrame above)
The code I have written for this task is the following:
import re

newData = []
for i in range(len(data)):
    for j in range(len(data)):
        # Target of row i equals Source of row j, and that shared value is not a reaction label
        if data.iloc[i, 1] == data.iloc[j, 0] and not re.match("R.", data.iloc[i, 1]):
            newData.append(data.iloc[i, 0] + "\t" + data.iloc[j, 1])
The code works, but for big tables (thousands of rows) it gets very slow. I'm still a beginner, so I would be really glad if you could help me improve it. Thanks =D
My preference would be for a dictionary-based approach:
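import pandas as pd

# Map each Source to its Target (this works when each Source appears once, as in the toy data)
d = df.set_index('Source')['Target']
# Collect every label that denotes a reaction
r = {i for i in set(df['Source']).union(df['Target']) if 'R' in i}
# Follow reaction -> compound -> reaction, keeping only the chains that exist
{k: d.get(d.get(k)) for k in r if d.get(d.get(k))}
# {'R1': 'R2'}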
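If a reaction can have several products, or a compound can feed several reactions, a single Source-to-Target lookup is no longer enough. The following is a sketch of one way to generalize the dictionary idea, not the answerer's code. It assumes that reaction labels start with "R" and compound labels do not; the helper `is_reaction` and the names `products`, `consumers` and `reaction_links` are hypothetical.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)


def is_reaction(name):
    # Assumption: reaction labels start with "R", compound labels do not.
    return name.startswith("R")


products = defaultdict(set)   # reaction -> compounds it produces
consumers = defaultdict(set)  # compound -> reactions that consume it
for source, target in zip(df["Source"], df["Target"]):
    if is_reaction(source):
        products[source].add(target)
    elif is_reaction(target):
        consumers[source].add(target)

# Link every producing reaction to every reaction consuming one of its products.
# The set removes duplicate pairs that share more than one compound.
reaction_links = sorted(
    {
        (producer, consumer)
        for producer, compounds in products.items()
        for compound in compounds
        for consumer in consumers.get(compound, ())
    }
)
print(reaction_links)  # [('R1', 'R2')]
```

This makes a single pass over the rows followed by dictionary lookups, so it avoids the nested loop over all row pairs used in the question.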
You could merge the DataFrame with itself, matching each row's Source against the other copy's Target:
RtoC = df.merge(df, how='inner', left_on='Source', right_on='Target')\
         .drop(['Target_y', 'Source_x'], axis=1)\
         .rename(columns={'Target_x': 'Target', 'Source_y': 'Source'})
Then filter out the compounds, keeping only rows where both ends contain a digit (that is, both are reactions):
RtoC[(RtoC.Target.str.contains(r'\d')) & (RtoC.Source.str.contains(r'\d'))]
Target Source
4 R2 R1
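A variation on the merge idea is to split the edges by direction first, so that the join happens only on the shared compound and no filtering is needed afterwards. This is a sketch under the assumption that reaction labels look like "R" followed by digits; the column names `Producer`, `Compound` and `Consumer` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Source": ["R1", "A", "B", "R2", "C", "D"],
        "Target": ["C", "R1", "R1", "E", "R2", "R2"],
    }
)

# Assumption: a reaction label is "R" followed by one or more digits.
source_is_reaction = df["Source"].str.match(r"R\d+")

# Edges reaction -> compound (what each reaction produces).
produced = df[source_is_reaction].rename(
    columns={"Source": "Producer", "Target": "Compound"}
)
# Edges compound -> reaction (what each reaction consumes).
consumed = df[~source_is_reaction].rename(
    columns={"Source": "Compound", "Target": "Consumer"}
)

# Join on the shared compound, then keep only the reaction pair.
links = (
    produced.merge(consumed, on="Compound", how="inner")[["Producer", "Consumer"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(links)
```

On the toy data this yields a single row, with R1 as the producer and R2 as the consumer.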
Or convert to a dictionary, map the values and filter:
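# Build a Source -> Target lookup; reversing the rows means the first occurrence wins if a Source repeats
mapper = dict(df.values[::-1])
# Replace each Target with the Target of the row whose Source it is (NaN if there is none)
df.Target = df.Target.map(mapper)
# Keep rows where both ends are reactions; na=False treats the NaN targets as non-matches
df[(df.Target.str.contains(r'\d', na=False)) & (df.Source.str.contains(r'\d'))]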
Source Target
0 R1 R2