
Add a column with connected groups based on two columns within another dataframe in pandas

I have two dataframes such as:

tab1

Group1 Group2 
G1     G2
G4     G3
G5     G3 

tab2

Names           Groups
Canis_lupus     G1     
Cattus_cattus   G1
Mus_musculus    G1
Danio_rerio     G2
Betta_splendens G2
Griseus_gris    G3
Buffallo_kol    G3 
Homo_sapiens    G4
Macaque_ser     G4
Wistiti_del     G5 
Apis_mellifera  G6 

I would like to add a new Connected_groups column to tab2 that lists, for each row, all the groups connected to that row's group according to tab1.

I should then get:

Names           Groups   Connected_groups 
Canis_lupus     G1       G1-G2
Cattus_cattus   G1       G1-G2
Mus_musculus    G1       G1-G2
Danio_rerio     G2       G1-G2
Betta_splendens G2       G1-G2
Griseus_gris    G3       G3-G4-G5
Buffallo_kol    G3       G3-G4-G5
Homo_sapiens    G4       G3-G4-G5
Macaque_ser     G4       G3-G4-G5
Wistiti_del     G5       G3-G4-G5
Apis_mellifera  G6       G6 

Here are the dataframes in dict format, if that helps:

import pandas as pd

tab1 = pd.DataFrame.from_dict({'Group1': {0: 'G1', 1: 'G4', 2: 'G5'}, 'Group2': {0: 'G2', 1: 'G3', 2: 'G3'}})

tab2=pd.DataFrame.from_dict({'Names': {0: 'Canis_lupus', 1: 'Cattus_cattus', 2: 'Mus_musculus', 3: 'Danio_rerio', 4: 'Betta_splendens', 5: 'Griseus_gris', 6: 'Buffallo_kol', 7: 'Homo_sapiens', 8: 'Macaque_ser', 9: 'Wistiti_del', 10: 'Apis_mellifera'}, 'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G2', 4: 'G2', 5: 'G3', 6: 'G3', 7: 'G4', 8: 'G4', 9: 'G5', 10: 'G6'}})

Let us try networkx to find the connected groups in tab1, then create a mapping dictionary of connected groups and use it with replace to substitute the values in tab2:

import networkx as nx

G = nx.from_pandas_edgelist(tab1, 'Group1', 'Group2')
d = {k: '-'.join(c) for c in nx.connected_components(G) for k in c}

tab2['conn-grps'] = tab2['Groups'].replace(d)

              Names Groups conn-grps
0       Canis_lupus     G1     G2-G1
1     Cattus_cattus     G1     G2-G1
2      Mus_musculus     G1     G2-G1
3       Danio_rerio     G2     G2-G1
4   Betta_splendens     G2     G2-G1
5      Griseus_gris     G3  G3-G5-G4
6      Buffallo_kol     G3  G3-G5-G4
7      Homo_sapiens     G4  G3-G5-G4
8       Macaque_ser     G4  G3-G5-G4
9       Wistiti_del     G5  G3-G5-G4
10   Apis_mellifera     G6        G6
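
Note that connected_components yields sets, so the order of the labels inside each joined string (G2-G1 above) is not guaranteed to match the order asked for in the question. If stable labels matter, or if networkx is not available, the same mapping can be built by hand. The following is only a minimal sketch using a small union-find, not the answer's method; the helper name connected_group_labels and the plain-list inputs are assumptions made for illustration.

```python
def connected_group_labels(edges, sep="-"):
    """Map every group that appears in `edges` to a label listing its whole component.

    Hypothetical helper for illustration; `edges` is an iterable of (group_a, group_b) pairs.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Merge the two components of every edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each component under its root.
    components = {}
    for node in list(parent):
        components.setdefault(find(node), []).append(node)

    # Sort the members so the label does not depend on iteration order.
    return {
        node: sep.join(sorted(members))
        for members in components.values()
        for node in members
    }


# Same pairs as the rows of tab1 (Group1, Group2).
edges = [("G1", "G2"), ("G4", "G3"), ("G5", "G3")]
mapping = connected_group_labels(edges)

# Groups that never appear in tab1 (G6 here) keep their own name.
groups = ["G1", "G2", "G3", "G4", "G5", "G6"]
labels = [mapping.get(g, g) for g in groups]
print(labels)
```

With the dataframes from the question, the pairs could be taken from zip(tab1['Group1'], tab1['Group2']) and the result applied with tab2['Groups'].map(lambda g: mapping.get(g, g)). The sort is plain string order, so a label such as G10 would come before G2; a custom sort key would be needed if numeric order matters.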
