Hello, I have a dataframe such as:
Groups Names COLs COLe
G1 ABC_DEF.1:2-300():Canis_lupus 2 300
G1 SDDD1 NA NA
G1 SKUD.2. NA NA
G1 SEQUENCE3 NA NA
G1 ABC_DEF.1:400-600():Canis_lupus 400 600
G1 IJK_LMN.1:20-200():Bos_taurus 20 200
G2 OP_D:500-1000():Felis_catus 500 1000
G2 JDJDJ99 NA NA
and I would like to add a new column Names2 that pairs, within each group, every Names value containing () with every Names value in that group that does not contain ().
The output would be:
Groups Names Names2 COLs COLe
G1 ABC_DEF.1:2-300():Canis_lupus SDDD1 2 300
G1 ABC_DEF.1:2-300():Canis_lupus SKUD.2. 2 300
G1 ABC_DEF.1:2-300():Canis_lupus SEQUENCE3 2 300
G1 ABC_DEF.1:400-600():Canis_lupus SDDD1 400 600
G1 ABC_DEF.1:400-600():Canis_lupus SKUD.2. 400 600
G1 ABC_DEF.1:400-600():Canis_lupus SEQUENCE3 400 600
G1 IJK_LMN.1:20-200():Bos_taurus SDDD1 20 200
G1 IJK_LMN.1:20-200():Bos_taurus SKUD.2. 20 200
G1 IJK_LMN.1:20-200():Bos_taurus SEQUENCE3 20 200
G2 OP_D:500-1000():Felis_catus JDJDJ99 500 1000
Does someone have an idea using pandas?
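import pandas as pd
# Rows whose Names contain a literal "()"; regex=False stops "()" being parsed as a regex group
df1 = df[df.Names.str.contains('()', regex=False)]
# Remaining rows, keeping only the columns needed for the join
df2 = df[~df.Names.str.contains('()', regex=False)][['Groups', 'Names']]
print(pd.merge(left=df1, right=df2, on='Groups').rename(columns={"Names_x": "Names", "Names_y": "Names2"}))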
Prints:
Groups Names COLs COLe Names2
0 G1 ABC_DEF.1:2-300():Canis_lupus 2.0 300.0 SDDD1
1 G1 ABC_DEF.1:2-300():Canis_lupus 2.0 300.0 SKUD.2.
2 G1 ABC_DEF.1:2-300():Canis_lupus 2.0 300.0 SEQUENCE3
3 G1 ABC_DEF.1:400-600():Canis_lupus 400.0 600.0 SDDD1
4 G1 ABC_DEF.1:400-600():Canis_lupus 400.0 600.0 SKUD.2.
5 G1 ABC_DEF.1:400-600():Canis_lupus 400.0 600.0 SEQUENCE3
6 G1 IJK_LMN.1:20-200():Bos_taurus 20.0 200.0 SDDD1
7 G1 IJK_LMN.1:20-200():Bos_taurus 20.0 200.0 SKUD.2.
8 G1 IJK_LMN.1:20-200():Bos_taurus 20.0 200.0 SEQUENCE3
9 G2 OP_D:500-1000():Felis_catus 500.0 1000.0 JDJDJ99
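For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-and-merge idea. The sample frame is rebuilt by hand from the question, and the variable names (has_parens, with_parens, without_parens, result) are just illustrative. It also assumes you want the column order from the question and integer COLs/COLe rather than floats, which is optional.

```python
import pandas as pd

# Sample data reconstructed from the question (NA becomes a missing value).
df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 2,
    "Names": [
        "ABC_DEF.1:2-300():Canis_lupus", "SDDD1", "SKUD.2.", "SEQUENCE3",
        "ABC_DEF.1:400-600():Canis_lupus", "IJK_LMN.1:20-200():Bos_taurus",
        "OP_D:500-1000():Felis_catus", "JDJDJ99",
    ],
    "COLs": [2, None, None, None, 400, 20, 500, None],
    "COLe": [300, None, None, None, 600, 200, 1000, None],
})

# Literal match on "()"; regex=False stops the parentheses being read as a group.
has_parens = df["Names"].str.contains("()", regex=False)

with_parens = df[has_parens]
without_parens = df.loc[~has_parens, ["Groups", "Names"]]

result = (
    with_parens
    # suffixes=("", "2") keeps the left column as Names and names the right one Names2
    .merge(without_parens, on="Groups", suffixes=("", "2"))
    # Optional: nullable integers instead of the floats caused by the NA rows
    .astype({"COLs": "Int64", "COLe": "Int64"})
    [["Groups", "Names", "Names2", "COLs", "COLe"]]
)
print(result)
```

The merge on Groups is an inner join, so a group that has no Names without () (or none with ()) would drop out of the result entirely.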