
Splitting a row into new rows and adding the corresponding value from another column

I have a dataframe like the following:

import pandas as pd

data = pd.DataFrame({'Name': ['CTA15;CTA16;CAR;', 'AC007;AC008;GOO7;G008;F009', 'AC09;BC09;C09;V09;B0P', 'UF09;GF09;HF09;MN08'],
                     'Sample': ['JAK_1', 'TOR2', 'Gilo', 'ALR']})
data

                         Name Sample
0            CTA15;CTA16;CAR;   JAK_1
1  AC007;AC008;GOO7;G008;F009    TOR2
2       AC09;BC09;C09;V09;B0P    Gilo
3         UF09;GF09;HF09;MN08     ALR

I need to split the Name column on ';' so that each value gets its own row, with the corresponding Sample value repeated on every new row. In the end, I am aiming for a data frame like this:

     Name Sample
0   CTA15  JAK_1
1   CTA16  JAK_1
2     CAR  JAK_1
3   AC007   TOR2
4   AC008   TOR2
5    GOO7   TOR2
6    G008   TOR2
7    F009   TOR2
8    AC09   Gilo
9    BC09   Gilo
10    C09   Gilo
11    V09   Gilo
12    B0P   Gilo
13   UF09    ALR
14   GF09    ALR
15   HF09    ALR
16   MN08    ALR

That is, each ';'-separated value should go on its own row, paired with the value from the Sample column of the row it came from.

You can use str.strip to remove the ';' at the start or end of some strings, str.split to turn each string into a list, and then str.len to get the length of each list.
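To see what each of these steps produces, here is a minimal sketch on a two-row Series. The Series `s` and the variable names are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical two-row example, not the question's data
s = pd.Series(['a;b;', 'c;d;e'])

stripped = s.str.strip(';')      # removes ';' only at the start or end
parts = stripped.str.split(';')  # each element becomes a list of tokens
lengths = parts.str.len()        # number of tokens in each row

print(parts.tolist())    # [['a', 'b'], ['c', 'd', 'e']]
print(lengths.tolist())  # [2, 3]
```

Without the strip step, the trailing ';' in the first row would leave an empty string as the last token after splitting.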

Finally, create a new DataFrame with the constructor, using numpy.repeat and numpy.concatenate:

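import numpy as np

# split each Name into a list of tokens, ignoring leading/trailing ';'
vals = data['Name'].str.strip(';').str.split(';')
# number of tokens per original row
l = vals.str.len()
# flatten the token lists and repeat each Sample once per token
df = pd.DataFrame({'Name': np.concatenate(vals.values),
                   'Sample': np.repeat(data['Sample'].values, l)})
print(df)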
     Name Sample
0   CTA15  JAK_1
1   CTA16  JAK_1
2     CAR  JAK_1
3   AC007   TOR2
4   AC008   TOR2
5    GOO7   TOR2
6    G008   TOR2
7    F009   TOR2
8    AC09   Gilo
9    BC09   Gilo
10    C09   Gilo
11    V09   Gilo
12    B0P   Gilo
13   UF09    ALR
14   GF09    ALR
15   HF09    ALR
16   MN08    ALR
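The way numpy.repeat and numpy.concatenate line up can be seen in isolation with a small sketch. The labels and token lists below are hypothetical stand-ins for the Sample values and the split Name lists.

```python
import numpy as np

samples = np.array(['X', 'Y'])          # hypothetical labels, one per original row
tokens = [['a', 'b'], ['c', 'd', 'e']]  # hypothetical result of the split step
counts = [len(t) for t in tokens]       # tokens per original row: [2, 3]

repeated = np.repeat(samples, counts)   # each label repeated once per token
flat = np.concatenate(tokens)           # token lists flattened in row order

print(repeated.tolist())  # ['X', 'X', 'Y', 'Y', 'Y']
print(flat.tolist())      # ['a', 'b', 'c', 'd', 'e']
```

Because both arrays are built in the original row order and have the same total length, element i of one corresponds to element i of the other.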

Alternative solution (note that data.pop('Name') removes the Name column from data in place):

df = data.join(data.pop('Name')
                   .str.strip(';')
                   .str.split(';', expand=True)
                   .stack()
                   .reset_index(level=1, drop=True)
                   .rename('Name')).reset_index(drop=True)
print (df)
   Sample   Name
0   JAK_1  CTA15
1   JAK_1  CTA16
2   JAK_1    CAR
3    TOR2  AC007
4    TOR2  AC008
5    TOR2   GOO7
6    TOR2   G008
7    TOR2   F009
8    Gilo   AC09
9    Gilo   BC09
10   Gilo    C09
11   Gilo    V09
12   Gilo    B0P
13    ALR   UF09
14    ALR   GF09
15    ALR   HF09
16    ALR   MN08
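As a further option that is not part of the answer above, newer pandas versions provide DataFrame.explode, which can replace the stack or repeat/concatenate step. This is only a sketch, assuming pandas 0.25 or later and using a shortened two-row version of the question's data.

```python
import pandas as pd

# Shortened version of the question's data (two of the four rows)
data = pd.DataFrame({
    'Name': ['CTA15;CTA16;CAR;', 'UF09;GF09;HF09;MN08'],
    'Sample': ['JAK_1', 'ALR'],
})

df = (data.assign(Name=data['Name'].str.strip(';').str.split(';'))  # Name -> lists
          .explode('Name')           # one row per list element (pandas >= 0.25)
          .reset_index(drop=True))   # renumber rows 0..n-1
print(df)
```

Unlike the pop-based version, this leaves the original data unchanged because assign returns a new DataFrame.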

