I have a data frame like the following:
import pandas as pd

data = pd.DataFrame({'Name': ['CTA15;CTA16;CAR;', 'AC007;AC008;GOO7;G008;F009', 'AC09;BC09;C09;V09;B0P', 'UF09;GF09;HF09;MN08'],
                     'Sample': ['JAK_1', 'TOR2', 'Gilo', 'ALR']})
data
Name Sample
0 CTA15;CTA16;CAR; JAK_1
1 AC007;AC008;GOO7;G008;F009 TOR2
2 AC09;BC09;C09;V09;B0P Gilo
3 UF09;GF09;HF09;MN08 ALR
I need to split the Name column so that each value becomes its own row, with the corresponding Sample value repeated for each. In the end, I am aiming for a data frame like this:
Name Sample
0 CTA15 JAK_1
1 CTA16 JAK_1
2 CAR JAK_1
3 AC007 TOR2
4 AC008 TOR2
5 GOO7 TOR2
6 G008 TOR2
7 F009 TOR2
8 AC09 Gilo
9 BC09 Gilo
10 C09 Gilo
11 V09 Gilo
12 B0P Gilo
13 UF09 ALR
14 GF09 ALR
15 HF09 ALR
16 MN08 ALR
That is, split on ';' into new rows and attach the corresponding Sample value to each.
You can use str.strip(';') to remove the ';' at the start or end of some strings (otherwise the trailing ';' in 'CTA15;CTA16;CAR;' would produce an empty token), then str.split(';') to get lists, and str.len to get the length of each list. Last, create a new DataFrame with the constructor, using numpy.repeat and numpy.concatenate:
import numpy as np

vals = data['Name'].str.strip(';').str.split(';')
lens = vals.str.len()
df = pd.DataFrame({'Name': np.concatenate(vals.values),
                   'Sample': np.repeat(data['Sample'].values, lens)})
print(df)
Name Sample
0 CTA15 JAK_1
1 CTA16 JAK_1
2 CAR JAK_1
3 AC007 TOR2
4 AC008 TOR2
5 GOO7 TOR2
6 G008 TOR2
7 F009 TOR2
8 AC09 Gilo
9 BC09 Gilo
10 C09 Gilo
11 V09 Gilo
12 B0P Gilo
13 UF09 ALR
14 GF09 ALR
15 HF09 ALR
16 MN08 ALR
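To see why the lengths line up, note that numpy.repeat repeats each Sample value once per split token, while numpy.concatenate flattens the lists of tokens in the same row order, so the two arrays match position by position. A minimal illustration on plain lists (sample data abbreviated for brevity):

```python
import numpy as np

names = [['CTA15', 'CTA16', 'CAR'], ['UF09', 'GF09']]
samples = np.array(['JAK_1', 'ALR'])
lens = [len(x) for x in names]        # [3, 2] - tokens per row

# each sample repeated once per token in its row
repeated = np.repeat(samples, lens)   # ['JAK_1', 'JAK_1', 'JAK_1', 'ALR', 'ALR']
# all tokens flattened in the same order
flat = np.concatenate(names)          # ['CTA15', 'CTA16', 'CAR', 'UF09', 'GF09']

print(list(zip(flat, repeated)))
```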
Alternative solution:
df = data.join(data.pop('Name')
.str.strip(';')
.str.split(';', expand=True)
.stack()
.reset_index(level=1, drop=True)
.rename('Name')).reset_index(drop=True)
print(df)
Sample Name
0 JAK_1 CTA15
1 JAK_1 CTA16
2 JAK_1 CAR
3 TOR2 AC007
4 TOR2 AC008
5 TOR2 GOO7
6 TOR2 G008
7 TOR2 F009
8 Gilo AC09
9 Gilo BC09
10 Gilo C09
11 Gilo V09
12 Gilo B0P
13 ALR UF09
14 ALR GF09
15 ALR HF09
16 ALR MN08
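If you are on pandas 0.25 or newer, DataFrame.explode gives the same result more directly; this is not part of the original answers, just a shorter modern alternative (sample data abbreviated):

```python
import pandas as pd

data = pd.DataFrame({'Name': ['CTA15;CTA16;CAR;', 'UF09;GF09;HF09;MN08'],
                     'Sample': ['JAK_1', 'ALR']})

# turn each Name cell into a list, then explode one list element per row;
# the Sample value is carried along automatically
df = (data.assign(Name=data['Name'].str.strip(';').str.split(';'))
          .explode('Name')
          .reset_index(drop=True))
print(df)
```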