How can I index values in a column in pandas and make it into a new column? This is what I'm trying to do:
Original:
Data
0 0010-AAAA
1 0010-BBBB
2 0010-CCCC
3 0011-DDDD
4 0011-EEEE
Adding two columns:
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010-BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011-DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
Looks like you need str.split:
df[['col_2', 'col_3']] = df['Data'].str.split('-', n=1, expand=True)
output:
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010-BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011-DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
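A self-contained sketch of the split approach, rebuilding part of the sample frame from the question. The extra row '0012-FF-GG' is a hypothetical value, added only to show what n=1 does: pandas splits on the first hyphen only, so the remainder stays together in col_3 and the result always has exactly two columns.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with two hyphens
df = pd.DataFrame({"Data": ["0010-AAAA", "0010-BBBB", "0011-DDDD", "0012-FF-GG"]})

# n=1 splits on the first '-' only, so expand=True always yields two columns
df[["col_2", "col_3"]] = df["Data"].str.split("-", n=1, expand=True)
print(df)
```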
If the values have no separator (e.g. 0010AAAA), use a regex with str.extract instead. In this case: digits (\d+), followed by non-digits (\D+):
df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)(\D+)')
output:
Data col_2 col_3
0 0010AAAA 0010 AAAA
1 0010BBBB 0010 BBBB
2 0010CCCC 0010 CCCC
3 0011DDDD 0011 DDDD
4 0011EEEE 0011 EEEE
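A minimal runnable version of the str.extract approach, assuming values like those in the table above. str.extract returns one column per capture group, which is why two groups can be assigned straight to two new columns.

```python
import pandas as pd

# Values without a separator, as in the example above
df = pd.DataFrame({"Data": ["0010AAAA", "0010BBBB", "0011DDDD"]})

# Group 1: one or more digits; group 2: one or more non-digits.
# With the default expand=True, each capture group becomes a column.
df[["col_2", "col_3"]] = df["Data"].str.extract(r"(\d+)(\D+)")
print(df)
```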
Or even r'(\d+)\W*(\D+)' (digits, then optional non-alphanumeric characters, then non-digits) to handle both cases at once:
df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)\W*(\D+)')
example:
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
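A sketch of the combined pattern on mixed input. The last value, 'ABCD', is hypothetical: it has no leading digits, so the pattern cannot match it, and str.extract fills both new columns with NaN for that row instead of raising an error.

```python
import pandas as pd

# Mixed input: some values have a hyphen, some don't; 'ABCD' is a
# hypothetical value that the pattern cannot match
df = pd.DataFrame({"Data": ["0010-AAAA", "0010BBBB", "0011-DDDD", "ABCD"]})

# (\d+) : the digits
# \W*   : optional non-word characters (the '-' separator, if present)
# (\D+) : the non-digit part
df[["col_2", "col_3"]] = df["Data"].str.extract(r"(\d+)\W*(\D+)")
print(df)
```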
df[['col_2', 'col_3']] = df['Data'].str.split('-', expand=True)
df
Data col_2 col_3
0 0010-AAAA 0010 AAAA
1 0010-BBBB 0010 BBBB
2 0010-CCCC 0010 CCCC
3 0011-DDDD 0011 DDDD
4 0011-EEEE 0011 EEEE
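This version omits n, which works as long as every value contains exactly one '-'. The sketch below uses a hypothetical value with two hyphens to show the difference: without n, expand=True produces as many columns as the longest row needs (padding shorter rows with missing values), which no longer fits two target columns; with n=1 the result stays at two columns.

```python
import pandas as pd

# Hypothetical data: the second value contains two hyphens
data = pd.Series(["0010-AAAA", "0011-BB-CC"])

# Without n, every '-' is split on, so a third column appears
parts = data.str.split("-", expand=True)
print(parts.shape)  # (2, 3)

# Limiting the split to the first separator keeps exactly two columns
parts_n1 = data.str.split("-", n=1, expand=True)
print(parts_n1.shape)  # (2, 2)
```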
Most Python string methods are also available through the .str accessor, including slicing:
df["Data"].str[:4]
0 0010
1 0010
2 0010
3 0011
4 0011
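Slicing can build both columns, but only under the assumption that every value has the same fixed layout as the question's data: four characters, a '-', then the rest. This is a sketch under that assumption; for variable-length parts, split or extract is the safer choice.

```python
import pandas as pd

df = pd.DataFrame({"Data": ["0010-AAAA", "0010-BBBB", "0011-DDDD"]})

# Assumes the fixed layout 'NNNN-XXXX':
# characters 0-3 are the prefix, character 4 is '-', the rest is the suffix
df["col_2"] = df["Data"].str[:4]
df["col_3"] = df["Data"].str[5:]
print(df)
```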