I have a column named BREADS with 5 rows. I want to split the column and its values into 4 columns, namely B, REA, D and S.
BREADS
>2319-22-<21
>1513-16-<19
>1319-25-<22
>1617-21-<25
>1011-15-<17
Desired outcome
B, REA , D, S ### column names
>23 , 19-22 , - , <21
>15 , 13-16 , - , <19
>13 , 19-25 , - , <22
>16 , 17-21 , - , <25
>10 , 11-15 , - , <17
# Key: > greater than and < less than, - hyphen in the column 'D'
My attempt
###### in python
# for column 'B'
df['B'] = df['BREADS'].astype(str).str[0:3] # returns '>23','>15',.....,'>10'
#### in R
library(stringr)
str_split_fixed(df$BREADS, "", 2)
An option with extract from tidyr in R:
library(dplyr)
library(tidyr)
df1 %>%
extract(BREADS, into = c('B', 'REA', 'D', 'S'),
'^(\\>..)(\\d{2}-\\d{2})(-)(.*)')
-output
# B REA D S
#1 >23 19-22 - <21
#2 >15 13-16 - <19
#3 >13 19-25 - <22
#4 >16 17-21 - <25
#5 >10 11-15 - <17
df1 <- structure(list(BREADS = c(">2319-22-<21", ">1513-16-<19", ">1319-25-<22",
">1617-21-<25", ">1011-15-<17")), class = "data.frame", row.names = c(NA,
-5L))
For Python:
# (start, stop) character positions of each field in the fixed-width string
d = {'B': (0, 3), 'REA': (3, 8), 'D': (8, 9), 'S': (9, 12)}
for name, (start, stop) in d.items():
    df[name] = df['BREADS'].str[start:stop]
You can use pandas str.extract to pull the data into separate columns; the assumption here is that the data is uniform for each row:
pattern = r"(?P<B>>.{2})(?P<REA>.{2}-.{2})(?P<D>-)(?P<S><.{2})"
df.BREADS.str.extract(pattern)
B REA D S
0 >23 19-22 - <21
1 >15 13-16 - <19
2 >13 19-25 - <22
3 >16 17-21 - <25
4 >10 11-15 - <17
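For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and swaps the `.{2}` wildcards for `\d{2}`, on the assumption that every field is made of two-digit numbers as in the sample data. The names `parts`, `unmatched` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"BREADS": [">2319-22-<21", ">1513-16-<19", ">1319-25-<22",
                              ">1617-21-<25", ">1011-15-<17"]})

# Assumption: each number has exactly two digits, as in the sample rows.
# Named groups become the column names of the extracted frame.
pattern = r"^(?P<B>>\d{2})(?P<REA>\d{2}-\d{2})(?P<D>-)(?P<S><\d{2})$"

parts = df["BREADS"].str.extract(pattern)

# Rows that do not fit the pattern come back as NaN, so they are easy to spot.
unmatched = parts["B"].isna()

# Attach the new columns next to the original one.
result = df.join(parts)
print(result)
```

Anchoring the pattern with `^` and `$` means a malformed row yields NaN in every new column instead of a partial match, which may or may not be what you want for real data.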