R: Split column on white space and replace data
I have a dataframe that looks like this:
Data
structure(list(col1 = c("Arizona", "Florida", "Montreal"), col2 = c("5½ -130",
"5 -135", "5½ -125")), row.names = c(NA, -3L), class = "data.frame")
Col1 Col2
Arizona 5½ -130
Florida 5 -135
Montreal 5½ -125
I need it to look like this (splitting Col2 into two columns on the " " and replacing ½ with .5):
Col1 Col2 Col3
Arizona 5.5 -130
Florida 5 -135
Montreal 5.5 -125
https://tidyr.tidyverse.org/reference/separate.html
library(tidyverse)
df <- data.frame(
Team = c("Arizona","Florida","Montreal"),
Col2 = c("5½ -130", "5 -135", "5½ -125")
)
new_df <- separate(df, 2, into=c("odds", "payout100"), sep= " ")
new_df
Team odds payout100
1 Arizona 5½ -130
2 Florida 5 -135
3 Montreal 5½ -125
new_df$odds <- as.numeric(str_replace_all(new_df$odds, "5[^ ]", "5.5"))
new_df
Team odds payout100
1 Arizona 5.5 -130
2 Florida 5.0 -135
3 Montreal 5.5 -125
The call to stringr::str_replace_all changes a 5 followed by any character other than a space to 5.5. This assumes that the only character that can follow the integer is ½, which is an unusual character (neither a digit nor a letter).
The call
str_replace_all(new_df$odds, "([0-9]+)[^0-9 ]", "\\1.5")
makes the same change for any whole number that might start the odds. The character after the digits must be neither a digit nor a space, so a plain multi-digit value such as 10 is left unchanged. Note that stringr is loaded automatically when you load the tidyverse package.
Also note that I made up the new column names. It looks like something to do with betting on the NHL playoffs, so I based my column names on that. Otherwise, the arguments I used are the data frame object, the position of the column being split (2), and the character(s) used to separate the column: in this case, a blank space.
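How much the character class after the digits matters can be checked on a few made-up values. This is a minimal sketch: the `odds` vector is hypothetical test data (not from the question), and the ½ character is written as the escape "\u00bd" only to sidestep encoding problems.

```r
library(stringr)

# Hypothetical test values; "\u00bd" is the ½ character.
odds <- c("5\u00bd", "5", "10", "10\u00bd")

# Loose class: any non-space character may follow the digits.
# On "10" the regex backtracks, captures "1", and consumes "0".
loose <- str_replace_all(odds, "([0-9]+)[^ ]", "\\1.5")

# Strict class: the following character must be neither a digit nor a space.
strict <- str_replace_all(odds, "([0-9]+)[^0-9 ]", "\\1.5")

print(as.numeric(loose))   # the third value comes out as 1.5, not 10
print(as.numeric(strict))  # the third value stays 10
```

If ½ is the only special character expected in the data, replacing it literally with ".5" avoids the question entirely.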
Using data from @Tho Vu:
library(tidyverse)
df %>%
  separate(col2, into = c("Col2", "Col3"), sep = " ") %>%
  mutate(Col2 = as.numeric(gsub("½", ".5", Col2)))  # "5½" -> "5.5"
col1 Col2 Col3
1 Arizona 5.5 -130
2 Florida 5.0 -135
3 Montreal 5.5 -125
You can try this approach:
library(tidyverse)
library(stringr)
df <- structure(list(col1 = c("Arizona", "Florida", "Montreal"),
col2 = c("5½ -130", "5 -135", "5½ -125")), row.names = c(NA, -3L), class = "data.frame")
df2 <- df %>%
  mutate(col3 = str_extract(col2, "-.*"),          # everything from the minus sign onward
         col2 = str_replace(col2, "\\s*-.*", ""),  # drop that part and the separating space
         col2 = str_replace(col2, "½", ".5"))      # "5½" -> "5.5"
df2$col2 <- as.numeric(df2$col2)
# col1 col2 col3
# 1 Arizona 5.5 -130
# 2 Florida 5.0 -135
# 3 Montreal 5.5 -125
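For comparison, the same two steps (split on the space, then replace ½) can be sketched in base R without any packages. This is only an illustration, not any answerer's method. It assumes every value in col2 contains exactly one space, and converting the third column to integer is an assumption, since the question does not say which type it should have.

```r
# Same sample data as in the question; "\u00bd" is the ½ character.
df <- data.frame(
  col1 = c("Arizona", "Florida", "Montreal"),
  col2 = c("5\u00bd -130", "5 -135", "5\u00bd -125"),
  stringsAsFactors = FALSE
)

# Split each value on the space and stack the pieces into a two-column matrix.
parts <- do.call(rbind, strsplit(df$col2, " ", fixed = TRUE))

result <- data.frame(
  Col1 = df$col1,
  Col2 = as.numeric(sub("\u00bd", ".5", parts[, 1], fixed = TRUE)),  # "5½" -> 5.5
  Col3 = as.integer(parts[, 2]),  # assumption: the second piece is a whole number
  stringsAsFactors = FALSE
)
print(result)
```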