R Split column on white space and replace data

I have a data frame that looks like this:

Data

structure(list(col1 = c("Arizona", "Florida", "Montreal"), col2 = c("5½ -130", 
"5 -135", "5½ -125")), row.names = c(NA, -3L), class = "data.frame")
Col1     Col2
Arizona  5½ -130
Florida  5 -135
Montreal 5½ -125

I need it to look like this (splitting Col2 into two columns on the space " " and replacing ½ with .5):

Col1      Col2     Col3
Arizona   5.5       -130
Florida   5         -135
Montreal  5.5       -125

See the documentation for tidyr::separate: https://tidyr.tidyverse.org/reference/separate.html

    library(tidyverse)

    df <- data.frame(
      Team = c("Arizona", "Florida", "Montreal"),
      Col2 = c("5½ -130", "5 -135", "5½ -125")
    )

    # Split column 2 on the space into two new character columns
    new_df <- separate(df, 2, into = c("odds", "payout100"), sep = " ")

    new_df

          Team odds payout100
    1  Arizona   5½      -130
    2  Florida    5      -135
    3 Montreal   5½      -125

    # Replace "5" plus one non-space character with "5.5", then convert to numeric
    new_df$odds <- as.numeric(str_replace_all(new_df$odds, "5[^ ]", "5.5"))

    new_df

          Team odds payout100
    1  Arizona  5.5      -130
    2  Florida  5.0      -135
    3 Montreal  5.5      -125

The call to stringr::str_replace_all changes a 5 followed by any character other than a space to 5.5. This assumes that the only character that can follow the integer is ½, which is an unusual character (not a digit or a letter).

The call

    # [^0-9 ] instead of [^ ]: a plain multi-digit value such as "10" is left unchanged
    str_replace_all(new_df$odds, "([0-9]+)[^0-9 ]", "\\1.5")

makes the same change for any number that might start the odds. Note that stringr is loaded automatically when you load the tidyverse package.
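The patterns above treat any non-space character after the digits as ½. If ½ is the only symbol that can appear, a stricter option is to match it literally. The following is a minimal sketch in base R on made-up sample values; `half_to_decimal` is a hypothetical helper name, and `"\u00bd"` is the escape for ½, used so the snippet does not depend on file encoding.

```r
# half_to_decimal is a hypothetical helper, not part of stringr or tidyr.
# It replaces a literal ½ ("\u00bd") with ".5" and converts the result to numeric.
half_to_decimal <- function(x) {
  as.numeric(sub("\u00bd", ".5", x, fixed = TRUE))
}

# Made-up sample values, including multi-digit numbers with and without ½
odds <- c("5\u00bd", "5", "10\u00bd", "10")
half_to_decimal(odds)
```

Because the match is a fixed string, values without ½ pass through unchanged before the numeric conversion.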

Also note that I made up the new column names. It looks like something to do with betting on the NHL playoffs, so I based my column names on that. Otherwise, the arguments I used are the data frame object, the position of the column being split (2), and the character(s) used to separate the column; in this case, a blank space.
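For readers who prefer not to load the tidyverse, the same split-and-replace steps can be sketched in base R. This is only an illustration under stated assumptions: every value in col2 is assumed to contain exactly one space, `parts` and `out` are hypothetical names, and `"\u00bd"` is the escape for ½. Unlike the separate() call above, this sketch also converts the second piece to numeric.

```r
# Sample data copied from the question; "\u00bd" is the ½ character.
df <- data.frame(
  col1 = c("Arizona", "Florida", "Montreal"),
  col2 = c("5\u00bd -130", "5 -135", "5\u00bd -125"),
  stringsAsFactors = FALSE
)

# Split each value on the space and bind the pieces into a 3 x 2 character matrix.
# Assumes every row splits into exactly two pieces.
parts <- do.call(rbind, strsplit(df$col2, " ", fixed = TRUE))

# Build the result: replace ½ with ".5" in the first piece, then convert both to numeric.
out <- data.frame(
  col1 = df$col1,
  Col2 = as.numeric(sub("\u00bd", ".5", parts[, 1], fixed = TRUE)),
  Col3 = as.numeric(parts[, 2]),
  stringsAsFactors = FALSE
)
out
```

If some rows might lack the space, the rbind step would recycle values silently, so a length check on the strsplit result would be worth adding before relying on this.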

Using the data from @Tho Vu:

> df %>%
    separate(col2, into = c("Col2", "Col3"), sep = " ") %>%
    mutate(Col2 = gsub("½", ".5", Col2) %>% as.numeric)
      col1 Col2 Col3
1  Arizona  5.5 -130
2  Florida  5.0 -135
3 Montreal  5.5 -125

You can try this approach:

library(tidyverse)
library(stringr)
df <- structure(list(col1 = c("Arizona", "Florida", "Montreal"), 
                     col2 = c("5½ -130", "5 -135", "5½ -125")), row.names = c(NA, -3L), class = "data.frame")

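df2 <- df %>% 
  mutate(col3 = str_extract(col2, regex("(-.*)")),          # str_extract returns a character vector; str_extract_all would create a list column
         col2 = str_replace_all(col2, regex("(-.*)"), ""),  # drop everything from the minus sign onward
         col2 = str_replace_all(col2, regex("½"), ".5"))    # "5½ " becomes "5.5 "

# as.numeric() ignores the trailing space left in col2
df2$col2 <- as.numeric(df2$col2)

#       col1 col2 col3
# 1  Arizona  5.5 -130
# 2  Florida  5.0 -135
# 3 Montreal  5.5 -125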
