[英]dplyr mutate across to create a new column and modify all columns in the data
我有一些数据看起来像:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
1 company42 NA NA NA NA NA NA NA NA NA
2 company105 NA €315k NA NA NA NA Mar 2015 NA NA
3 company23 NA NA NA NA NA NA NA NA NA
4 company70 NA €570 NA NA NA NA Apr 2016 NA NA
我想做两件事。
预期输出:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column 11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr>
1 company42 NA NA NA NA NA NA NA NA NA NA
2 company105 NA 315000 NA NA NA NA Mar 2015 NA NA €
3 company23 NA NA NA NA NA NA NA NA NA NA
4 company70 NA 570 NA NA NA NA Apr 2016 NA NA €
添加了新的第 11 列并从第 3 列中删除了 €,最后将“K”转换为“000”。
数据:
data <- structure(list(column1 = c("company42", "company105", "company23",
"company70", "company77", "company51", "company20", "company17",
"company78", "company80", "company39", "company37", "company101",
"company61", "company104", "company41", "company88", "company131",
"company102", "company45"), column2 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, "\20060k", NA, NA, NA, NA, NA, NA, NA
), column3 = c(NA, "\200315k", NA, "\200570", NA, NA, NA, NA,
NA, "$1.05M", NA, NA, "\200177k", NA, NA, NA, "\20070k", NA,
NA, "\200223k"), column4 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), column5 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), column6 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "\200653k", NA,
NA, NA, NA, NA, NA), column7 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), column8 = c(NA,
"Mar 2015", NA, "Apr 2016", NA, NA, NA, NA, NA, "Sep 2012", NA,
NA, NA, NA, NA, NA, "Jul 2014", NA, NA, "May 2016"), column9 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
column10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))
试试这个方法,也许不是最理想的,但它可以在修改它时为您找到一条新路径:
library(tidyverse)
#Code 1
data2 <- data %>% mutate(across(column2:column7,~ifelse(grepl('k',.),gsub('k','000',.),
ifelse(grepl('M',.),gsub('M','000000',.),.))))
#Code 2
data2$Currency <- apply(data2[,2:7],1,
function(x) trimws(gsub('NA','',
paste0(gsub("[[:digit:]]", "", x),
collapse = ',')),whitespace = ','))
data2$Currency <- gsub('\\.','',data2$Currency)
#Code 3
data3 <- data2 %>% mutate(across(column2:column7,~gsub("[[:punct:]]", "", .)))
data3 <- data3 %>% mutate(across(column2:column7,~gsub("€", "", .)))
data3 <- data3 %>% mutate(across(column2:column7,as.numeric))
输出:
# A tibble: 20 x 11
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 Currency
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <lgl> <chr>
1 company42 NA NA NA NA NA NA NA NA NA ""
2 company105 NA 315000 NA NA NA NA Mar 2015 NA NA "€"
3 company23 NA NA NA NA NA NA NA NA NA ""
4 company70 NA 570 NA NA NA NA Apr 2016 NA NA "€"
5 company77 NA NA NA NA NA NA NA NA NA ""
6 company51 NA NA NA NA NA NA NA NA NA ""
7 company20 NA NA NA NA NA NA NA NA NA ""
8 company17 NA NA NA NA NA NA NA NA NA ""
9 company78 NA NA NA NA NA NA NA NA NA ""
10 company80 NA 105000000 NA NA NA NA Sep 2012 NA NA "$"
11 company39 NA NA NA NA NA NA NA NA NA ""
12 company37 NA NA NA NA NA NA NA NA NA ""
13 company101 60000 177000 NA NA NA NA NA NA NA "€,€"
14 company61 NA NA NA NA 653000 NA NA NA NA "€"
15 company104 NA NA NA NA NA NA NA NA NA ""
16 company41 NA NA NA NA NA NA NA NA NA ""
17 company88 NA 70000 NA NA NA NA Jul 2014 NA NA "€"
18 company131 NA NA NA NA NA NA NA NA NA ""
19 company102 NA NA NA NA NA NA NA NA NA ""
20 company45 NA 223000 NA NA NA NA May 2016 NA NA "€"
这是否有效:
> data %>% rowwise() %>% mutate(column11 = case_when(any(grepl('\u20AC', c_across(column2:column10))) ~ '\u20AC',
+ any(grepl('\\$', c_across(column2:column10))) ~ '$', TRUE ~ NA_character_)) %>%
+ mutate(across(2:10, ~ str_remove_all(., '\\$|\u20AC'))) %>% mutate(across(2:10, ~ case_when(grepl('k$',.) ~ parse_number(.)*1000,
+ grepl('M$',.) ~ parse_number(.)*1000000,
+ TRUE ~ NA_real_)))
# A tibble: 20 x 11
# Rowwise:
column1 column2 column3 column4 column5 column6 column7 column8 column9 column10 column11
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 company42 NA NA NA NA NA NA NA NA NA NA
2 company105 NA 315000 NA NA NA NA NA NA NA €
3 company23 NA NA NA NA NA NA NA NA NA NA
4 company70 NA NA NA NA NA NA NA NA NA €
5 company77 NA NA NA NA NA NA NA NA NA NA
6 company51 NA NA NA NA NA NA NA NA NA NA
7 company20 NA NA NA NA NA NA NA NA NA NA
8 company17 NA NA NA NA NA NA NA NA NA NA
9 company78 NA NA NA NA NA NA NA NA NA NA
10 company80 NA 1050000 NA NA NA NA NA NA NA $
11 company39 NA NA NA NA NA NA NA NA NA NA
12 company37 NA NA NA NA NA NA NA NA NA NA
13 company101 60000 177000 NA NA NA NA NA NA NA €
14 company61 NA NA NA NA 653000 NA NA NA NA €
15 company104 NA NA NA NA NA NA NA NA NA NA
16 company41 NA NA NA NA NA NA NA NA NA NA
17 company88 NA 70000 NA NA NA NA NA NA NA €
18 company131 NA NA NA NA NA NA NA NA NA NA
19 company102 NA NA NA NA NA NA NA NA NA NA
20 company45 NA 223000 NA NA NA NA NA NA NA €
>
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.