dplyr mutate across to create a new column and modify all columns in the data

I have some data that looks like this:

   column1    column2 column3 column4 column5 column6 column7 column8  column9 column10
   <chr>      <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>    <chr>   <lgl>   
 1 company42  NA      NA      NA      NA      NA      NA      NA       NA      NA      
 2 company105 NA      €315k   NA      NA      NA      NA      Mar 2015 NA      NA      
 3 company23  NA      NA      NA      NA      NA      NA      NA       NA      NA      
 4 company70  NA      €570    NA      NA      NA      NA      Apr 2016 NA      NA

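I want to do two things.

  1. Extract all the EUR symbols and put them into a "currency" column. Each row that contains currency data uses a single currency across its columns, but the currency can change from row to row.
  2. Convert every "k" to "000" and every "M" to "000000".

Expected output: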

   column1    column2 column3 column4 column5 column6 column7 column8  column9 column10 column11
   <chr>      <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>    <chr>   <lgl>    <chr>
 1 company42  NA      NA      NA      NA      NA      NA      NA       NA      NA       NA
 2 company105 NA      315000  NA      NA      NA      NA      Mar 2015 NA      NA       €
 3 company23  NA      NA      NA      NA      NA      NA      NA       NA      NA       NA
 4 company70  NA      570     NA      NA      NA      NA      Apr 2016 NA      NA       €

A new column 11 has been added, the € has been removed from column 3, and the "k" has been converted to "000".

Data:

data <- structure(list(column1 = c("company42", "company105", "company23", 
"company70", "company77", "company51", "company20", "company17", 
"company78", "company80", "company39", "company37", "company101", 
"company61", "company104", "company41", "company88", "company131", 
"company102", "company45"), column2 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, "\u20ac60k", NA, NA, NA, NA, NA, NA, NA
), column3 = c(NA, "\u20ac315k", NA, "\u20ac570", NA, NA, NA, NA, 
NA, "$1.05M", NA, NA, "\u20ac177k", NA, NA, NA, "\u20ac70k", NA, 
NA, "\u20ac223k"), column4 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), column5 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), column6 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "\u20ac653k", NA, 
NA, NA, NA, NA, NA), column7 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), column8 = c(NA, 
"Mar 2015", NA, "Apr 2016", NA, NA, NA, NA, NA, "Sep 2012", NA, 
NA, NA, NA, NA, NA, "Jul 2014", NA, NA, "May 2016"), column9 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    column10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))

Try this approach. It may not be the most optimal, but it can give you a starting point to adapt:

library(tidyverse)
# Code 1: expand the suffixes as text ("k" -> "000", "M" -> "000000")
data2 <- data %>%
  mutate(across(column2:column7,
                ~ ifelse(grepl('k', .), gsub('k', '000', .),
                         ifelse(grepl('M', .), gsub('M', '000000', .), .))))
# Code 2: drop the digits and join what is left in each row to get the currency symbol(s)
data2$Currency <- apply(data2[, 2:7], 1,
                        function(x) trimws(gsub('NA', '',
                                                paste0(gsub("[[:digit:]]", "", x),
                                                       collapse = ',')),
                                           whitespace = ','))
data2$Currency <- gsub('\\.', '', data2$Currency)
# Code 3: strip punctuation and the euro sign, then convert to numeric
data3 <- data2 %>% mutate(across(column2:column7, ~ gsub("[[:punct:]]", "", .)))
data3 <- data3 %>% mutate(across(column2:column7, ~ gsub("€", "", .)))
data3 <- data3 %>% mutate(across(column2:column7, as.numeric))

Output:

# A tibble: 20 x 11
   column1    column2   column3 column4 column5 column6 column7 column8  column9 column10 Currency
   <chr>        <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    <chr>   <lgl>    <chr>   
 1 company42       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 2 company105      NA    315000      NA      NA      NA      NA Mar 2015 NA      NA       "€"     
 3 company23       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 4 company70       NA       570      NA      NA      NA      NA Apr 2016 NA      NA       "€"     
 5 company77       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 6 company51       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 7 company20       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 8 company17       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 9 company78       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
10 company80       NA 105000000      NA      NA      NA      NA Sep 2012 NA      NA       "$"     
11 company39       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
12 company37       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
13 company101   60000    177000      NA      NA      NA      NA NA       NA      NA       "€,€"   
14 company61       NA        NA      NA      NA  653000      NA NA       NA      NA       "€"     
15 company104      NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
16 company41       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
17 company88       NA     70000      NA      NA      NA      NA Jul 2014 NA      NA       "€"     
18 company131      NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
19 company102      NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
20 company45       NA    223000      NA      NA      NA      NA May 2016 NA      NA       "€"     
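
Appending zeros as text works for whole amounts, but the output above shows "$1.05M" ending up as 105000000 once the decimal point is stripped. A minimal sketch of a numeric alternative follows, assuming every value is an optional currency symbol, a number, and an optional k or M suffix. `to_amount` is a hypothetical helper, not part of the answer above.

```r
# Hypothetical helper: parse strings such as "€315k" or "$1.05M" into numbers.
to_amount <- function(x) {
  # keep only digits and the decimal point, e.g. "$1.05M" -> "1.05"
  value <- as.numeric(gsub("[^0-9.]", "", x))
  # choose the multiplier from the trailing suffix (no suffix means 1)
  scale <- ifelse(grepl("M$", x), 1e6, ifelse(grepl("k$", x), 1e3, 1))
  value * scale
}

amounts <- to_amount(c("\u20ac315k", "\u20ac570", "$1.05M", NA))
print(amounts)  # 315000, 570, 1050000, NA
```

Something like `mutate(across(column2:column7, to_amount))` could then replace Code 1 and Code 3, provided the Currency column is built first. It should stay limited to the amount columns, since a date such as "Mar 2015" would otherwise be reduced to 2015.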

Does this work:

> data %>% rowwise() %>% mutate(column11 = case_when(any(grepl('\u20AC', c_across(column2:column10))) ~ '\u20AC',
+                                                    any(grepl('\\$', c_across(column2:column10))) ~ '$', TRUE ~ NA_character_)) %>% 
+   mutate(across(2:10, ~ str_remove_all(., '\\$|\u20AC'))) %>% mutate(across(2:10, ~ case_when(grepl('k$',.) ~ parse_number(.)*1000,
+                                                                                               grepl('M$',.) ~ parse_number(.)*1000000,
+                                                                                               TRUE ~ NA_real_)))
# A tibble: 20 x 11
# Rowwise: 
   column1    column2 column3 column4 column5 column6 column7 column8 column9 column10 column11
   <chr>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <chr>   
 1 company42       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 2 company105      NA  315000      NA      NA      NA      NA      NA      NA       NA €       
 3 company23       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 4 company70       NA      NA      NA      NA      NA      NA      NA      NA       NA €       
 5 company77       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 6 company51       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 7 company20       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 8 company17       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 9 company78       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
10 company80       NA 1050000      NA      NA      NA      NA      NA      NA       NA $       
11 company39       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
12 company37       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
13 company101   60000  177000      NA      NA      NA      NA      NA      NA       NA €       
14 company61       NA      NA      NA      NA  653000      NA      NA      NA       NA €       
15 company104      NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
16 company41       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
17 company88       NA   70000      NA      NA      NA      NA      NA      NA       NA €       
18 company131      NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
19 company102      NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
20 company45       NA  223000      NA      NA      NA      NA      NA      NA       NA €       
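
In the output above, column8 loses its dates and "€570" becomes NA, because every column from 2 to 10 is converted and values without a k or M suffix fall through to the `NA_real_` branch. Below is a sketch of a variant without `rowwise()`, assuming amounts only occur in the listed columns and each row uses a single currency. It runs on a small stand-in tibble (`demo`, hypothetical) rather than the full data.

```r
library(dplyr)
library(stringr)
library(readr)

# Small stand-in for the question's data (hypothetical subset of rows and columns)
demo <- tibble(
  column1 = c("company42", "company105", "company70", "company80"),
  column2 = c(NA, NA, NA, NA_character_),
  column3 = c(NA, "\u20ac315k", "\u20ac570", "$1.05M"),
  column8 = c(NA, "Mar 2015", "Apr 2016", "Sep 2012")
)

# Columns assumed to hold amounts; the date column is left alone
amount_cols <- c("column2", "column3")

# Currency symbol found in each amount column, NA where there is none
symbols <- lapply(demo[amount_cols], str_extract, "[\u20ac$]")

result <- demo %>%
  mutate(
    # first non-missing symbol per row
    column11 = do.call(coalesce, unname(symbols)),
    # number times the multiplier implied by the suffix (1 when there is no suffix)
    across(all_of(amount_cols),
           ~ parse_number(.x) * case_when(str_detect(.x, "k$") ~ 1e3,
                                          str_detect(.x, "M$") ~ 1e6,
                                          TRUE ~ 1))
  )

print(result)
```

For the full data, `amount_cols` would presumably be `paste0("column", 2:7)`. The symbols are extracted before the conversion because `parse_number()` discards them.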
