
dplyr mutate across to create a new column and modify all columns in the data

I have some data which looks like:

   column1    column2 column3 column4 column5 column6 column7 column8  column9 column10
   <chr>      <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>    <chr>   <lgl>   
 1 company42  NA      NA      NA      NA      NA      NA      NA       NA      NA      
 2 company105 NA      €315k   NA      NA      NA      NA      Mar 2015 NA      NA      
 3 company23  NA      NA      NA      NA      NA      NA      NA       NA      NA      
 4 company70  NA      €570    NA      NA      NA      NA      Apr 2016 NA      NA

I want to do two things to it.

  1. Extract all of the EUR signs and put them into a "currency" column. Within each row the currency is the same across the columns, but it can change from row to row.
  2. Convert all of the "K" to "000" and all of the "M" to "000000".

Expected output:

   column1    column2 column3 column4 column5 column6 column7 column8  column9 column10 column11
   <chr>      <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>    <chr>   <lgl>    <chr>
 1 company42  NA      NA      NA      NA      NA      NA      NA       NA      NA       NA
 2 company105 NA      315000  NA      NA      NA      NA      Mar 2015 NA      NA       €
 3 company23  NA      NA      NA      NA      NA      NA      NA       NA      NA       NA
 4 company70  NA      570     NA      NA      NA      NA      Apr 2016 NA      NA       €

Here the new column11 has been added, the € has been removed from column3, and the "k" has been converted to "000".

Data:

data <- structure(list(column1 = c("company42", "company105", "company23", 
"company70", "company77", "company51", "company20", "company17", 
"company78", "company80", "company39", "company37", "company101", 
"company61", "company104", "company41", "company88", "company131", 
"company102", "company45"), column2 = c(NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, "\u20ac60k", NA, NA, NA, NA, NA, NA, NA
), column3 = c(NA, "\u20ac315k", NA, "\u20ac570", NA, NA, NA, NA, 
NA, "$1.05M", NA, NA, "\u20ac177k", NA, NA, NA, "\u20ac70k", NA, 
NA, "\u20ac223k"), column4 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), column5 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), column6 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "\u20ac653k", NA, 
NA, NA, NA, NA, NA), column7 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), column8 = c(NA, 
"Mar 2015", NA, "Apr 2016", NA, NA, NA, NA, NA, "Sep 2012", NA, 
NA, NA, NA, NA, NA, "Jul 2014", NA, NA, "May 2016"), column9 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    column10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))

Try this approach. It may not be the most efficient, but it can point you in a direction that you can adapt:

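library(tidyverse)

# Code 1: expand the suffixes literally ("k" -> "000", "M" -> "000000").
# Caveat: "$1.05M" becomes "$1.05000000", which ends up as 105000000 in the output.
data2 <- data %>%
  mutate(across(column2:column7,
                ~ ifelse(grepl('k', .), gsub('k', '000', .),
                         ifelse(grepl('M', .), gsub('M', '000000', .), .))))

# Code 2: per row, drop the digits, join what is left with commas, then strip
# the "NA" text and the leading/trailing commas so only the symbols remain
data2$Currency <- apply(data2[, 2:7], 1,
                        function(x) trimws(gsub('NA', '',
                                                paste0(gsub("[[:digit:]]", "", x),
                                                       collapse = ',')),
                                           whitespace = ','))
data2$Currency <- gsub('\\.', '', data2$Currency)  # remove the stray decimal point

# Code 3: remove punctuation and the euro sign, then convert to numeric
data3 <- data2 %>%
  mutate(across(column2:column7, ~ gsub("[[:punct:]]", "", .))) %>%
  mutate(across(column2:column7, ~ gsub("€", "", .))) %>%
  mutate(across(column2:column7, as.numeric))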

Output:

# A tibble: 20 x 11
   column1    column2   column3 column4 column5 column6 column7 column8  column9 column10 Currency
   <chr>        <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>    <chr>   <lgl>    <chr>   
 1 company42       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 2 company105      NA    315000      NA      NA      NA      NA Mar 2015 NA      NA       "€"     
 3 company23       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 4 company70       NA       570      NA      NA      NA      NA Apr 2016 NA      NA       "€"     
 5 company77       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 6 company51       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 7 company20       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 8 company17       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
 9 company78       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
10 company80       NA 105000000      NA      NA      NA      NA Sep 2012 NA      NA       "$"     
11 company39       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
12 company37       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
13 company101   60000    177000      NA      NA      NA      NA NA       NA      NA       "€,€"   
14 company61       NA        NA      NA      NA  653000      NA NA       NA      NA       "€"     
15 company104      NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
16 company41       NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
17 company88       NA     70000      NA      NA      NA      NA Jul 2014 NA      NA       "€"     
18 company131      NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
19 company102      NA        NA      NA      NA      NA      NA NA       NA      NA       ""      
20 company45       NA    223000      NA      NA      NA      NA May 2016 NA      NA       "€"     
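Note that in the output above company80 ("$1.05M") comes out as 105000000, because the zeros are pasted on after the decimal part. One possible way around this is to scale the number instead of editing the string. The following is only a minimal sketch in base R; `parse_amount` is a hypothetical helper name, not something from the original post, and it assumes every value has the form symbol + number + optional `k`/`M` suffix.

```r
# Hypothetical helper (not part of the original answer): convert strings such
# as "€315k" or "$1.05M" to numbers by scaling rather than by pasting zeros.
parse_amount <- function(x) {
  x <- as.character(x)
  # Multiplier implied by a trailing k or M; 1 when there is no suffix
  mult <- ifelse(grepl("k$", x), 1e3, ifelse(grepl("M$", x), 1e6, 1))
  # Keep only digits and the decimal point, then convert to numeric
  num <- as.numeric(gsub("[^0-9.]", "", x))
  num * mult
}

# Example call on values taken from the question's data
parse_amount(c("\u20ac315k", "$1.05M", "\u20ac570", NA))
```

If this fits your data, it could replace Code 1 and Code 3 with something like `mutate(across(column2:column7, parse_amount))`. It should only be applied to the amount columns: a date such as "Mar 2015" would be reduced to its digits.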

Does this work?

> library(tidyverse)
> data %>% rowwise() %>%
+   mutate(column11 = case_when(any(grepl('\u20AC', c_across(column2:column10))) ~ '\u20AC',
+                               any(grepl('\\$', c_across(column2:column10))) ~ '$',
+                               TRUE ~ NA_character_)) %>%
+   mutate(across(2:10, ~ str_remove_all(., '\\$|\u20AC'))) %>%
+   mutate(across(2:10, ~ case_when(grepl('k$', .) ~ parse_number(.) * 1000,
+                                   grepl('M$', .) ~ parse_number(.) * 1000000,
+                                   TRUE ~ NA_real_)))
# A tibble: 20 x 11
# Rowwise: 
   column1    column2 column3 column4 column5 column6 column7 column8 column9 column10 column11
   <chr>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <chr>   
 1 company42       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 2 company105      NA  315000      NA      NA      NA      NA      NA      NA       NA €       
 3 company23       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 4 company70       NA      NA      NA      NA      NA      NA      NA      NA       NA €       
 5 company77       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 6 company51       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 7 company20       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 8 company17       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
 9 company78       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
10 company80       NA 1050000      NA      NA      NA      NA      NA      NA       NA $       
11 company39       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
12 company37       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
13 company101   60000  177000      NA      NA      NA      NA      NA      NA       NA €       
14 company61       NA      NA      NA      NA  653000      NA      NA      NA       NA €       
15 company104      NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
16 company41       NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
17 company88       NA   70000      NA      NA      NA      NA      NA      NA       NA €       
18 company131      NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
19 company102      NA      NA      NA      NA      NA      NA      NA      NA       NA NA      
20 company45       NA  223000      NA      NA      NA      NA      NA      NA       NA €       
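
Two things stand out in the output above: company70 ("€570", no suffix) ends up as NA in column3, and the dates in column8 are lost because `across(2:10, ...)` also covers that column. As a rough sketch of one alternative, the currency can be picked up without `rowwise()` by extracting the symbol per column and coalescing across columns, while restricting the clean-up to the amount columns. The sample tibble and the `first_symbol` helper below are hypothetical and only illustrate the idea on a cut-down version of the data.

```r
library(dplyr)

# Small hypothetical sample that mirrors the layout of the question's data
df <- tibble(
  column1 = c("company42", "company105", "company80", "company101"),
  column2 = c(NA, NA, NA, "\u20ac60k"),
  column3 = c(NA, "\u20ac315k", "$1.05M", "\u20ac177k"),
  column8 = c(NA, "Mar 2015", "Sep 2012", NA)
)

# Hypothetical helper: the first currency symbol in each string, NA if none
first_symbol <- function(x) {
  ifelse(grepl("[\u20ac$]", x),
         sub("^[^\u20ac$]*([\u20ac$]).*$", "\\1", x),
         NA_character_)
}

out <- df %>%
  # coalesce() keeps the first non-missing symbol found across the columns
  mutate(currency = Reduce(coalesce, across(column2:column3, first_symbol))) %>%
  # strip the symbols from the amount columns only; column8 is left untouched
  mutate(across(column2:column3, ~ gsub("[\u20ac$]", "", .x)))
```

On the full data the column range would presumably be `column2:column7`. The suffix handling would still need a branch for values without `k` or `M` so that plain amounts such as "570" are kept.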
