How to gather characters from other columns into one column?

I have a set of columns that contain character values, and I want to gather them into a single column. Let's say the dataset looks like this (there are 16 region columns, but I show only 6 here):

    country  regionarc  regionarb  regionarh  regionary  regionard  regionarw
1   g        dome                             NA
2   g                              ashi       NA
3   g                   gongo                 NA
4   g                                         NA         salgi
5   g                                         NA                    forh

I want it to look like this:

    country  regionarc  regionarb  regionarh  regionary  regionard  regionarw  district
1   g        dome                             NA                               dome
2   g                              ashi       NA                               ashi
3   g                   gongo                 NA                               gongo
4   g                                         NA         salgi                 salgi
5   g                                         NA                    forh       forh

I think I may have to mutate and select the columns, but I am not sure how to gather the districts into one column. How can I gather the district inputs into one column? Thank you in advance.

dput output:

structure(list(date = c("08-Jun-20", "08-Jun-20", "09-Jun-20", 
"09-Jun-20"), Which.country.do.you.live.in. = c("G ", "G ", "G ", 
"G "), Must.be.in.Ghana.form.not.visible = c(NA, NA, NA, NA), 
    Which.region.do.you.live.in. = c(NA, NA, NA, NA), Which.district.do.you.live.in...Ahafo.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Ashanti.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Bono.Region. = c("Dormaa East", 
    "", "", ""), Which.district.do.you.live.in...Bono.East.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Central.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Eastern.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Greater.Accra.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Northern.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Northern.East.Region. = c("", 
    "East Mamprusi", "", ""), Which.district.do.you.live.in...Oti.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Savannah.Region. = c("", 
    "", "", "Central Gonja"), Which.district.do.you.live.in...Upper.East.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Upper.West.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Volta.Region. = c("", 
    "", "Ho Municipal", ""), Which.district.do.you.live.in...Western.Region. = c(NA, 
    NA, NA, NA), Which.district.do.you.live.in...Western.North.Region. = c(NA, 
    NA, NA, NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA
    ), X.2 = c(NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-4L))

If the empty cells are NA, we can use coalesce

library(dplyr)
df1 %>%
    mutate(district = coalesce(!!! select(., starts_with('region'))))
#   country regionarc regionarb regionarh regionary regionard regionarw district
#1       g      dome      <NA>      <NA>        NA      <NA>      <NA>     dome
#2       g      <NA>      <NA>      ashi        NA      <NA>      <NA>     ashi
#3       g      <NA>     gongo      <NA>        NA      <NA>      <NA>    gongo
#4       g      <NA>      <NA>      <NA>        NA     salgi      <NA>    salgi
#5       g      <NA>      <NA>      <NA>        NA      <NA>      forh     forh

Or with reduce/coalesce

library(purrr)
df1 %>%
     mutate(district = select(., starts_with('region')) %>% 
                          reduce(coalesce))

Or, if the columns contain blanks ( "" ), we can convert them to NA and then use coalesce

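df1 %>%
    # recode "" as NA in the region columns, keeping only those columns
    transmute_at(vars(starts_with('region')), na_if, '') %>%
    # take the first non-NA value in each row
    transmute(district = coalesce(!!! .)) %>%
    # attach the new column to the original data
    bind_cols(df1, .)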
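
For reference, the same blank-to-NA-then-coalesce idea can also be sketched in base R, without dplyr. This is only an illustration on made-up toy data: the data frame df and the helper first_non_na are hypothetical names, not part of the original answer. Every region column is converted to character first, on the assumption that an all-NA logical column (like regionary) should simply be treated as missing text.

```r
# Toy data (hypothetical): blanks ("") instead of NA, plus an all-NA logical column
df <- data.frame(
  country   = c("g", "g", "g"),
  regionarc = c("dome", "", ""),
  regionarb = c("", "", "gongo"),
  regionary = c(NA, NA, NA),
  regionarh = c("", "ashi", ""),
  stringsAsFactors = FALSE
)

region_cols <- grep("^region", names(df), value = TRUE)

# Convert each region column to character and recode "" as NA
cleaned <- lapply(df[region_cols], function(x) {
  x <- as.character(x)
  x[!is.na(x) & x == ""] <- NA_character_
  x
})

# Hypothetical helper: keep a, fall back to b wherever a is NA
first_non_na <- function(a, b) ifelse(is.na(a), b, a)

# Fold the helper over the columns to get the first non-NA value per row
df$district <- Reduce(first_non_na, cleaned)
df$district
#[1] "dome"  "ashi"  "gongo"
```

Folding a two-argument helper over the list of columns with Reduce mirrors what reduce(coalesce) does in the purrr version above.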

Update

In the OP's dataset, we can select the columns whose names start with 'Which.district.do.you.live.in' (using starts_with), convert the blanks ( "" ) to NA with na_if, using either mutate/across (from newer versions of dplyr) or mutate_all, and pass the result to coalesce

df2 <- df2 %>%
     mutate(district =coalesce(!!! select(., starts_with('Which.district.do.you.live.in')) %>%
            mutate(across(everything(), na_if, ""))) )

df2$district
#[1] "Dormaa East"   "East Mamprusi" "Ho Municipal"  "Central Gonja"

Or with mutate_all

df2 %>%
     mutate(district =coalesce(!!! select(., starts_with('Which.district.do.you.live.in')) %>%
            mutate_all(na_if, ""))) 

Or in base R with pmin/pmax

df1$district <- do.call(pmax, c(df1[-1], na.rm = TRUE))
df1$district
#[1] "dome"  "ashi"  "gongo" "salgi" "forh" 
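
One point worth keeping in mind with this approach: on character vectors pmax returns the largest string in each row, not the first non-missing one. That gives the same answer when each row has at most one non-NA value, as in df1, but not otherwise. A small sketch with made-up vectors a and b (hypothetical, not from the question) shows the difference:

```r
# Hypothetical vectors: the third position has two non-missing values
a <- c("dome", NA, "beta")
b <- c(NA, "ashi", "alpha")

# With na.rm = TRUE the single non-NA value wins in positions 1 and 2;
# in position 3 the strings are compared and the larger/smaller one is returned
res_max <- pmax(a, b, na.rm = TRUE)
res_min <- pmin(a, b, na.rm = TRUE)

res_max
#[1] "dome" "ashi" "beta"
res_min
#[1] "dome"  "ashi"  "alpha"
```

If rows can hold more than one district, a coalesce-style approach, which takes the first non-NA value in column order, is probably the safer choice.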

Or we can use max.col

df1$district <- df1[-1][cbind(seq_len(nrow(df1)), max.col(!is.na(df1[-1]), 'first'))]
df1$district
#[1] "dome"  "ashi"  "gongo" "salgi" "forh" 
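
To make the max.col line easier to follow, here is the same idea split into steps on a small made-up matrix (m, present, col_idx and picked are hypothetical names used only for this sketch). max.col with ties.method = "first" returns, for each row, the index of the first TRUE in the logical matrix, and the two-column index matrix built with cbind then picks one cell per row.

```r
# Hypothetical toy matrix; the last row has no value at all
m <- matrix(c("dome", NA,      NA,
              NA,     NA,      "ashi",
              NA,     "gongo", NA,
              NA,     NA,      NA),
            nrow = 4, byrow = TRUE)

# TRUE where a value is present
present <- !is.na(m)

# Column index of the first TRUE in each row
# (an all-FALSE row is an all-way tie, so "first" gives column 1)
col_idx <- max.col(present, ties.method = "first")
col_idx
#[1] 1 3 2 1

# Matrix indexing with (row, column) pairs extracts one cell per row
picked <- m[cbind(seq_len(nrow(m)), col_idx)]
picked
#[1] "dome"  "ashi"  "gongo" NA
```

A row with no values at all still yields NA, because the cell selected in column 1 is itself NA.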

data

df1 <- structure(list(country = c("g", "g", "g", "g", "g"), regionarc = c("dome", 
NA, NA, NA, NA), regionarb = c(NA, NA, "gongo", NA, NA), regionarh = c(NA, 
"ashi", NA, NA, NA), regionary = c(NA, NA, NA, NA, NA), regionard = c(NA, 
NA, NA, "salgi", NA), regionarw = c(NA, NA, NA, NA, "forh")), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
