How to gather character values from several columns into one column?
I have a set of columns that contain character values, and I want to gather them into a single column. Let's say the dataset looks like this (there are 16 region columns, but I show only 6 here):
  country regionarc regionarb regionarh regionary regionard regionarw
1       g      dome                            NA
2       g                          ashi        NA
3       g               gongo                  NA
4       g                                      NA     salgi
5       g                                                        forh
I want it to look like this:
  country regionarc regionarb regionarh regionary regionard regionarw district
1       g      dome                            NA                         dome
2       g                          ashi        NA                         ashi
3       g               gongo                  NA                        gongo
4       g                                      NA     salgi              salgi
5       g                                      NA                forh     forh
I think I may have to mutate and select the columns, but I am not sure how to gather the districts into one column. How can I gather the district inputs into one column? Thank you in advance.
dput output:
structure(list(date = c("08-Jun-20", "08-Jun-20", "09-Jun-20",
"09-Jun-20"), Which.country.do.you.live.in. = c("G ", "G ", "G ",
"G "), Must.be.in.Ghana.form.not.visible = c(NA, NA, NA, NA),
Which.region.do.you.live.in. = c(NA, NA, NA, NA), Which.district.do.you.live.in...Ahafo.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Ashanti.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Bono.Region. = c("Dormaa East",
"", "", ""), Which.district.do.you.live.in...Bono.East.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Central.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Eastern.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Greater.Accra.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Northern.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Northern.East.Region. = c("",
"East Mamprusi", "", ""), Which.district.do.you.live.in...Oti.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Savannah.Region. = c("",
"", "", "Central Gonja"), Which.district.do.you.live.in...Upper.East.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Upper.West.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Volta.Region. = c("",
"", "Ho Municipal", ""), Which.district.do.you.live.in...Western.Region. = c(NA,
NA, NA, NA), Which.district.do.you.live.in...Western.North.Region. = c(NA,
NA, NA, NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA
), X.2 = c(NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-4L))
If the empty cells are NA elements, we can use coalesce, which returns the first non-NA value in each row:
library(dplyr)
df1 %>%
  mutate(district = coalesce(!!! select(., starts_with('region'))))
#  country regionarc regionarb regionarh regionary regionard regionarw district
#1       g      dome      <NA>      <NA>        NA      <NA>      <NA>     dome
#2       g      <NA>      <NA>      ashi        NA      <NA>      <NA>     ashi
#3       g      <NA>     gongo      <NA>        NA      <NA>      <NA>    gongo
#4       g      <NA>      <NA>      <NA>        NA     salgi      <NA>    salgi
#5       g      <NA>      <NA>      <NA>        NA      <NA>      forh     forh
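To make the behaviour of coalesce easier to see, here is a minimal sketch on plain vectors rather than a data frame. The vector names (a, b, c3) are made up for illustration and are not part of the OP's data; the point is only that, position by position, the first non-NA value wins.

```r
library(dplyr)

# Three hypothetical character vectors of equal length
a  <- c("dome", NA, NA)
b  <- c(NA, "ashi", NA)
c3 <- c(NA, "x", "forh")

# For each position, take the first non-NA value scanning a, then b, then c3
result <- coalesce(a, b, c3)
result
```

In the second position both b and c3 have a value, and the one from b is kept because it comes first in the argument list.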
Or with reduce/coalesce, which applies coalesce pairwise across the selected columns:
library(purrr)
df1 %>%
  mutate(district = select(., starts_with('region')) %>%
           reduce(coalesce))
Or, if the columns have blanks (""), we can convert them to NA and then use coalesce:
df1 %>%
  transmute_at(vars(starts_with('region')), na_if, '') %>%
  transmute(district = coalesce(!!! .)) %>%
  bind_cols(df1, .)
In the OP's dataset, we can select the columns whose names start with 'Which.district.do.you.live.in' (starts_with), convert the blanks ("") to NA (na_if) with mutate/across (available in newer dplyr versions) or with mutate_all, and pass the result to coalesce:
df2 <- df2 %>%
mutate(district =coalesce(!!! select(., starts_with('Which.district.do.you.live.in')) %>%
mutate(across(everything(), na_if, ""))) )
df2$district
#[1] "Dormaa East" "East Mamprusi" "Ho Municipal" "Central Gonja"
Or with mutate_all:
df2 %>%
mutate(district =coalesce(!!! select(., starts_with('Which.district.do.you.live.in')) %>%
mutate_all(na_if, "")))
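The OP's data mixes character columns containing "" with columns that are entirely NA (and therefore logical). As far as I can tell, recent dplyr releases are stricter about type mismatches in na_if and discourage passing extra arguments through across, so the following is a hedged variant that coerces each column to character first. The toy data frame and its column names (district_a, district_b, district_c) are hypothetical stand-ins for the OP's columns.

```r
library(dplyr)

# Hypothetical miniature of the OP's data: blanks and all-NA columns mixed
df <- data.frame(
  id         = 1:3,
  district_a = c("Dormaa East", "", ""),
  district_b = c(NA, NA, NA),               # all-NA column, stored as logical
  district_c = c("", "East Mamprusi", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Coerce to character, then turn "" into NA (overwrites the district_* columns)
  mutate(across(starts_with("district_"), ~ na_if(as.character(.x), ""))) %>%
  # Take the first non-NA value across the cleaned columns
  mutate(district = coalesce(!!! select(., starts_with("district_"))))

result$district
```

Note that this sketch replaces the blanks in the original columns as well; rows with no district anywhere end up as NA in the new column.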
Or in base R with pmin/pmax:
df1$district <- do.call(pmax, c(df1[-1], na.rm = TRUE))
df1$district
#[1] "dome" "ashi" "gongo" "salgi" "forh"
Or we can use max.col to find, for each row, the first column that is not NA:
df1$district <- df1[-1][cbind(seq_len(nrow(df1)), max.col(!is.na(df1[-1]), 'first'))]
df1$district
#[1] "dome" "ashi" "gongo" "salgi" "forh"
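The base R options above assume the empty cells are already NA. When blanks ("") are mixed in, as in the OP's dput output, one possible base R sketch is to blank-to-NA each column and then fold the columns pairwise with Reduce. The data frame, the column names, and the helper first_non_na are all hypothetical and only illustrate the idea.

```r
# Hypothetical miniature of the OP's data: blanks and NAs mixed
df <- data.frame(
  country    = c("G", "G", "G"),
  district_a = c("Dormaa East", "", ""),
  district_b = c(NA, NA, NA),
  district_c = c("", "East Mamprusi", ""),
  stringsAsFactors = FALSE
)

cols <- grep("^district_", names(df), value = TRUE)

# Coerce each column to character and replace "" with NA
clean <- lapply(df[cols], function(x) {
  x <- as.character(x)
  x[!is.na(x) & x == ""] <- NA
  x
})

# Hypothetical helper: keep a where present, otherwise fall back to b
first_non_na <- function(a, b) ifelse(is.na(a), b, a)

# Fold the columns left to right, keeping the first non-NA value per row
df$district <- Reduce(first_non_na, clean)
df$district
```

This needs no packages, at the cost of being more verbose than the coalesce versions.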
Data (df1) used in the examples:
df1 <- structure(list(country = c("g", "g", "g", "g", "g"), regionarc = c("dome",
NA, NA, NA, NA), regionarb = c(NA, NA, "gongo", NA, NA), regionarh = c(NA,
"ashi", NA, NA, NA), regionary = c(NA, NA, NA, NA, NA), regionard = c(NA,
NA, NA, "salgi", NA), regionarw = c(NA, NA, NA, NA, "forh")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))