How to make a function that creates a column with combined observations

I am obviously new to data cleaning and I am having trouble cleaning a survey export. This is how my data frame looks in raw form:

Var1          Colname1  Colname2  Colname3  Var2
Observation1  NA        NA        Val1      Val_1
Observation2  NA        Val2      NA        Val_1
Observation3  Val3      NA        NA        Val_1
Observation4  Val4      Val5      NA        Val_2
Observation5  NA        NA        Val6      Val_2

I would like to have my data cleaned to look like this:

Var1         SubVar1 Var2
Observation1 Val1    Val_1
Observation2 Val2    Val_1
Observation3 Val3    Val_1
Observation4 Val4    Val_2
Observation4 Val5    Val_2
Observation5 Val6    Val_2

I have tried to remove NA values:

df1 <- na.omit(c(Colname1, Colname2, Colname3))

The problem is that it deletes all rows, because there is an NA in every row. I have also tried concatenating the values and then using the separate_rows() function, but that only works for observations that have a value in just one column. For observations that contain values in multiple columns (see Observation4), it does not work.

Thanks for any help you guys can provide!

Try:

library(dplyr)
# coalesce() keeps the first non-NA value in each row
data %>% mutate(SubVar1 = coalesce(Colname1, Colname2, Colname3)) %>%
  select(Var1, SubVar1, Var2)

I would think of this as a pivot (reshaping) operation from wide to long:

library(dplyr)
library(tidyr)

data %>%
  pivot_longer(cols = Colname1:Colname3, values_to = "SubVar1") %>%
  filter(!is.na(SubVar1)) %>%
  select(Var1, SubVar1, Var2)
# # A tibble: 6 × 3
#   Var1         SubVar1 Var2 
#   <chr>        <chr>   <chr>
# 1 Observation1 Val1    Val_1
# 2 Observation2 Val2    Val_1
# 3 Observation3 Val3    Val_1
# 4 Observation4 Val4    Val_2
# 5 Observation4 Val5    Val_2
# 6 Observation5 Val6    Val_2

To understand what's happening, run the first line, then the first and second lines, then the first three, and so on. See ?pivot_longer for other ways to specify which columns to pivot: you could name them explicitly, use a selection helper such as starts_with("Colname"), or use Colname1:Colname3 to select consecutive columns, as I did above.
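As a rough sketch of one of those column-selection options, the block below selects the columns by name prefix and lets pivot_longer() drop the NA rows itself via values_drop_na = TRUE, which replaces the separate filter() step. It assumes the example data from the question, rebuilt here as df1 so the block runs on its own; the result name long is only for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt so this block is self-contained
df1 <- data.frame(
  Var1     = paste0("Observation", 1:5),
  Colname1 = c(NA, NA, "Val3", "Val4", NA),
  Colname2 = c(NA, "Val2", NA, "Val5", NA),
  Colname3 = c("Val1", NA, NA, NA, "Val6"),
  Var2     = c("Val_1", "Val_1", "Val_1", "Val_2", "Val_2"),
  stringsAsFactors = FALSE
)

long <- df1 %>%
  pivot_longer(
    cols = starts_with("Colname"),  # select the columns by name prefix
    values_to = "SubVar1",
    values_drop_na = TRUE           # drop rows whose value is NA
  ) %>%
  select(Var1, SubVar1, Var2)       # also discards the default "name" column

long
```

Observation4 appears twice in long, once for Val4 and once for Val5, which matches the layout asked for in the question.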

We can use base R in a vectorized way with row/column indexing. Subset the columns whose names start with 'Colname', get the column index of the first non-NA element in each row with max.col, cbind it with the row sequence, extract the corresponding elements, and create the new data.frame:

i1 <- startsWith(names(df1), "Colname")
data.frame(df1['Var1'], SubVar1 = df1[i1][cbind(seq_len(nrow(df1)), 
      max.col(!is.na(df1[i1]), "first"))], df1['Var2'])
          Var1 SubVar1  Var2
1 Observation1    Val1 Val_1
2 Observation2    Val2 Val_1
3 Observation3    Val3 Val_1
4 Observation4    Val4 Val_2
5 Observation5    Val6 Val_2
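Since max.col returns a single column index per row, the output above has one row per observation and Val5 for Observation4 does not appear. If every non-NA value should get its own row, one possible base R variation (a sketch, not part of the approach above) is to take all non-NA positions with which(..., arr.ind = TRUE) and index by them. The names is_val, vals, keep and long_base are hypothetical, and df1 is rebuilt so the block runs on its own.

```r
# Example data from the question, rebuilt so this block is self-contained
df1 <- data.frame(
  Var1     = paste0("Observation", 1:5),
  Colname1 = c(NA, NA, "Val3", "Val4", NA),
  Colname2 = c(NA, "Val2", NA, "Val5", NA),
  Colname3 = c("Val1", NA, NA, NA, "Val6"),
  Var2     = c("Val_1", "Val_1", "Val_1", "Val_2", "Val_2"),
  stringsAsFactors = FALSE
)

is_val <- startsWith(names(df1), "Colname")
vals   <- as.matrix(df1[is_val])              # character matrix of the value columns

# (row, col) position of every non-NA cell, sorted by row and then by column
keep <- which(!is.na(vals), arr.ind = TRUE)
keep <- keep[order(keep[, "row"], keep[, "col"]), , drop = FALSE]

long_base <- data.frame(
  Var1    = df1$Var1[keep[, "row"]],          # repeat Var1 once per retained value
  SubVar1 = vals[keep],                       # matrix indexing by (row, col) pairs
  Var2    = df1$Var2[keep[, "row"]],
  stringsAsFactors = FALSE
)

long_base
```

With the example data this gives six rows, with Observation4 listed once for Val4 and once for Val5.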

Data

df1 <- structure(list(Var1 = c("Observation1", "Observation2", "Observation3", 
"Observation4", "Observation5"), Colname1 = c(NA, NA, "Val3", 
"Val4", NA), Colname2 = c(NA, "Val2", NA, "Val5", NA), Colname3 = c("Val1", 
NA, NA, NA, "Val6"), Var2 = c("Val_1", "Val_1", "Val_1", "Val_2", 
"Val_2")), class = "data.frame", row.names = c(NA, -5L))
