Why are there NA values produced when using pivot_wider?

I'm trying to use pivot_wider to create multiple columns/variables containing values, but I get NAs in columns where I shouldn't.

Here is a representative sample of the data:

df <- structure(list(Condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Control", "Retraction1", 
"Retraction2"), class = "factor"), First = structure(c(2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Second = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Third = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Fourth = structure(c(4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), ID = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", 
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", 
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90", 
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100", 
"101"), class = "factor"), Scenario = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 1L, 2L, 3L, 4L), .Label = c("J", "P", "R", 
"S"), class = "factor"), Estimate = structure(c(4L, 8L, 7L, 11L, 
9L, 12L, 10L, 2L, 5L, 6L, 4L, 7L, 11L, 9L, 12L, 10L, 2L, 3L, 
5L, 6L, 4L, 8L, 7L, 11L, 9L, 12L, 10L, 2L, 5L, 6L, 4L, 8L, 7L, 
11L, 9L, 12L, 10L, 2L, 5L, 6L, 1L, 1L, 1L, 1L), .Label = c("CompMean", 
"P.H.Reps.", "P.H.Reps..1", "P.Rel.", "P.Rel1.Reps.", "P.Rel2.Reps.", 
"P.Rep1.nH.nRel.", "P.Rep1.nH.Rel.", "P.Rep2.nH.nRel.nRep1.", 
"P.Rep2.nH.nRel.Rep1.", "P.Rep2.nH.Rel.nRep1.", "P.Rep2.nH.Rel.Rep1."
), class = "factor"), value = c(90L, 8L, 82L, 11L, 82L, 11L, 
82L, 100L, 99L, NA, 62L, 11L, 91L, 12L, 91L, 5L, 82L, 91L, 80L, 
NA, 92L, 12L, 61L, 18L, 90L, 21L, 81L, 96L, 92L, NA, 91L, 10L, 
72L, 22L, 62L, 21L, 73L, 99L, 98L, NA, 7L, 7L, 7L, 7L)), row.names = c(NA, 
-44L), class = c("tbl_df", "tbl", "data.frame"))

head(df)

This is data from one subject. There should only be NAs in the P.Rel2.Reps. column and in no other.

However, there are NAs in some of the other columns when I use pivot_wider like so:

pivot_wider(df, names_from = Estimate, values_from = value)

Here is an example of how the data look after pivoting wider:

df2 <- structure(list(Condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("Control", "Retraction1", "Retraction2"
), class = "factor"), First = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("Journalist", "Police", "Reviewer", 
"Spokesperson"), class = "factor"), Second = structure(c(3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Third = structure(c(1L, 
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Fourth = structure(c(4L, 
4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), ID = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", 
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", 
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", 
"82", "83", "84", "85", "86", "87", "88", "89", "90", "91", "92", 
"93", "94", "95", "96", "97", "98", "99", "100", "101"), class = "factor"), 
    Scenario = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
    2L), .Label = c("J", "P", "R", "S"), class = "factor"), P.Rel. = c(90L, 
    62L, 92L, 91L, 57L, 81L, 71L, 80L, 40L, 75L), P.Rep1.nH.Rel. = c(8L, 
    NA, 12L, 10L, 31L, NA, 19L, 17L, 25L, NA), P.Rep1.nH.nRel. = c(82L, 
    11L, 61L, 72L, 89L, 15L, 79L, 84L, 76L, 25L), P.Rep2.nH.Rel.nRep1. = c(11L, 
    91L, 18L, 22L, 35L, 64L, 30L, 22L, 25L, 50L), P.Rep2.nH.nRel.nRep1. = c(82L, 
    12L, 90L, 62L, 62L, 13L, 45L, 53L, 25L, 50L), P.Rep2.nH.Rel.Rep1. = c(11L, 
    91L, 21L, 21L, 15L, 52L, 9L, 10L, 100L, 50L), P.Rep2.nH.nRel.Rep1. = c(82L, 
    5L, 81L, 73L, 67L, 22L, 60L, 61L, 100L, 25L), P.H.Reps. = c(100L, 
    82L, 96L, 99L, 81L, 40L, 71L, 76L, 75L, 90L), P.Rel1.Reps. = c(99L, 
    80L, 92L, 98L, 81L, 80L, 89L, 79L, 75L, 76L), P.Rel2.Reps. = c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), P.H.Reps..1 = c(NA, 
    91L, NA, NA, NA, 80L, NA, NA, NA, 100L), CompMean = c(7L, 
    7L, 7L, 7L, 7L, 7L, 7L, 6L, 4L, 7L)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

head(df2)

I have seen that there is a similar post on this topic, but it doesn't answer why NAs are being produced in my situation.

Do I need to add some other argument?

Looking at the data, it looks like you have some corrupted data in one place. You can correct it with

df$Estimate <- replace(df$Estimate, df$Estimate == "P.H.Reps..1", "P.Rep1.nH.Rel.") 

and then use pivot_wider, which will give you NA in only one column, i.e. P.Rel2.Reps.:

tidyr::pivot_wider(df, names_from = Estimate, values_from = value) 

NA values will result for any combination of categories for the new pivoted columns that isn't present in the original long data frame. For example, let's look at the rows of the long data frame with Estimate=="P.Rep1.nH.Rel.":

df %>% filter(Estimate=="P.Rep1.nH.Rel.")
  Condition First  Second   Third      Fourth       ID Scenario Estimate       value
1 Control   Police Reviewer Journalist Spokesperson 1  J        P.Rep1.nH.Rel.     8
2 Control   Police Reviewer Journalist Spokesperson 1  R        P.Rep1.nH.Rel.    12
3 Control   Police Reviewer Journalist Spokesperson 1  S        P.Rep1.nH.Rel.    10

Now look at the results of pivot_wider (I've kept only the relevant columns for brevity). Note in the output below that there's a missing value in the P.Rep1.nH.Rel. column. It occurs where Scenario=="P", because the long data frame has no row with Estimate=="P.Rep1.nH.Rel." and Scenario=="P". Missing values occur in the P.H.Reps..1 column for a similar reason: the long data frame has only one row with Estimate=="P.H.Reps..1", and it has Scenario=="P", so the values are missing for the other three scenarios.

pivot_wider(df, names_from = Estimate, values_from = value) %>% 
   select(Condition:Scenario, P.Rep1.nH.Rel., P.H.Reps..1)
  Condition First  Second   Third      Fourth       ID Scenario P.Rep1.nH.Rel. P.H.Reps..1
1 Control   Police Reviewer Journalist Spokesperson 1  J                     8          NA
2 Control   Police Reviewer Journalist Spokesperson 1  P                    NA          91
3 Control   Police Reviewer Journalist Spokesperson 1  R                    12          NA
4 Control   Police Reviewer Journalist Spokesperson 1  S                    10          NA
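If it helps, one way to spot these gaps before pivoting is to cross-tabulate the two columns involved. The sketch below uses a small made-up data frame (`long` is hypothetical, shaped like the question's data but not taken from it) and only base R. Any cell with a count of 0 is a Scenario/Estimate pair that has no row in the long data, so it would come out as NA after pivoting.

```r
# Hypothetical toy data that mimics the structure of the question's long data.
long <- data.frame(
  Scenario = c("J", "P", "R", "S", "J", "R", "S", "P"),
  Estimate = c(rep("P.Rel.", 4), rep("P.Rep1.nH.Rel.", 3), "P.H.Reps..1"),
  value    = c(90L, 62L, 92L, 91L, 8L, 12L, 10L, 91L)
)

# Count rows per Scenario/Estimate pair; a 0 means the pair is absent
# from the long data.
counts <- table(long$Scenario, long$Estimate)
print(counts)

# List the absent pairs explicitly.
missing_pairs <- subset(as.data.frame(counts), Freq == 0)
names(missing_pairs) <- c("Scenario", "Estimate", "n")
print(missing_pairs)
```

A count greater than 1 in the same table would flag duplicated pairs, which pivot_wider handles differently (it warns and returns list-columns), so the table is a cheap check for both problems.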

This may be a data error, as suggested by @RonakShah, but if the data are correct then NA values will naturally result when pivoting to wide format. You can fill those missing values with some other value by adding the argument values_fill=list(value=0) to pivot_wider (you can of course use any fill value you wish; I've just used 0 for illustration). Note that even if you use the values_fill argument, explicit missing values in the original long data will still be preserved in the wide data frame. Only missing values that result from the pivoting operation are filled.
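To make the explicit-versus-implicit distinction concrete, here is a minimal sketch on made-up data (`long` and its values are hypothetical, not the asker's). It assumes a version of tidyr in which values_fill accepts a named list, as described above. The pair (J, b) has a row whose value is NA, while the pair (P, b) has no row at all.

```r
library(tidyr)

# Hypothetical toy data: (J, b) is an explicit NA, (P, b) is simply absent.
long <- data.frame(
  Scenario = c("J", "J", "P"),
  Estimate = c("a", "b", "a"),
  value    = c(1L, NA, 3L)
)

# Fill only the cells that the pivot itself creates.
wide <- pivot_wider(
  long,
  names_from  = Estimate,
  values_from = value,
  values_fill = list(value = 0L)
)
print(wide)
```

In the printed result, the b column should hold NA for Scenario J (carried over from the long data) and 0 for Scenario P (filled in by the pivot). If you want the explicit NAs replaced as well, that needs a separate step after pivoting, for example tidyr::replace_na.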
