How to match rows in an R data frame

Question

I have a data frame that looks something like this:

participant  Sex     Age  interval  reproduction  condition
22014        Female  18   NA        NA            NA
22014        Female  18   1.536131  NA            NA
22014        Female  18   NA        NA            NA
22014        Female  18   1.416826  NA            NA
22014        Female  18   NA        NA            NA
22014        Female  18   1.549845  NA            NA
22014        Female  18   NA        NA            NA
22014        Female  18   1.542681  NA            NA
22014        Female  18   NA        NA            NA
22014        Female  18   1.265929  NA            NA
22014        Female  18   NA        1.2531        NA
22014        Female  18   NA        1.2507        NA
22014        Female  18   NA        1.7841        NA
22014        Female  18   NA        1.3536        NA
22014        Female  18   NA        0.8031        NA
22014        Female  18   NA        NA            Non-Causal

...etc...

I need to do 3 things:

i) 'backfill' the values in 'condition' upwards, so that every cell in 'condition' above a valid entry (here Non-Causal) is filled with that entry.

ii) match the 5 entries in 'reproduction' with the 5 entries in 'interval' in corresponding order, i.e. so that 1.2531 is moved up to sit next to 1.536131, 1.2507 next to 1.416826, etc.

iii) get rid of the NA rows so that in the end only 5 rows are left, with valid entries in each of the columns.

Any hints on how to tackle this? The actual data frame is much longer, and 'condition' takes on different values; there will always be 5 entries per condition, though, and they should have matched interval and reproduction entries.

Answer 1

You can group and summarize:

library(dplyr)
dat %>%
  group_by(participant, Sex, Age) %>%
  summarize(across(c(interval, reproduction, condition), ~ .[!is.na(.)])) %>%
  ungroup()
# # A tibble: 5 x 6
#   participant Sex      Age interval reproduction condition 
#         <int> <chr>  <int>    <dbl>        <dbl> <chr>     
# 1       22014 Female    18     1.54        1.25  Non-Causal
# 2       22014 Female    18     1.42        1.25  Non-Causal
# 3       22014 Female    18     1.55        1.78  Non-Causal
# 4       22014 Female    18     1.54        1.35  Non-Causal
# 5       22014 Female    18     1.27        0.803 Non-Causal

(This will glitch if the number of non-NA values in condition is other than 1, or if the number of non-NA values in the other columns is not the same.)
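The caveat above can be checked before summarizing. The following is a minimal base R sketch, not part of the original answer: it counts the non-NA values per participant and flags any group where interval and reproduction disagree or where condition does not have exactly one value. The toy data frame and the names `cols`, `present`, `non_na` and `ok` are made up for illustration.

```r
# Toy data in the same layout as the question (values are illustrative only)
dat <- data.frame(
  participant  = rep(22014L, 7),
  interval     = c(NA, 1.54, 1.42, NA, NA, NA, NA),
  reproduction = c(NA, NA, NA, 1.25, 1.25, NA, NA),
  condition    = c(NA, NA, NA, NA, NA, NA, "Non-Causal")
)

cols <- c("interval", "reproduction", "condition")

# Logical matrix of "value present", stored as integers so it can be summed
present <- !is.na(dat[cols])
storage.mode(present) <- "integer"

# One row per participant, one column per variable: number of non-NA values
non_na <- rowsum(present, group = dat$participant)

# A group is safe to summarize when the counts line up as the answer assumes
ok <- non_na[, "interval"] == non_na[, "reproduction"] &
  non_na[, "condition"] == 1L

if (!all(ok)) warning("some groups do not meet the assumptions")
```

If the real data has several conditions per participant, the condition count will exceed 1 and the check will warn, which is the situation the caveat describes.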

Answer 2

You can do most of the work with dplyr and tidyr. For example, if your data is in a data.frame named dd,

library(dplyr)
library(tidyr)
dd %>% 
  group_by(participant, Sex, Age) %>% 
  fill(condition, .direction="up") %>% 
  summarize(across(everything(), ~head(na.omit(.x), 5)))

We use tidyr::fill() to backfill condition, then use dplyr::summarize() to keep only the first 5 non-NA values for all the columns that are not used for grouping the rows.
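The question mentions that the real data contains several conditions. One possible extension, sketched here under assumptions, is to add condition to the grouping once it has been backfilled, so each participant/condition block is collapsed separately. This assumes dplyr 1.1.0 or later (for `reframe()`) and that every block is laid out as in the sample; the toy data `dd2` and the result name `res` are invented for illustration, with 2 entries per condition instead of 5 to keep it short.

```r
library(dplyr)
library(tidyr)

# Toy data: one participant, two condition blocks (illustrative values only)
dd2 <- data.frame(
  participant  = 1L,
  interval     = c(1.5, 1.4, NA, NA, NA, 2.5, 2.4, NA, NA, NA),
  reproduction = c(NA, NA, 1.2, 1.3, NA, NA, NA, 2.2, 2.3, NA),
  condition    = c(NA, NA, NA, NA, "Non-Causal", NA, NA, NA, NA, "Causal")
)

res <- dd2 %>%
  group_by(participant) %>%
  fill(condition, .direction = "up") %>%   # backfill within each participant
  group_by(participant, condition) %>%     # now treat each block separately
  reframe(across(c(interval, reproduction), ~ .x[!is.na(.x)]))

res
```

`reframe()` is used because each group returns more than one row; the result is ungrouped and ordered by the grouping keys, so the row order may differ from the input.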

Answer 3

Here is a base R solution, except for the function na.locf from package zoo.

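# Backfill condition within each participant (next observation carried backward)
df1$condition <- with(df1, ave(condition, participant, FUN = \(x) zoo::na.locf(x, fromLast = TRUE)))
# Flag the rows that hold interval values and those that hold reproduction values
i <- with(df1, ave(interval, participant, FUN = \(x) !is.na(x)))
j <- with(df1, ave(reproduction, participant, FUN = \(x) !is.na(x)))
# Move the reproduction values up next to the interval values, then blank their old rows
df1$reproduction[as.logical(i)] <- df1$reproduction[as.logical(j)]
df1$reproduction[as.logical(j)] <- NA_real_
# Keep only the rows with no NA left
df1 <- df1[complete.cases(df1), ]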

df1
#   participant    Sex Age interval reproduction  condition
#2        22014 Female  18 1.536131       1.2531 Non-Causal
#4        22014 Female  18 1.416826       1.2507 Non-Causal
#6        22014 Female  18 1.549845       1.7841 Non-Causal
#8        22014 Female  18 1.542681       1.3536 Non-Causal
#10       22014 Female  18 1.265929       0.8031 Non-Causal
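If pulling in zoo only for the backfill is not desirable, that one step can also be sketched in plain base R. The helper below is hypothetical (the name `backfill` and the toy data are not from the answer); it reverses the vector, carries the last seen value forward, and reverses back. Unlike the zoo call it leaves trailing NA values in place when there is nothing below them to copy.

```r
# Hypothetical helper: fill each NA with the next non-NA value below it
backfill <- function(x) {
  r <- rev(x)
  # Index of the most recent non-NA element seen so far (0 if none yet)
  idx <- cummax(seq_along(r) * !is.na(r))
  idx[idx == 0] <- NA   # nothing to carry yet: keep NA
  rev(r[idx])
}

# Toy data: two participants, one condition each (illustrative values only)
toy <- data.frame(
  participant = c(1, 1, 1, 2, 2, 2),
  condition   = c(NA, NA, "Causal", NA, "Non-Causal", NA)
)

# Apply the helper within each participant, as the answer does with ave()
toy$condition <- ave(toy$condition, toy$participant, FUN = backfill)
toy$condition
```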

Data

df1 <- read.table(text = "
participant     Sex     Age     interval    reproduction    condition
22014   Female  18  NA  NA  NA
22014   Female  18  1.536131    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.416826    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.549845    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.542681    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.265929    NA  NA
22014   Female  18  NA  1.2531  NA
22014   Female  18  NA  1.2507  NA
22014   Female  18  NA  1.7841  NA
22014   Female  18  NA  1.3536  NA
22014   Female  18  NA  0.8031  NA
22014   Female  18  NA  NA  Non-Causal
", header = TRUE)

Answer 4

This is the long way of doing what r2evans and Mr.Flick show:

library(dplyr)
library(tidyr)
df %>% 
  fill(condition, .direction = "up") %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(
    cols = c(interval, reproduction)
  ) %>% 
  na.omit() %>% 
  pivot_wider(
    names_from = name,
    values_from = value
  ) %>% 
  mutate(reproduction = lead(reproduction,5)) %>% 
  na.omit() %>% 
  select(-id) %>% 
  relocate(condition, .after = 6)

4 answers

Answer 1    3    2021-11-02 18:14:15
Answer 2    2    2021-11-02 18:15:01
Answer 3    1    2021-11-02 18:26:15
Answer 4    0    2021-11-02 18:39:14
