How can I match up rows in an R data frame?
I have a data frame that looks something like this:
participant | Sex | Age | interval | reproduction | condition |
---|---|---|---|---|---|
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.536131 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.416826 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.549845 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.542681 | NA | NA |
22014 | Female | 18 | NA | NA | NA |
22014 | Female | 18 | 1.265929 | NA | NA |
22014 | Female | 18 | NA | 1.2531 | NA |
22014 | Female | 18 | NA | 1.2507 | NA |
22014 | Female | 18 | NA | 1.7841 | NA |
22014 | Female | 18 | NA | 1.3536 | NA |
22014 | Female | 18 | NA | 0.8031 | NA |
22014 | Female | 18 | NA | NA | Non-Causal |
...etc...
I need to do 3 things:
i) 'backfill' the values in 'condition' upwards, so that every cell in 'condition' above a valid entry (here Non-Causal) is filled with that entry.
ii) match the 5 entries in 'reproduction' with the 5 entries in 'interval' in corresponding order, i.e. so that 1.2531 is moved up to sit next to 1.536131, 1.2507 next to 1.416826, etc.
iii) get rid of the NA rows, so that in the end only 5 rows are left, with valid entries in each of the columns.
Any hints on how to tackle this? The actual data frame is much longer, and 'condition' takes on different values; there will always be 5 entries per condition, though, and they should have matched interval and reproduction entries.
You can group and summarize:
library(dplyr)
dat %>%
  group_by(participant, Sex, Age) %>%
  # keep only the non-NA values of each column; the single condition value is recycled
  summarize(across(c(interval, reproduction, condition), ~ .[!is.na(.)])) %>%
  ungroup()
# # A tibble: 5 x 6
# participant Sex Age interval reproduction condition
# <int> <chr> <int> <dbl> <dbl> <chr>
# 1 22014 Female 18 1.54 1.25 Non-Causal
# 2 22014 Female 18 1.42 1.25 Non-Causal
# 3 22014 Female 18 1.55 1.78 Non-Causal
# 4 22014 Female 18 1.54 1.35 Non-Causal
# 5 22014 Female 18 1.27 0.803 Non-Causal
(This will glitch if the number of non-NA values in condition is other than 1, or if the number of non-NA values in the other columns is not the same.)
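The two failure cases in that caveat can be checked before summarizing. The following is a minimal base R sketch, assuming the data frame is called dat as above; value_cols, counts and bad are illustrative names, and the sample rows are copied from the question.

```r
# Sample data in the shape shown in the question (one participant, one condition).
dat <- data.frame(
  participant  = 22014L,
  Sex          = "Female",
  Age          = 18L,
  interval     = c(NA, 1.536131, NA, 1.416826, NA, 1.549845,
                   NA, 1.542681, NA, 1.265929, rep(NA, 6)),
  reproduction = c(rep(NA, 10), 1.2531, 1.2507, 1.7841, 1.3536, 0.8031, NA),
  condition    = c(rep(NA, 15), "Non-Causal")
)

# Count the non-NA entries of each value column per participant.
value_cols <- c("interval", "reproduction", "condition")
counts <- aggregate(!is.na(dat[value_cols]), by = dat["participant"], FUN = sum)

# Groups that would break the summarize() approach:
# condition not observed exactly once, or unequal interval/reproduction counts.
bad <- counts[counts$condition != 1 | counts$interval != counts$reproduction, ]

counts
nrow(bad)
```

If bad has any rows, those participants need a different grouping (for example by a filled-in condition) before the values can be lined up.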
You can do most of the work with dplyr and tidyr. For example, if your data is in a data.frame named dd:
library(dplyr)
library(tidyr)
dd %>%
  group_by(participant, Sex, Age) %>%
  fill(condition, .direction = "up") %>%
  summarize(across(everything(), ~ head(na.omit(.x), 5)))
We use tidyr::fill() to backfill condition, then use dplyr::summarize() to keep only the first 5 non-NA values of every column that is not used for grouping the rows.
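The question mentions that the real data holds several conditions per participant. One way to handle that, sketched here under assumptions, is to add the filled condition to the grouping before collapsing the rows. The toy data below uses 2 entries per condition instead of 5 to stay short; its layout and the "Causal" label are hypothetical, and reframe() requires dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two condition blocks for one participant,
# each block laid out as intervals, then reproductions, then the condition label.
dd <- data.frame(
  participant  = 22014L,
  Sex          = "Female",
  Age          = 18L,
  interval     = c(1.536131, 1.416826, NA, NA, NA, 1.549845, 1.542681, NA, NA, NA),
  reproduction = c(NA, NA, 1.2531, 1.2507, NA, NA, NA, 1.7841, 1.3536, NA),
  condition    = c(NA, NA, NA, NA, "Non-Causal", NA, NA, NA, NA, "Causal")
)

out <- dd %>%
  group_by(participant, Sex, Age) %>%
  fill(condition, .direction = "up") %>%     # backfill within each participant
  group_by(condition, .add = TRUE) %>%       # then treat each condition separately
  reframe(across(c(interval, reproduction), ~ .x[!is.na(.x)]))

out
```

Within each condition the non-NA intervals and reproductions are paired in their original order, so this relies on both columns having the same number of entries per condition.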
Here is a base R solution, except for the function na.locf, from package zoo.
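# back-fill condition within each participant (the \(x) lambda needs R >= 4.1)
df1$condition <- with(df1, ave(condition, participant, FUN = \(x) zoo::na.locf(x, fromLast = TRUE)))
# flag rows holding an interval (i) or a reproduction (j); ave() returns them as 0/1
i <- with(df1, ave(interval, participant, FUN = \(x) !is.na(x)))
j <- with(df1, ave(reproduction, participant, FUN = \(x) !is.na(x)))
# move the reproduction values up next to the intervals, then blank their old rows
df1$reproduction[as.logical(i)] <- df1$reproduction[as.logical(j)]
df1$reproduction[as.logical(j)] <- NA_real_
# drop every row that still contains an NA
df1 <- df1[complete.cases(df1), ]
df1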
# participant Sex Age interval reproduction condition
#2 22014 Female 18 1.536131 1.2531 Non-Causal
#4 22014 Female 18 1.416826 1.2507 Non-Causal
#6 22014 Female 18 1.549845 1.7841 Non-Causal
#8 22014 Female 18 1.542681 1.3536 Non-Causal
#10 22014 Female 18 1.265929 0.8031 Non-Causal
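If the zoo dependency is unwanted, the backward fill itself can be sketched in base R. fill_up below is a hypothetical helper, not a function from any package; it replaces each NA with the next non-NA value further down the vector and leaves trailing NAs untouched.

```r
# Hypothetical helper: fill NAs with the next non-NA value below them.
fill_up <- function(x) {
  # position of each observed value, Inf where the value is missing
  pos <- ifelse(is.na(x), Inf, seq_along(x))
  # for every element, the nearest observed position at or after it
  nxt <- rev(cummin(rev(pos)))
  # no observed value further down: keep NA
  nxt[is.infinite(nxt)] <- NA
  x[nxt]
}

fill_up(c(NA, "a", NA, NA, "b", NA))

# Used per group with ave(), in the same way as the na.locf() call above
# (cond and id are toy vectors for illustration).
cond <- c(NA, NA, "Non-Causal", NA, "Causal")
id   <- c(1, 1, 1, 2, 2)
filled <- ave(cond, id, FUN = fill_up)
filled
```
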
# Data: the sample df1 used by the base R solution
df1 <- read.table(text = "
participant Sex Age interval reproduction condition
22014 Female 18 NA NA NA
22014 Female 18 1.536131 NA NA
22014 Female 18 NA NA NA
22014 Female 18 1.416826 NA NA
22014 Female 18 NA NA NA
22014 Female 18 1.549845 NA NA
22014 Female 18 NA NA NA
22014 Female 18 1.542681 NA NA
22014 Female 18 NA NA NA
22014 Female 18 1.265929 NA NA
22014 Female 18 NA 1.2531 NA
22014 Female 18 NA 1.2507 NA
22014 Female 18 NA 1.7841 NA
22014 Female 18 NA 1.3536 NA
22014 Female 18 NA 0.8031 NA
22014 Female 18 NA NA Non-Causal
", header = TRUE)
This is the long-form version of what r2evans and Mr.Flick present:
library(dplyr)
library(tidyr)
df %>%
  fill(condition, .direction = "up") %>%
  mutate(id = row_number()) %>%
  pivot_longer(
    cols = c(interval, reproduction)
  ) %>%
  na.omit() %>%
  pivot_wider(
    names_from = name,
    values_from = value
  ) %>%
  mutate(reproduction = lead(reproduction, 5)) %>%
  na.omit() %>%
  select(-id) %>%
  relocate(condition, .after = 6)