How can I match up rows in an R data frame?

I have a data frame that looks something like this:

participant    Sex Age interval reproduction  condition
      22014 Female  18       NA           NA         NA
      22014 Female  18 1.536131           NA         NA
      22014 Female  18       NA           NA         NA
      22014 Female  18 1.416826           NA         NA
      22014 Female  18       NA           NA         NA
      22014 Female  18 1.549845           NA         NA
      22014 Female  18       NA           NA         NA
      22014 Female  18 1.542681           NA         NA
      22014 Female  18       NA           NA         NA
      22014 Female  18 1.265929           NA         NA
      22014 Female  18       NA       1.2531         NA
      22014 Female  18       NA       1.2507         NA
      22014 Female  18       NA       1.7841         NA
      22014 Female  18       NA       1.3536         NA
      22014 Female  18       NA       0.8031         NA
      22014 Female  18       NA           NA Non-Causal

...etc...

I need to do 3 things:

i) 'backfill' the values in 'condition' upwards, so that every cell in 'condition' above a valid entry (here Non-Causal) is filled with that entry.

ii) match the 5 entries in 'reproduction' with the 5 entries in 'interval' in corresponding order, i.e. so that 1.2531 is moved up to sit next to 1.536131, 1.2507 next to 1.416826, etc.

iii) get rid of the NA rows so that in the end only 5 rows are left, with valid entries in each of the columns.

Any hints on how to tackle this? The actual data frame is much longer, and 'condition' takes on different values; there will always be 5 entries per condition, though, and they should have matched interval and reproduction entries.

You can group and summarize:

library(dplyr)
dat %>%
  group_by(participant, Sex, Age) %>%
  summarize(across(c(interval, reproduction, condition), ~ .[!is.na(.)])) %>%
  ungroup()
# # A tibble: 5 x 6
#   participant Sex      Age interval reproduction condition 
#         <int> <chr>  <int>    <dbl>        <dbl> <chr>     
# 1       22014 Female    18     1.54        1.25  Non-Causal
# 2       22014 Female    18     1.42        1.25  Non-Causal
# 3       22014 Female    18     1.55        1.78  Non-Causal
# 4       22014 Female    18     1.54        1.35  Non-Causal
# 5       22014 Female    18     1.27        0.803 Non-Causal

(This will fail if the number of non-NA values in condition is anything other than 1, or if the other columns do not all have the same number of non-NA values.)
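One way to check that caveat before summarizing is to count the non-NA values per column in each group. The following is only a sketch on made-up toy data; the names counts and bad are illustrative and not part of the answer above.

```r
library(dplyr)

# Hypothetical toy data: one participant, 2 intervals, 2 reproductions, 1 condition
dat <- data.frame(
  participant  = 22014L,
  Sex          = "Female",
  Age          = 18L,
  interval     = c(1.5, 1.4, NA, NA, NA),
  reproduction = c(NA, NA, 1.2, 1.3, NA),
  condition    = c(NA, NA, NA, NA, "Non-Causal")
)

# Count the non-NA values of each column within each group
counts <- dat %>%
  group_by(participant, Sex, Age) %>%
  summarize(across(c(interval, reproduction, condition), ~ sum(!is.na(.x))),
            .groups = "drop")

# Groups that would break the approach above:
# more or fewer than one condition, or unequal interval/reproduction counts
bad <- counts %>%
  filter(condition != 1 | interval != reproduction)
```

If bad has zero rows, every group meets the assumption. Note also that from dplyr 1.1.0 onwards summarize() warns when it returns more than one row per group, and reframe() is the suggested replacement for that use.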

You can do most of the work with dplyr and tidyr. For example, if your data is in a data.frame named dd,

library(dplyr)
library(tidyr)
dd %>% 
  group_by(participant, Sex, Age) %>% 
  fill(condition, .direction="up") %>% 
  summarize(across(everything(), ~head(na.omit(.x), 5)))

We use tidyr::fill() to backfill the condition, then use dplyr::summarize() to keep only the first 5 non-NA values for all the columns that are not used for grouping the rows.
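To see what the backfill step does on its own, here is a small sketch on a hypothetical two-condition data frame (toy and filled are made-up names, and the "Causal" label is invented for illustration):

```r
library(tidyr)

# Hypothetical toy data: each condition label sits at the end of its block
toy <- data.frame(
  value     = 1:6,
  condition = c(NA, NA, "Non-Causal", NA, NA, "Causal")
)

# .direction = "up" replaces each NA with the next non-NA value below it
filled <- fill(toy, condition, .direction = "up")
filled$condition
```

Since the question mentions several conditions per participant, one option after filling might be to add condition to the group_by() call so that the first 5 values are taken per condition; that is an assumption about how the real data is laid out.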

Here is a base R solution, except for the function na.locf from package zoo.

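# Backfill condition within each participant (next observation carried backward)
df1$condition <- with(df1, ave(condition, participant, FUN = \(x) zoo::na.locf(x, fromLast = TRUE)))
# Flag rows holding an interval (i) or a reproduction (j); ave() returns 0/1 here
i <- with(df1, ave(interval, participant, FUN = \(x) !is.na(x)))
j <- with(df1, ave(reproduction, participant, FUN = \(x) !is.na(x)))
# Copy the reproduction values onto the interval rows, in order
df1$reproduction[as.logical(i)] <- df1$reproduction[as.logical(j)]
# Blank the original reproduction rows, then keep only complete rows
df1$reproduction[as.logical(j)] <- NA_real_
df1 <- df1[complete.cases(df1), ]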

df1
#   participant    Sex Age interval reproduction  condition
#2        22014 Female  18 1.536131       1.2531 Non-Causal
#4        22014 Female  18 1.416826       1.2507 Non-Causal
#6        22014 Female  18 1.549845       1.7841 Non-Causal
#8        22014 Female  18 1.542681       1.3536 Non-Causal
#10       22014 Female  18 1.265929       0.8031 Non-Causal
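
If the zoo dependency is unwanted, the backfill step could also be written in plain base R. This is a sketch of an alternative, not the method used above: backfill is a hypothetical helper, and it assumes a character or numeric vector (not a factor).

```r
# Hypothetical helper: replace each NA with the next non-NA value below it.
# NAs that have no valid value below them are left as NA.
backfill <- function(x) {
  r   <- rev(x)            # work from the bottom up
  ok  <- !is.na(r)         # valid entries in reversed order
  pos <- cumsum(ok)        # index of the latest valid entry seen so far (0 = none)
  out <- c(NA, r[ok])[pos + 1]
  rev(out)
}

# Toy data with two participants (values are made up)
toy <- data.frame(
  participant = c(1, 1, 1, 2, 2),
  condition   = c(NA, NA, "Non-Causal", NA, "Causal"),
  stringsAsFactors = FALSE
)

# Apply the helper within each participant, as ave() does in the answer
toy$condition <- with(toy, ave(condition, participant, FUN = backfill))
```

Unlike na.locf with its default settings, this helper always returns a vector of the same length as its input, which is what ave() expects.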

Data

df1 <- read.table(text = "
participant     Sex     Age     interval    reproduction    condition
22014   Female  18  NA  NA  NA
22014   Female  18  1.536131    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.416826    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.549845    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.542681    NA  NA
22014   Female  18  NA  NA  NA
22014   Female  18  1.265929    NA  NA
22014   Female  18  NA  1.2531  NA
22014   Female  18  NA  1.2507  NA
22014   Female  18  NA  1.7841  NA
22014   Female  18  NA  1.3536  NA
22014   Female  18  NA  0.8031  NA
22014   Female  18  NA  NA  Non-Causal
", header = TRUE)

This is a longer route to what r2evans and Mr.Flick present:

library(dplyr)
library(tidyr)
df %>% 
  fill(condition, .direction = "up") %>% 
  mutate(id = row_number()) %>% 
  pivot_longer(
    cols = c(interval, reproduction)
  ) %>% 
  na.omit() %>% 
  pivot_wider(
    names_from = name,
    values_from = value
  ) %>% 
  mutate(reproduction = lead(reproduction,5)) %>% 
  na.omit() %>% 
  select(-id) %>% 
  relocate(condition, .after = 6)
