Using R, how to fill empty cells of a data frame in Column B with the previous row's value, based on the relationship with the Column A value

I have this type of data frame:

df <- data.frame(ID = rep(letters[1:5], each = 2), 
DESC = as.character(as.factor(rep(c("Petit", " ", "Small", " ", "Medium", " ", "Large", " ", "X-Large", " "), times = 1))))

Basically, I need to fill the blank cells in the 'DESC' column with the character string that belongs to the corresponding 'ID'. Ultimately, the result should look like this:

> df
      ID    DESC
1   a   Petit
2   a   Petit
3   b   Small
4   b   Small
5   c  Medium
6   c  Medium
7   d   Large
8   d   Large
9   e X-Large
10  e X-Large

Please note my actual data frame is not this simple. For example, identical names in the 'ID' column span anywhere from 1 to 25 rows, and I need to paste the 'DESC' value into every row for that 'ID'. So ID 'a' may have 24 rows in which I need to fill 'Petit', and 'b' may have one row in which I need to fill 'Small'.

I have tried writing scripts using sapply, grep, and paste, but failed. I tried writing a loop, but it seems that when I point to df$DESC it's stored as a factor, although I forced it to a character vector... Any help, instruction, or pointer to functions that can handle this is greatly appreciated. I know I can simply do it in Excel, but that is beside the point! I'm trying to learn how to do this in R, and cannot find any help online regarding this subject.

Thanks!
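A side note on the factor issue mentioned in the question: on R versions before 4.0.0, data.frame() converts character columns to factors by default, so the as.character() call inside data.frame() has no lasting effect. A minimal sketch of building the same example with DESC kept as character (the explicit stringsAsFactors = FALSE is only needed on older R; from 4.0.0 on it is already the default):

```r
# Same example data as in the question, with DESC kept as character.
# stringsAsFactors = FALSE matters on R < 4.0.0; newer versions default to it.
df <- data.frame(
  ID = rep(letters[1:5], each = 2),
  DESC = c("Petit", " ", "Small", " ", "Medium", " ", "Large", " ", "X-Large", " "),
  stringsAsFactors = FALSE
)

# Inspect the column types: DESC should be reported as chr, not Factor
str(df)
```

With DESC stored as character, comparisons such as df$DESC == " " and plain assignment into the column behave as expected in a loop.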

If the IDs are sorted with the non-blank value in the first position of each group, you can do a simple 'fill' with Reduce:

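# Carry the last non-blank value forward; accumulate = TRUE keeps every intermediate result
df$DESC <- Reduce(function(x, y) if (y == " ") x else y, as.character(df$DESC), accumulate = TRUE)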

> df
#    ID    DESC
# 1   a   Petit
# 2   a   Petit
# 3   b   Small
# 4   b   Small
# 5   c  Medium
# 6   c  Medium
# 7   d   Large
# 8   d   Large
# 9   e X-Large
# 10  e X-Large

If you can use the zoo package:

df[df$DESC==" ","DESC"] <- NA    # Correctly code missing values
df$DESC <- zoo::na.locf(df$DESC)

   ID    DESC
1   a   Petit
2   a   Petit
3   b   Small
4   b   Small
5   c  Medium
6   c  Medium
7   d   Large
8   d   Large
9   e X-Large
10  e X-Large
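If adding a package is not an option, the same forward fill can be sketched in base R with cumsum(). This is a swapped-in technique, not what zoo does internally. The vector x below is made-up illustration data, and the sketch assumes the first element is non-blank:

```r
# Illustration data: blanks are single spaces, as in the question
x <- c("Petit", " ", "Small", " ", " ", "Medium")

# For each position, count how many non-blank values have been seen so far.
# That count is the index of the value to carry forward.
fill_index <- cumsum(x != " ")

# Look up each position's source value among the non-blank entries.
# Assumes x[1] is non-blank; a leading blank would give index 0 and break the lookup.
x_filled <- x[x != " "][fill_index]
x_filled
```

On a data frame this would be applied as df$DESC <- df$DESC[df$DESC != " "][cumsum(df$DESC != " ")], again assuming the first row is non-blank and DESC is a character column.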

Here is an option with dplyr (like the fills above, it assumes the non-blank value comes first within each ID):

library(dplyr)
df %>% 
  group_by(ID) %>%
  mutate(DESC = first(DESC))
#      ID    DESC
#   <fctr>  <fctr>
#1       a   Petit
#2       a   Petit
#3       b   Small
#4       b   Small
#5       c  Medium
#6       c  Medium
#7       d   Large
#8       d   Large
#9       e X-Large
#10      e X-Large

Or using data.table:

library(data.table)
setDT(df)[, DESC := DESC[1L], by = ID]
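Both grouped one-liners above take the first value in each ID group, so they rely on the non-blank value coming first. A minimal base R sketch that drops that assumption uses ave() to pick the first non-blank value per group. The small data frame here is made up for illustration, with the blank deliberately placed first in group 'a':

```r
# Illustration data: the non-blank value is not always first within its ID
df <- data.frame(
  ID = c("a", "a", "a", "b", "c", "c"),
  DESC = c(" ", "Petit", " ", "Small", "Medium", " "),
  stringsAsFactors = FALSE
)

# For each ID, take the first non-blank DESC; ave() recycles it across the group.
# A group with no non-blank value at all would end up as NA.
df$DESC <- ave(df$DESC, df$ID, FUN = function(v) v[v != " "][1])
df
```

The same idea should carry over to the dplyr and data.table versions by replacing first(DESC) or DESC[1L] with DESC[DESC != " "][1].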

The forward-fill solutions are nice, but if the data are not sorted, we can remove all ' ' rows and duplicates, then merge the result back:

merge(subset(df, select = -DESC),unique(df[df$DESC != ' ',]), by = 'ID')

   ID    DESC
1   a   Petit
2   a   Petit
3   b   Small
4   b   Small
5   c  Medium
6   c  Medium
7   d   Large
8   d   Large
9   e X-Large
10  e X-Large

More readable, in multiple steps:

#find mapping
mapping = unique(df[df$DESC != ' ',])

#remove DESC from data
data = subset(df, select = -DESC)

#merge
merge(data, mapping, by = 'ID')
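One thing to keep in mind is that merge() sorts the result by the key column by default, so the original row order is not kept. If row order matters, the same mapping can be applied in place with match(). A minimal sketch on made-up, unsorted data, assuming each ID has exactly one distinct non-blank DESC:

```r
# Illustration data: IDs are interleaved and blanks can come first
df <- data.frame(
  ID = c("b", "a", "b", "a", "c"),
  DESC = c(" ", "Petit", "Small", " ", "Medium"),
  stringsAsFactors = FALSE
)

# Build the ID -> DESC lookup from the non-blank rows
mapping <- unique(df[df$DESC != " ", ])

# For every row, find its ID in the lookup and copy the matching DESC.
# Rows stay in their original order; IDs missing from the lookup become NA.
df$DESC <- mapping$DESC[match(df$ID, mapping$ID)]
df
```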
