Replace values in one list of columns with values from another list of columns based on conditions in R or Python
(For Pythonistas: the code below is in R, before I get some #hatehard.)
This one has been frustrating me for way too long.
I have two datasets:
df1 <- data.frame(ID = c("Person.A", "Person.B", "Person.C", "Person.D", "Person.E", "Person.F"),
Aa = c(0,1,2,NA,1,1),
Ab = c(0,NA,2,1,1,1),
Ac = c(NA,NA,2,2,1,1),
no.match = c(0,1,2,2,1,2))
df2 <- data.frame(ID = c("Person.A", "Person.B", "Person.C", "Person.D", "Person.E"),
Ba = c(0,NA,2,1,1),
Bb = c(NA,1,2,2,1),
Bc = c(0,1,2,2,1))
I then merge these two datasets using mdf <- merge(df1, df2, all.x = TRUE, by = "ID") to get:
ID Aa Ab Ac no.match Ba Bb Bc
1 Person.A 0 0 NA 0 0 NA 0
2 Person.B 1 NA NA 1 NA 1 1
3 Person.C 2 2 2 2 2 2 2
4 Person.D NA 1 2 2 1 2 2
5 Person.E 1 1 1 1 1 1 1
6 Person.F 1 1 1 2 NA NA NA
The actual datasets are much more complicated, with lots of columns that have no match among the other columns, so I don't think I can do anything that depends on the arrangement of the columns.
Columns Aa and Ba contain the same information, as do columns Ab and Bb, and so on, but column no.match has no matching column.
I want to "map" the value from the same row of Ba to Aa if Aa is NA, and do the same for Ab and Bb, Ac and Bc, etc.
The resulting data frame in this case would look like:
ID Aa Ab Ac no.match Ba Bb Bc
1 Person.A 0 0 0 0 0 NA 0
2 Person.B 1 1 1 1 NA 1 1
3 Person.C 2 2 2 2 2 2 2
4 Person.D 1 1 2 2 1 2 2
5 Person.E 1 1 1 1 1 1 1
6 Person.F 1 1 1 2 NA NA NA
Here, for example, element [4,2] was replaced by element [4,6]. The rows and the columns need to match up.
I've tried an embarrassingly large number of things: apply, ifelse, iterating through a list of columns l1 = c('Aa','Ab','Ac'), l2 = c('Ba', 'Bb', 'Bc').
I can do the one-off: mdf[which(is.na(mdf$Aa)), "Aa"] <- mdf[which(is.na(mdf$Aa)), "Ba"]
But how can I do this iteratively?
Thank you! (Sorry for the long-windedness.)
Here's one using data.table v1.9.5 (see its installation instructions):
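require(data.table) # v1.9.5+
cols1 = names(df1)[2:4]  # columns to fill: Aa, Ab, Ac
cols2 = names(df2)[2:4]  # matching source columns: Ba, Bb, Bc
# Replace the NAs in x with the values at the same positions in y
foo <- function(x, y) {
    nas = is.na(x)
    x[nas] = y[nas]
    x
}
# Join on ID; update cols1 and add cols2 by reference
setDT(df1)[df2, c(cols1, cols2) := c(Map(foo, mget(cols1),
    mget(cols2)), mget(cols2)), on = "ID"]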
> df1
# ID Aa Ab Ac no.match Ba Bb Bc
# 1: Person.A 0 0 0 0 0 NA 0
# 2: Person.B 1 1 1 1 NA 1 1
# 3: Person.C 2 2 2 2 2 2 2
# 4: Person.D 1 1 2 2 1 2 2
# 5: Person.E 1 1 1 1 1 1 1
# 6: Person.F 1 1 1 2 NA NA NA
setDT() converts df1 to a data.table by reference.
setDT(df1)[df2, on = "ID"] performs a join. For each row of df2, we find the matching rows in df1 and extract the columns corresponding to the matching rows.
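The row matching described here can be illustrated in base R. This is only a rough sketch of the idea using match(), not how data.table implements its joins; the trimmed-down df1 and df2 and the names idx, matched and unmatched_ids are made up for the illustration.

```r
# Base R illustration of matching rows by ID (an analogy, not data.table internals)
df1 <- data.frame(ID = c("Person.A", "Person.B", "Person.C",
                         "Person.D", "Person.E", "Person.F"),
                  Aa = c(0, 1, 2, NA, 1, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("Person.A", "Person.B", "Person.C",
                         "Person.D", "Person.E"),
                  Ba = c(0, NA, 2, 1, 1),
                  stringsAsFactors = FALSE)

# For each row of df2, the position of the row in df1 with the same ID
idx <- match(df2$ID, df1$ID)

# The rows of df1 that have a partner in df2
matched <- df1[idx, ]

# IDs present in df1 only; these rows are left untouched by the update
unmatched_ids <- setdiff(df1$ID, df2$ID)
```

Person.F has no partner in df2, which is why its B columns stay NA in the result.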
On the matching rows, we update the columns in cols1 and add the new columns in cols2 by reference using the := operator. For updating columns, we extract the columns specified in cols1 and cols2 and replace the NAs with the function foo(). For adding columns, we simply pull the columns in cols2 using mget(). We concatenate the two lists using c().
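For comparison, the same pairwise NA fill can probably be done without data.table. The following is a minimal base R sketch, assuming the merged data frame mdf and the column lists l1 and l2 from the question; fill_na is a hypothetical helper name that does the same job as foo() above. It is an alternative, not part of the data.table approach.

```r
# Base R sketch: fill NAs in the A columns from the paired B columns
df1 <- data.frame(ID = c("Person.A", "Person.B", "Person.C",
                         "Person.D", "Person.E", "Person.F"),
                  Aa = c(0, 1, 2, NA, 1, 1),
                  Ab = c(0, NA, 2, 1, 1, 1),
                  Ac = c(NA, NA, 2, 2, 1, 1),
                  no.match = c(0, 1, 2, 2, 1, 2))
df2 <- data.frame(ID = c("Person.A", "Person.B", "Person.C",
                         "Person.D", "Person.E"),
                  Ba = c(0, NA, 2, 1, 1),
                  Bb = c(NA, 1, 2, 2, 1),
                  Bc = c(0, 1, 2, 2, 1))

# Left join on ID, as in the question
mdf <- merge(df1, df2, all.x = TRUE, by = "ID")

# Paired column names: l1[i] is filled from l2[i]
l1 <- c("Aa", "Ab", "Ac")
l2 <- c("Ba", "Bb", "Bc")

# Hypothetical helper: replace the NAs in x with the values of y at the same rows
fill_na <- function(x, y) {
  nas <- is.na(x)
  x[nas] <- y[nas]
  x
}

# Apply the helper to each pair of columns and write the results back
mdf[l1] <- Map(fill_na, mdf[l1], mdf[l2])
```

Because the pairing is given by name in l1 and l2, this does not depend on where the columns sit in the data frame.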
If you're interested, have a look at the HTML vignettes to learn more.