Replace values in one list of columns with values from another list of columns, based on conditions, in R or Python

(For Pythonistas: the code below is in R's format, before I get some #hatehard.)

This one has been frustrating me for way too long.

I have two datasets:

df1 <- data.frame(ID = c("Person.A", "Person.B", "Person.C", "Person.D", "Person.E", "Person.F"),
                  Aa = c(0,1,2,NA,1,1),
                  Ab = c(0,NA,2,1,1,1),
                  Ac = c(NA,NA,2,2,1,1),
                  no.match = c(0,1,2,2,1,2))

df2 <- data.frame(ID = c("Person.A", "Person.B", "Person.C", "Person.D", "Person.E"),
                  Ba = c(0,NA,2,1,1),
                  Bb = c(NA,1,2,2,1),
                  Bc = c(0,1,2,2,1))

I then merge these two datasets using mdf <- merge(df1, df2, all.x = TRUE, by = "ID") to get:

         ID Aa Ab Ac no.match Ba Bb Bc
1 Person.A  0  0 NA        0  0 NA  0
2 Person.B  1 NA NA        1 NA  1  1
3 Person.C  2  2  2        2  2  2  2
4 Person.D NA  1  2        2  1  2  2
5 Person.E  1  1  1        1  1  1  1
6 Person.F  1  1  1        2 NA NA NA

The actual datasets are much more complicated, with lots of columns that have no match among the other columns, so I don't think I can do anything that depends on the arrangement of the columns.

Columns Aa and Ba contain the same information, as do columns Ab and Bb, and so on, but column no.match has no matching column.

I want to "map" the value from the same row of Ba to Aa if Aa is NA, and do the same for Ab and Bb, Ac and Bc, etc.

The resulting data frame in this case would look like:

        ID Aa Ab Ac no.match Ba Bb Bc
1 Person.A  0  0  0        0  0 NA  0
2 Person.B  1  1  1        1 NA  1  1
3 Person.C  2  2  2        2  2  2  2
4 Person.D  1  1  2        2  1  2  2
5 Person.E  1  1  1        1  1  1  1
6 Person.F  1  1  1        2 NA NA NA

Here, for example, element [4,2] was replaced by element [4,6]. The rows and the columns need to match up.

I've tried an embarrassingly large number of things: apply, ifelse, and iterating through lists of columns such as l1 = c('Aa','Ab','Ac'), l2 = c('Ba', 'Bb', 'Bc').

I can do the one-off: mdf$Aa[which(is.na(mdf$Aa))] <- mdf[which(is.na(mdf$Aa)), c("Ba")]

But how can I do this iteratively?

Thank you! (Sorry for the long-windedness.)

Here's one way using data.table v1.9.5+ (see the package's installation instructions):

require(data.table) # v1.9.5+
cols1 = names(df1)[2:4]
cols2 = names(df2)[2:4]

foo <- function(x, y) {
    nas = is.na(x)
    x[nas] = y[nas]
    x
}
setDT(df1)[df2, c(cols1, cols2) := c(Map(foo, mget(cols1), 
                   mget(cols2)), mget(cols2)), on = "ID"]

> df1
#          ID Aa Ab Ac no.match Ba Bb Bc
# 1: Person.A  0  0  0        0  0 NA  0
# 2: Person.B  1  1  1        1 NA  1  1
# 3: Person.C  2  2  2        2  2  2  2
# 4: Person.D  1  1  2        2  1  2  2
# 5: Person.E  1  1  1        1  1  1  1
# 6: Person.F  1  1  1        2 NA NA NA
  • setDT() converts df1 to a data.table by reference.

  • setDT(df1)[df2, on = "ID"] performs a join. For each row of df2, we find the matching rows in df1 and extract the columns corresponding to those matching rows.

  • On the matching rows, we update the columns in cols1 and add the columns in cols2 as new columns, by reference, using the := operator. For the update, we extract the columns named in cols1 and cols2 and replace the NAs using the function foo(). For the new columns, we simply pull the cols2 columns using mget(). We concatenate the two lists using c().
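
The NA-filling step itself does not depend on data.table. As a minimal base R sketch of the same idea, assuming the merged data frame is called mdf (the name used in the question's one-off attempt) and that the column pairs are listed explicitly in matching order:

```r
# Rebuild the question's example data.
df1 <- data.frame(ID = c("Person.A", "Person.B", "Person.C", "Person.D", "Person.E", "Person.F"),
                  Aa = c(0, 1, 2, NA, 1, 1),
                  Ab = c(0, NA, 2, 1, 1, 1),
                  Ac = c(NA, NA, 2, 2, 1, 1),
                  no.match = c(0, 1, 2, 2, 1, 2))
df2 <- data.frame(ID = c("Person.A", "Person.B", "Person.C", "Person.D", "Person.E"),
                  Ba = c(0, NA, 2, 1, 1),
                  Bb = c(NA, 1, 2, 2, 1),
                  Bc = c(0, 1, 2, 2, 1))

# Left join on ID; mdf is an assumed name for the merged result.
mdf <- merge(df1, df2, all.x = TRUE, by = "ID")

cols1 <- c("Aa", "Ab", "Ac")  # columns to fill
cols2 <- c("Ba", "Bb", "Bc")  # columns supplying replacements, in the same order

# Same helper as in the answer: where x is NA, take y from the same row.
foo <- function(x, y) {
  nas <- is.na(x)
  x[nas] <- y[nas]
  x
}

# Map() walks the two column lists in parallel; the resulting list of
# filled columns is assigned back over cols1.
mdf[cols1] <- Map(foo, mdf[cols1], mdf[cols2])
mdf
```

Unlike the := update, this copies the affected columns rather than modifying them by reference, which may matter for very large tables.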

If you're interested, have a look at the HTML vignettes to learn more.
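
The question mentions that the real data has many columns without a partner, so selecting columns by position, as in names(df1)[2:4], may not carry over. One possible sketch derives the pairs from the column names instead. It assumes that partner columns differ only in a leading "A" versus "B", as in the example; the real naming scheme is not given, so the patterns below are hypothetical and would need adapting:

```r
# Toy merged frame (a subset of the question's data) with an unmatched column.
mdf <- data.frame(ID = c("Person.A", "Person.B", "Person.D"),
                  Aa = c(0, 1, NA),
                  Ab = c(0, NA, 1),
                  no.match = c(0, 1, 2),
                  Ba = c(0, NA, 1),
                  Bb = c(NA, 1, 2))

# Assumed naming rule: "A<suffix>" pairs with "B<suffix>".
cols1 <- grep("^A", names(mdf), value = TRUE)
cols2 <- sub("^A", "B", cols1)

# Keep only the pairs whose partner column actually exists.
keep <- cols2 %in% names(mdf)
cols1 <- cols1[keep]
cols2 <- cols2[keep]

# Fill each NA in a cols1 column from the same row of its partner column.
for (i in seq_along(cols1)) {
  nas <- is.na(mdf[[cols1[i]]])
  mdf[[cols1[i]]][nas] <- mdf[[cols2[i]]][nas]
}
mdf
```

Columns such as no.match are never selected, because they neither match the "^A" pattern nor appear as a derived partner name.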
