How to copy specific values from one data column to another while matching other columns in R?

Question

I've searched a number of places (Stack Overflow, R-bloggers, etc.) but haven't found a good way to do this in R. Hopefully someone has some ideas.

I have a set of environmental sampling data. The data includes a variety of fields (visit date, region, location, sample medium, sample component, result, etc.).

Here's a subset of the pertinent fields. This is where I start...

visit_date   region    location     media      component     result
1990-08-20   LAKE      555723       water       Mg            *Nondetect
1999-07-01   HILL      432422       water       Ca            3.2
2010-09-12   LAKE      555723       water       pH            6.8
2010-09-12   LAKE      555723       water       Mg            2.1
2010-09-12   HILL      432423       water       pH            7.2
2010-09-12   HILL      432423       water       N             0.8
2010-09-12   HILL      432423       water       NH4          112
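
To make the example reproducible, the subset above could be rebuilt as a data frame roughly like this. The name `res` matches the name used later in the question. Storing `location` and `result` as character and `visit_date` as POSIXct follows the description further down; the UTC time zone is an assumption.

```r
# Sketch of the sample subset; column types follow the question's description.
res <- data.frame(
  visit_date = as.POSIXct(c("1990-08-20", "1999-07-01", "2010-09-12", "2010-09-12",
                            "2010-09-12", "2010-09-12", "2010-09-12"), tz = "UTC"),
  region     = c("LAKE", "HILL", "LAKE", "LAKE", "HILL", "HILL", "HILL"),
  location   = c("555723", "432422", "555723", "555723", "432423", "432423", "432423"),
  media      = "water",  # single value, recycled to every row
  component  = c("Mg", "Ca", "pH", "Mg", "pH", "N", "NH4"),
  result     = c("*Nondetect", "3.2", "6.8", "2.1", "7.2", "0.8", "112"),
  stringsAsFactors = FALSE
)
str(res)
```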

What I hope to reach is a table/data frame like this:

visit_date   region    location     media      component     result        pH
1990-08-20   LAKE      555723       water       Mg            *Nondetect  *Not recorded
1999-07-01   HILL      432422       water       Ca            3.2         *Not recorded
2010-09-12   LAKE      555723       water       pH            6.8         6.8
2010-09-12   LAKE      555723       water       Mg            2.1         6.8
2010-09-12   HILL      432423       water       pH            7.2         7.2
2010-09-12   HILL      432423       water       N             0.8         7.2
2010-09-12   HILL      432423       water       NH4          112          7.2

I attempted to use the method from "R finding rows of a data frame where certain columns match those of another", but unfortunately didn't get the result I wanted. Instead, the pH column was either my pre-populated value -999 or NA, not the pH value for that particular visit date if one was collected. Since the result data set is around 500k records, I'm using unique(tResults$pH) to check the values of the pH column.

Here's that attempt. res is the original result data.frame and component is the pH result subset (the pH sample results from the main results table).

keys <- c("region", "location", "visit_date", "media")

tResults <- data.table(res, key=keys)
tComponent <- data.table(component, key=keys)

tResults[tComponent, pH>0]

I've attempted using match, merge, and within on the original data frame without success. Since then I've generated a subset for the components (pH in this example) where I copied the results column over to a new "pH" column, thinking I could match the keys and update a new "pH" column in the main result set.

Since not all result values are numeric (with values like *Not recorded), I attempted to substitute numeric placeholders such as -888 so I could force at least the result and pH columns to be numeric. Aside from the dates, which are POSIXct values, the remaining columns are character columns. The original data frame was created using stringsAsFactors=FALSE.
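
One alternative to placeholder values such as -888 is to let non-numeric results become `NA` in a separate numeric column. A minimal sketch, using a small hypothetical vector in place of the real `result` column; `result_num` is an illustrative name, not something from the original data:

```r
# Hypothetical stand-in for the character result column
result <- c("*Nondetect", "3.2", "6.8", "*Not recorded", "112")

# as.numeric() turns anything that does not parse as a number into NA.
# It warns when it does so; the warning is suppressed because the NAs are intended.
result_num <- suppressWarnings(as.numeric(result))

# Keep the original text next to the numeric column so the reason for each NA is not lost
data.frame(result, result_num, stringsAsFactors = FALSE)
```

Keeping both columns avoids mixing sentinel numbers into later calculations while preserving flags such as *Nondetect.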

Once I can do this, I'll be able to generate similar columns for other components, which can then be used to populate and calculate other values for a given sample. At least that's my goal.

So I'm stumped on this one. In my mind it should be easy, but I'm certainly NOT seeing it!

Your help and ideas are certainly welcome and appreciated!

Answer 1

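# df1 is your first data set (a data.frame)
# copy result into a helper column on pH rows, NA elsewhere
df1$phtem <- with(df1, ifelse(component == "pH", result, NA))

library(data.table)
library(zoo) # provides na.locf()

# carry the last seen pH value forward down the table
setDT(df1)[, pH := na.locf(phtem, na.rm = FALSE)]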
    visit_date region location media component     result phtem  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect    NA  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2    NA  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8   6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1    NA 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2   7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8    NA 7.2
7: 2010-09-12   HILL   432423 water       NH4        112    NA 7.2

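# You can delete phtem if you don't need it.
# Note: na.locf() is not grouped here, so this relies on the pH row coming first
# within each visit and on every later visit having its own pH row.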

Edit:

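library(data.table)
# per visit (region, location, date, media), take the pH result;
# [1L] yields NA when the group has no pH row and keeps one value if there are several
setDT(df1)[, pH := result[component == "pH"][1L], by = "region,location,visit_date,media"]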
df1

   visit_date region location media component     result  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8 7.2
7: 2010-09-12   HILL   432423 water       NH4        112 7.2
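
For comparison, the same result can be sketched in base R with `merge()`, which the question mentions trying. This is an illustration rather than part of the original answer: it assumes at most one pH row per region/location/visit_date/media combination, and it uses a small made-up `res` with dates stored as plain strings for brevity.

```r
# Small made-up example in the shape of the question's data
res <- data.frame(
  visit_date = c("1990-08-20", "2010-09-12", "2010-09-12", "2010-09-12", "2010-09-12"),
  region     = c("LAKE", "LAKE", "LAKE", "HILL", "HILL"),
  location   = c("555723", "555723", "555723", "432423", "432423"),
  media      = "water",
  component  = c("Mg", "pH", "Mg", "pH", "N"),
  result     = c("*Nondetect", "6.8", "2.1", "7.2", "0.8"),
  stringsAsFactors = FALSE
)
keys <- c("region", "location", "visit_date", "media")

# Pull out the pH rows and rename their result column to pH
ph <- res[res$component == "pH", c(keys, "result")]
names(ph)[names(ph) == "result"] <- "pH"

# Left join: every row of res is kept; visits without a pH row get NA
out <- merge(res, ph, by = keys, all.x = TRUE)

# Label the visits that had no pH measurement
out$pH[is.na(out$pH)] <- "*Not recorded"
out
```

`merge()` sorts by the key columns and moves them to the front, so the row and column order of `out` differs from the input. The same steps could be repeated with another component name in place of "pH" to build further columns.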

Answer 1
4 2015-03-12 01:38:34