
How to copy specific values from one data column to another while matching other columns in R?

I've searched a number of places (Stack Overflow, R-bloggers, etc.) but haven't quite found a good option for doing this in R. Hopefully someone has some ideas.

I have a set of environmental sampling data. The data includes a variety of fields (visit date, region, location, sample medium, sample component, result, etc.).

Here's a subset of the pertinent fields. This is where I start:

visit_date   region    location     media      component     result
1990-08-20   LAKE      555723       water       Mg            *Nondetect
1999-07-01   HILL      432422       water       Ca            3.2
2010-09-12   LAKE      555723       water       pH            6.8
2010-09-12   LAKE      555723       water       Mg            2.1
2010-09-12   HILL      432423       water       pH            7.2
2010-09-12   HILL      432423       water       N             0.8
2010-09-12   HILL      432423       water       NH4          112

What I hope to end up with is a table/data frame like this:

visit_date   region    location     media      component     result        pH
1990-08-20   LAKE      555723       water       Mg            *Nondetect  *Not recorded
1999-07-01   HILL      432422       water       Ca            3.2         *Not recorded
2010-09-12   LAKE      555723       water       pH            6.8         6.8
2010-09-12   LAKE      555723       water       Mg            2.1         6.8
2010-09-12   HILL      432423       water       pH            7.2         7.2
2010-09-12   HILL      432423       water       N             0.8         7.2
2010-09-12   HILL      432423       water       NH4          112          7.2

I attempted to use the method from "R finding rows of a data frame where certain columns match those of another", but unfortunately didn't get the result I wanted. Instead, the pH column held either my pre-populated value (-999) or NA, not the pH value for that particular visit date when one was collected. Since the result data set is around 500k records, I'm using unique(tResults$pH) to check the values of the pH column.

Here's that attempt. res is the original result data.frame, and component is the pH result subset (the pH sample results from the main results table).

keys <- c("region", "location", "visit_date", "media")

tResults <- data.table(res, key=keys)
tComponent <- data.table(component, key=keys)

tResults[tComponent, pH>0]

I've attempted using match, merge, and within on the original data frame without success. Since then I've generated a subset for the components (pH in this example) where I copied the result column to a new "pH" column, thinking I could match the keys and update a new "pH" column in the main result set.
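The key-matching update described above might be sketched as a data.table update join. This is only a sketch under assumptions: the toy data mirrors the sample rows, the name ph_rows is made up for illustration, each visit is assumed to have at most one pH row, and the `on =` argument needs a reasonably recent data.table.

```r
library(data.table)

# Toy data mirroring the sample rows in the question
res <- data.table(
  visit_date = as.POSIXct(c("1990-08-20", "1999-07-01", rep("2010-09-12", 5)), tz = "UTC"),
  region     = c("LAKE", "HILL", "LAKE", "LAKE", "HILL", "HILL", "HILL"),
  location   = c("555723", "432422", "555723", "555723", "432423", "432423", "432423"),
  media      = "water",
  component  = c("Mg", "Ca", "pH", "Mg", "pH", "N", "NH4"),
  result     = c("*Nondetect", "3.2", "6.8", "2.1", "7.2", "0.8", "112")
)

keys <- c("region", "location", "visit_date", "media")

# Hypothetical helper table: only the pH rows (assumed at most one per visit)
ph_rows <- res[component == "pH", c(keys, "result"), with = FALSE]

# Update join: rows of res that match ph_rows on the keys get that pH result.
# The i. prefix picks the result column of ph_rows rather than the one in res.
res[ph_rows, pH := i.result, on = keys]

# Visits with no pH sample are left as NA; label them if a text flag is wanted
res[is.na(pH), pH := "*Not recorded"]
```

The update happens by reference, so no pre-populated placeholder column is needed in res.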

Since not all result values are numeric (there are values like *Not recorded), I attempted to substitute numeric placeholders such as -888 so I could force at least the result and pH columns to be numeric. Aside from the dates, which are POSIXct values, the remaining columns are character columns. The original data frame was created using stringsAsFactors=FALSE.
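As an alternative to placeholder numbers like -888, one possibility is to let non-numeric text become NA and keep the original flag alongside. The vector below is made-up sample input, and result_num and result_flag are illustrative names, not part of the original data.

```r
# Made-up sample of the mixed text/number result column
result <- c("*Nondetect", "3.2", "6.8", "*Not recorded", "112")

# as.numeric() turns non-numeric text into NA (it warns, so the warning is silenced)
result_num <- suppressWarnings(as.numeric(result))

# Keep the text flag only where the value could not be parsed as a number
result_flag <- ifelse(is.na(result_num), result, NA_character_)
```

Using NA instead of a sentinel keeps values like -888 out of later sums and means.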

Once I can do this, I'll be able to generate similar columns for other components, which can then be used to populate and calculate other values for a given sample. At least that's my goal.
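For several components at once, one option is to reshape the wanted components to wide form and merge them back onto every row of the same visit. This sketch assumes each component appears at most once per visit; the vector wanted and the tables wide and out are hypothetical names.

```r
library(data.table)

# Toy data mirroring the sample rows in the question
res <- data.table(
  visit_date = as.POSIXct(c("1990-08-20", "1999-07-01", rep("2010-09-12", 5)), tz = "UTC"),
  region     = c("LAKE", "HILL", "LAKE", "LAKE", "HILL", "HILL", "HILL"),
  location   = c("555723", "432422", "555723", "555723", "432423", "432423", "432423"),
  media      = "water",
  component  = c("Mg", "Ca", "pH", "Mg", "pH", "N", "NH4"),
  result     = c("*Nondetect", "3.2", "6.8", "2.1", "7.2", "0.8", "112")
)

keys   <- c("region", "location", "visit_date", "media")
wanted <- c("pH", "Mg")  # hypothetical list of components to spread into columns

# One row per visit, one column per wanted component
# (assumes at most one result per component per visit)
wide <- dcast(res[component %in% wanted],
              region + location + visit_date + media ~ component,
              value.var = "result")

# Left join back so every original row carries its visit's pH and Mg values;
# visits lacking a component get NA in that column
out <- merge(res, wide, by = keys, all.x = TRUE, sort = FALSE)
```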

So I'm stumped on this one. In my mind it should be easy, but I'm certainly NOT seeing it!

Your help and ideas are certainly welcome and appreciated!

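# df1 is the first data set from the question, stored as a data.frame.
# Copy result into a helper column on pH rows only; all other rows get NA.
df1$phtem <- with(df1, ifelse(component == "pH", result, NA))

library(data.table)
library(zoo) # provides na.locf (last observation carried forward)

# Carry each pH value forward into the NA rows that follow it.
setDT(df1)[, pH := na.locf(phtem, na.rm = FALSE)]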
    visit_date region location media component     result phtem  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect    NA  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2    NA  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8   6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1    NA 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2   7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8    NA 7.2
7: 2010-09-12   HILL   432423 water       NH4        112    NA 7.2

# You can delete the phtem column if you don't need it.
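Note that an ungrouped na.locf carries a pH value into any later visit that has no pH row, and leaves rows that come before the pH row within a visit as NA. A possible grouped variant is sketched below; the toy rows are invented to show both cases and use only two of the key columns for brevity.

```r
library(data.table)
library(zoo)

# Invented rows: Mg is listed before pH in the first visit,
# and the second visit has no pH sample at all
df1 <- data.table(
  visit_date = c("2010-09-12", "2010-09-12", "2011-05-03"),
  location   = c("555723", "555723", "555723"),
  component  = c("Mg", "pH", "Mg"),
  result     = c("2.1", "6.8", "1.9")
)

df1[, phtem := ifelse(component == "pH", result, NA_character_)]

# Fill forward, then backward, but only within each visit
df1[, pH := na.locf(na.locf(phtem, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE),
    by = .(visit_date, location)]
```

With the real data, the `by` clause would list all four key columns (region, location, visit_date, media).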

Edit:

library(data.table)
# Within each visit (same region, location, date, and media), copy the pH result to every row.
setDT(df1)[, pH := result[component == "pH"], by = "region,location,visit_date,media"]
df1

   visit_date region location media component     result  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8 7.2
7: 2010-09-12   HILL   432423 water       NH4        112 7.2
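If a visit could have no pH row or more than one, a slightly more defensive variant takes the first match with `[1L]`, which yields NA for an empty match and a single value otherwise. This is a sketch on invented rows; picking the first of several pH readings is an assumption about what is wanted, not something the question specifies.

```r
library(data.table)

# Invented rows: the first visit has two pH readings, the second has none
df1 <- data.table(
  visit_date = c("2010-09-12", "2010-09-12", "2010-09-12", "1999-07-01"),
  region     = c("LAKE", "LAKE", "LAKE", "HILL"),
  location   = c("555723", "555723", "555723", "432422"),
  media      = "water",
  component  = c("pH", "pH", "Mg", "Ca"),
  result     = c("6.8", "6.9", "2.1", "3.2")
)

# [1L] returns the first pH result of the visit, or NA when there is none,
# so the right-hand side always has length one and recycles across the group
df1[, pH := result[component == "pH"][1L],
    by = .(region, location, visit_date, media)]
```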
