

Perform an operation over a column of a dataframe in a list (R)

I am trying to perform an operation on a column of a data frame that is inside a list.

This is the data frame inside my list:


> dput(wtr_complete[[1]])
structure(list(date = c("2010-03-02T00:00:00", "2010-03-03T00:00:00", 
"2010-03-04T00:00:00", "2010-03-05T00:00:00", "2010-03-06T00:00:00", 
"2010-03-07T00:00:00", "2010-03-08T00:00:00", "2010-03-09T00:00:00", 
"2010-03-10T00:00:00", "2010-03-11T00:00:00", "2010-03-12T00:00:00", 
"2010-03-13T00:00:00", "2010-03-14T00:00:00", "2010-03-15T00:00:00", 
"2010-03-16T00:00:00", "2010-03-17T00:00:00", "2010-03-18T00:00:00", 
"2010-03-19T00:00:00", "2010-03-20T00:00:00", "2010-03-21T00:00:00", 
"2010-03-22T00:00:00", "2010-03-23T00:00:00", "2010-03-24T00:00:00", 
"2010-03-25T00:00:00", "2010-03-26T00:00:00", "2011-01-01T00:00:00", 
"2011-01-02T00:00:00", "2011-01-03T00:00:00", "2011-01-04T00:00:00", 
"2011-01-05T00:00:00", "2011-01-06T00:00:00", "2011-01-07T00:00:00", 
"2011-01-08T00:00:00", "2011-01-09T00:00:00", "2011-01-10T00:00:00", 
"2011-01-11T00:00:00", "2011-01-12T00:00:00", "2011-01-13T00:00:00", 
"2011-01-14T00:00:00", "2011-01-15T00:00:00", "2011-01-16T00:00:00", 
"2011-01-17T00:00:00", "2011-01-18T00:00:00", "2011-01-19T00:00:00", 
"2011-01-20T00:00:00", "2011-01-21T00:00:00", "2011-01-22T00:00:00", 
"2011-01-23T00:00:00", "2011-01-24T00:00:00", "2011-01-25T00:00:00", 
"2012-01-01T00:00:00", "2012-01-02T00:00:00", "2012-01-03T00:00:00", 
"2012-01-04T00:00:00", "2012-01-05T00:00:00", "2012-01-06T00:00:00", 
"2012-01-07T00:00:00", "2012-01-08T00:00:00", "2012-01-09T00:00:00", 
"2012-01-10T00:00:00", "2012-01-11T00:00:00", "2012-01-12T00:00:00", 
"2012-01-13T00:00:00", "2012-01-14T00:00:00", "2012-01-15T00:00:00", 
"2012-01-16T00:00:00", "2012-01-17T00:00:00", "2012-01-18T00:00:00", 
"2012-01-19T00:00:00", "2012-01-20T00:00:00", "2012-01-21T00:00:00", 
"2012-01-22T00:00:00", "2012-01-23T00:00:00", "2012-01-24T00:00:00", 
"2012-01-25T00:00:00", "2013-01-01T00:00:00", "2013-01-02T00:00:00", 
"2013-01-03T00:00:00", "2013-01-04T00:00:00", "2013-01-05T00:00:00", 
"2013-01-06T00:00:00", "2013-01-07T00:00:00", "2013-01-08T00:00:00", 
"2013-01-09T00:00:00", "2013-01-10T00:00:00", "2013-01-11T00:00:00", 
"2013-01-12T00:00:00", "2013-01-13T00:00:00", "2013-01-14T00:00:00", 
"2013-01-15T00:00:00", "2013-01-16T00:00:00", "2013-01-17T00:00:00", 
"2013-01-18T00:00:00", "2013-01-19T00:00:00", "2013-01-20T00:00:00", 
"2013-01-21T00:00:00", "2013-01-22T00:00:00", "2013-01-23T00:00:00", 
"2013-01-24T00:00:00", "2013-01-25T00:00:00", "2014-01-01T00:00:00", 
"2014-01-02T00:00:00", "2014-01-03T00:00:00", "2014-01-04T00:00:00", 
"2014-01-05T00:00:00", "2014-01-06T00:00:00", "2014-01-07T00:00:00", 
"2014-01-08T00:00:00", "2014-01-09T00:00:00", "2014-01-10T00:00:00", 
"2014-01-11T00:00:00", "2014-01-12T00:00:00", "2014-01-13T00:00:00", 
"2014-01-14T00:00:00", "2014-01-15T00:00:00", "2014-01-16T00:00:00", 
"2014-01-17T00:00:00", "2014-01-18T00:00:00", "2014-01-19T00:00:00", 
"2014-01-20T00:00:00", "2014-01-21T00:00:00", "2014-01-22T00:00:00", 
"2014-01-23T00:00:00", "2014-01-24T00:00:00", "2014-01-25T00:00:00"
), station = c("GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156", 
"GHCND:USW00053156", "GHCND:USW00053156"), value = c(0L, 0L, 
0L, 0L, 0L, 64L, 26L, 21L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 161L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 8L, 8L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 
-125L), class = c("tbl_df", "tbl", "data.frame"))

The operation consists of taking a substring of the name in the second column of the data frame and storing it in a new column (number 4). I am using the following code:

wtr_complete[[1]][4] <-  substring(wtr_complete[[1]][2],7)

but it is not working correctly: column 4 does not end up with the substrings I expect.


Any idea how to perform an operation on a column of a data frame that is inside a list?

This should work for you:

df <- wtr_complete[[1]]
df$newcol <-  substring(df$station,7)
wtr_complete[[1]] <- df

Or, directly from the list:

wtr_complete[[1]]$newcol <- substring(wtr_complete[[1]]$station,7)
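If every data frame in the list needs the same new column, the assignment can be wrapped in lapply() instead of being repeated for each index. A minimal sketch, assuming each element has a station column; the two-element toy list below stands in for wtr_complete, and the column name newcol is only illustrative:

```r
# Toy stand-in for wtr_complete: a list of data frames that each have a station column
wtr_complete <- list(
  data.frame(station = c("GHCND:USW00053156", "GHCND:USW00053156"), value = c(0L, 64L)),
  data.frame(station = "GHCND:USW00012345", value = 8L)
)

# Apply the same column operation to every data frame in the list
wtr_complete <- lapply(wtr_complete, function(df) {
  df$newcol <- substring(df$station, 7)  # drop the 6-character "GHCND:" prefix
  df                                     # return the modified data frame
})

wtr_complete[[2]]$newcol
#> [1] "USW00012345"
```

lapply() returns a new list, so the result has to be assigned back to wtr_complete.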

Thanks for putting together a reproducible example. You're very close to the solution you want.

One thing to pick up when learning R is the difference between referencing the values you want to work with and referencing the thing that contains those values. This is one of those cases.

When you write wtr_complete[[1]][2], R returns a data.frame with just column two included. With a slightly different syntax, wtr_complete[[1]][,2], a plain data.frame returns a character vector with the actual values you want to work with. The difference is just that comma: inside [ ] on a data.frame, the comma separates rows from columns, so [,2] means "all rows of column 2". (Because wtr_complete[[1]] is actually a tibble, [,2] still returns a one-column tibble here; [[2]] or $station extracts the vector in both cases.)
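The difference between the three subsetting forms can be checked directly with class(). A small base-R sketch on a plain data.frame built from the first two rows of the question's data (stringsAsFactors = FALSE is set so that older R versions also keep the column as character):

```r
# Plain data.frame version of the question's data (first two rows only)
df <- data.frame(
  date = c("2010-03-02T00:00:00", "2010-03-03T00:00:00"),
  station = c("GHCND:USW00053156", "GHCND:USW00053156"),
  value = c(0L, 0L),
  stringsAsFactors = FALSE
)

class(df[2])    # single bracket, no comma: still a data.frame
#> [1] "data.frame"
class(df[, 2])  # comma form on a plain data.frame: drops to a vector
#> [1] "character"
class(df[[2]])  # double bracket: always the column itself
#> [1] "character"
```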

The strange output you get is because you're passing an entire data.frame to substring, rather than the character vector it expects. substring first converts your data.frame to a character vector and only then performs the substring operation. You can see that this conversion produces a character vector of length 1 with all the values smushed together:

as.character(wtr_complete[[1]][2])
[1] "c(\"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\", \"GHCND:USW00053156\",

Some pretty strange output, right?
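To see the effect on substring itself, here is a minimal sketch on a three-row toy data.frame (not the question's full data): passing the one-column data.frame yields a single mangled string, while passing the column vector yields one substring per row.

```r
df <- data.frame(
  station = c("GHCND:USW00053156", "GHCND:USW00053156", "GHCND:USW00053156"),
  stringsAsFactors = FALSE
)

# One-column data.frame: coerced to ONE deparsed string, then cut at character 7
bad <- substring(df[1], 7)
length(bad)
#> [1] 1

# Character vector: one substring per row
good <- substring(df[[1]], 7)
good
#> [1] "USW00053156" "USW00053156" "USW00053156"
```

When the length-1 result is assigned into a new column, R recycles that one string down every row, which is why the whole column looks wrong.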

What you want to do instead is call substring with the values:

wtr_complete[[1]][4] <- substring(wtr_complete[[1]][,2], 7)

You should get a result like:

                   date           station value          V4
1   2010-03-02T00:00:00 GHCND:USW00053156     0 USW00053156

Note: you'll see that this gives your new column the name "V4". An overall better way to run this operation is to give the new column a name and also to reference column two by name, which is much safer:

wtr_complete[[1]]$mynewcol <- substring(wtr_complete[[1]]$station, 7)
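substring(..., 7) assumes every station ID starts with the same 6-character prefix ("GHCND:"). If that is not guaranteed, one alternative (a swapped-in technique, not part of the answer above) is to strip everything up to and including the first colon with a regular expression. A minimal sketch; the second and third IDs are hypothetical, made up only to show other prefixes:

```r
df <- data.frame(
  station = c("GHCND:USW00053156", "GHCND:USC00012345", "COOP:123456"),
  stringsAsFactors = FALSE
)

# Remove everything up to and including the first ":" so the prefix length can vary
df$mynewcol <- sub("^[^:]*:", "", df$station)
df$mynewcol
#> [1] "USW00053156" "USC00012345" "123456"
```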

Neither wtr_complete[[1]][2] nor wtr_complete[[1]][,2] is a character vector here: wtr_complete[[1]] is a tibble (class tbl_df), and unlike a plain data.frame, a tibble does not drop [,2] down to a vector.

You could use:

wtr_complete[[1]]$newcol <-  substring(wtr_complete[[1]][[2]],7)
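A tibble's [,2] behaves like a plain data.frame subset with drop = FALSE. The base-R sketch below (no tibble package needed, toy two-row data) reproduces that behaviour and shows that [[2]] returns the vector regardless:

```r
df <- data.frame(
  date = c("2010-03-02T00:00:00", "2010-03-03T00:00:00"),
  station = c("GHCND:USW00053156", "GHCND:USW00053156"),
  stringsAsFactors = FALSE
)

# Mimics a tibble's default: column subsetting keeps the data frame structure
kept <- df[, 2, drop = FALSE]
is.data.frame(kept)
#> [1] TRUE

# [[ extracts the column itself, whatever the drop behaviour
df$newcol <- substring(df[[2]], 7)
df$newcol
#> [1] "USW00053156" "USW00053156"
```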
