
R, rowMeans by Column in data.frame

I have a csv that I've read in, and it is now a data.frame within R. My file name is MyRData007. My header information goes until row 5 (main column headers are on row 4). My ID is in column A. I simply need to create two separate rowMeans for each ID. The data is in rows 5-147. For the first mean it's columns 4-15; for the second mean it's columns 16-21. Ultimately I should have a new variable with a mean for each of the 143 rows. This is what I tried:

> mRNA<-rowMeans(MyRData007)[5:147,(4:15)]
> Protein<-rowMeans(MyRData007)[5:147,(16:21)]

But I get an error:

Error in rowMeans(MyRData007) : 'x' must be numeric
df <- read.table(text='this is a header
                 this is another header
                 this too is one
                 and this is also
                 id code status value
                 1 2 3 4
                 2 32 43 23
                 3 3 43 32
                 4 232 323 55')
df
    V1   V2      V3     V4
1 this   is       a header
2 this   is another header
3 this  too      is    one
4  and this      is   also
5   id code  status  value
6    1    2       3      4
7    2   32      43     23
8    3    3      43     32
9    4  232     323     55

So when you try to call rowMeans you get an error:

rowMeans(df)
Error in rowMeans(df) : 'x' must be numeric

You get this error because you are trying to take the mean of non-numeric values, which makes no sense. Your attempts to subset the data didn't work because you put the brackets outside the call to rowMeans, which tells R to subset the output of rowMeans, not the data going in.
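To make the bracket placement concrete, here is a minimal sketch using a small, made-up numeric data frame (the names `scores`, `a`, `b` and `c` are illustrative assumptions, not taken from the question). Subsetting inside the call restricts the data before averaging; subsetting outside only indexes the vector of means that rowMeans returns.

```r
# Hypothetical all-numeric data frame, for illustration only
scores <- data.frame(id = 1:4,
                     a  = c(2, 32, 3, 232),
                     b  = c(3, 43, 43, 323),
                     c  = c(4, 23, 32, 55))

# Subset INSIDE the call: average columns 2-3 (a and b) for rows 2-4
inside <- rowMeans(scores[2:4, 2:3])

# Subset OUTSIDE the call: rowMeans() sees every column (including id),
# and the brackets merely pick elements 2-4 of the resulting vector
outside <- rowMeans(scores)[2:4]

print(inside)
print(outside)
```

The two results differ because `outside` averages all four columns, the `id` column included, which is rarely what is wanted.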

The fundamental problem is that you can't have header information stored as rows in an R data.frame. All data in a column of a data frame must be the same type, so if you have characters in some rows, you can't have numbers in others.
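The one-type-per-column rule can be seen directly. In this small sketch (all values are made up for illustration), combining a text label with numbers coerces everything to character, which is what happens to each column when header rows are read in as data.

```r
# Mixing text and numbers in one vector coerces everything to character
mixed <- c("value", 4, 23, 32)
class(mixed)            # "character"

# Same effect in a data frame when the column-name row is read in as data
raw <- read.table(text = "id value
1 4
2 23", stringsAsFactors = FALSE)
sapply(raw, class)      # both columns are character

# Once the stray header row is dropped, the rest converts back to numbers
as.numeric(raw$V2[-1])
```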

How can you fix this?

Read in your data with read.table using the skip = 4 argument. This makes it skip over the header information rows and generate a data frame that contains only your data. If your file is a .csv you'll also need to specify sep = ',' and header = TRUE:

df2 <- read.table(text='this is a header
                 this is another header
                 this too is one
                 and this is also
                 id code status value
                 1 2 3 4
                 2 32 43 23
                 3 3 43 32
                 4 232 323 55', skip = 4, header = TRUE)
rowMeans(df2)
[1]   2.50  25.00  20.25 153.50
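Applied to a CSV file like the one in the question, the same idea might look like the sketch below. Everything in it is an assumption made for illustration: the temporary file, its three preamble lines, the column names, and the column ranges 2:3 and 4:5, which stand in for the question's 4:15 and 16:21.

```r
# Build a small stand-in CSV: 3 preamble lines, then column names, then data
path <- tempfile(fileext = ".csv")
writeLines(c("preamble line 1",
             "preamble line 2",
             "preamble line 3",
             "id,m1,m2,p1,p2",
             "g1,1,3,10,20",
             "g2,2,4,30,50"), path)

# Skip the preamble so the first line read is the column-name row
dat <- read.table(path, skip = 3, header = TRUE, sep = ",",
                  stringsAsFactors = FALSE)

# Subset the columns INSIDE the call, one range per mean
dat$mRNA    <- rowMeans(dat[, 2:3])
dat$Protein <- rowMeans(dat[, 4:5])
print(dat)
```

With a real file, the value of skip has to match the number of lines that precede the column-name row, so it is worth opening the file in a text editor to count them.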

read.csv is just a wrapper for read.table, and using it is the same as using read.table with the following options:

read.table(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "")
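As a quick check of that equivalence, the sketch below reads the same made-up CSV text both ways and compares the results. The sample text is an assumption for illustration; besides header, sep and fill, read.csv also sets defaults for quote, dec and comment.char, so they are spelled out here.

```r
# Made-up CSV content, for illustration only
csv_text <- "id,code,status
1,2,3
2,32,43"

via_csv   <- read.csv(text = csv_text)
via_table <- read.table(text = csv_text, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

# Compare the two data frames
identical(via_csv, via_table)
```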

Generally, I find it better to call read.table directly, since it makes every option explicit. The most important one is stringsAsFactors = FALSE, which prevents strings from being converted to factors (an extremely annoying default in R versions before 4.0.0; from R 4.0.0 onward the default is already FALSE).
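The effect of stringsAsFactors is easy to see on a tiny example. The text and column names below are made up for illustration; the argument is set explicitly in both calls so the result does not depend on the R version's default.

```r
# Made-up input with one text column
txt <- "id label
1 low
2 high"

as_factor <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
as_string <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

class(as_factor$label)   # "factor"
class(as_string$label)   # "character"
```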
