Taking average from multiple text files in R

I have a directory on my computer filled with ~ 1000 .txt files. Each file looks like this (no NAs):

`head` 1.txt
        M40_A  M40_B   M40_C   M41
K00844  28     20      27      23
K00845  668    649     737     838
K01810  2171   2264    2140    2221

`head` 2.txt
        M40_A  M40_B   M40_C   M41
K00844  23     21      22      11
K00845  649    628     708     837
K01810  2121   2326    2162    2255

All files share the same row names and column names. I would like to create one final data frame that holds the average (rounded up) of each K00XXX row across all .txt files, for each condition (M40_A, M40_B, M40_C, M41). For example, my final df would look like this:

`final_df`
        M40_A  M40_B   M40_C   M41
K00844  26     21      25      17
K00845  659    639     723     838
K01810  2146   2295    2151    2238

Here, for example, the value 26 is the average of column M40_A, row K00844 of 1.txt and 2.txt ((28 + 23)/2 = 25.5, rounded up to 26). I searched this site and found what looks like the same question, "Average multiple csv files into 1 averaged file in r", but every attempt to run the code from it gives me errors. For example:

`txts <- lapply(list.files(pattern="*.txt"), read.csv)`

This reads all of my files into a list, but in a weird configuration. This is my result:

`> txts[1]`

[[1]]
  M40_A.M40_B.M40_C.M41
1 K00844\t28\t20\t27\t23
2 K00845\t668\t649\t737\t838
3 K01810\t2171\t2264\t2140\t2221
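
The single-column result above is consistent with the files being tab-separated while the reader expects commas. As a rough illustration in Python (the language used in the answer further down, not R), the sketch below parses one sample line first with a comma delimiter and then with a tab delimiter. The sample line is copied from the output above; the variable names are only for illustration, and nothing here claims to reproduce the internals of R's `read.csv`.

```python
import csv
import io

# One data line as it appears in 1.txt (fields separated by tabs).
line = "K00844\t28\t20\t27\t23\n"

# Default comma delimiter: the line contains no commas, so it stays one field.
comma_fields = next(csv.reader(io.StringIO(line)))

# Tab delimiter: the line splits into the row name plus four values.
tab_fields = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(comma_fields))
print(tab_fields)
```

With the wrong delimiter every row collapses into a single text field, which would also explain why arithmetic on the result fails.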

and when I execute the second code:

`Reduce("+", txts) / length(txts)`

it gives me: `Warning message: In Ops.factor(left, right) : '+' not meaningful for factors`. On top of that, I am not sure this would even take the average across all the .txt files, since the R documentation says that Reduce combines the elements of a given vector.

So I think there has to be a different way to make this work. Any help or insight into how to produce my final_df would be much appreciated!
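
On the question of whether a fold can produce an average: combining all tables with `+` and then dividing by the number of tables is one way to get an elementwise mean, provided every cell is numeric. Below is a minimal Python sketch of that idea (not R), using the two sample tables from above. `add_tables` and `mean_half_up` are hypothetical helper names introduced here, and the rounding assumes non-negative integer counts.

```python
from functools import reduce

# Values from 1.txt and 2.txt, keyed by row name.
# Column order: M40_A, M40_B, M40_C, M41.
table1 = {
    "K00844": [28, 20, 27, 23],
    "K00845": [668, 649, 737, 838],
    "K01810": [2171, 2264, 2140, 2221],
}
table2 = {
    "K00844": [23, 21, 22, 11],
    "K00845": [649, 628, 708, 837],
    "K01810": [2121, 2326, 2162, 2255],
}

def add_tables(left, right):
    """Elementwise sum of two tables with identical row names and column order."""
    return {key: [a + b for a, b in zip(left[key], right[key])] for key in left}

def mean_half_up(total, count):
    """Integer mean of non-negative counts, rounding exact halves up."""
    return (2 * total + count) // (2 * count)

tables = [table1, table2]
summed = reduce(add_tables, tables)  # fold all tables into one table of sums
final = {
    key: [mean_half_up(value, len(tables)) for value in row]
    for key, row in summed.items()
}

for key, row in final.items():
    print(key, *row)
```

For the two sample files this reproduces the `final_df` values shown earlier, for example 26 for K00844 / M40_A.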

Well, this is clearly not what you wanted, but it may be what you needed:

Here is a Python program that produces the output you requested:

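import sys

# Written for Python 3.
# Read every file named on the command line into a list of whitespace-split rows.
allfiledata = []
filenames = sys.argv[1:]
for filename in filenames:
    rows = []
    with open(filename, "r") as filehandle:
        for line in filehandle:
            if line.strip():  # skip blank lines
                rows.append(line.split())
    allfiledata.append(rows)

# The header row is taken from the first file; all files share the same layout.
print(" ".join(allfiledata[0][0]))
for i1, columns in enumerate(allfiledata[0][1:]):
    averages = []
    for i2 in range(len(columns) - 1):
        # Sum the same cell across all files.
        total = sum(int(filedata[i1 + 1][i2 + 1]) for filedata in allfiledata)
        # Adding 0.5 before truncating rounds exact halves up (25.5 -> 26).
        averages.append(int(total / len(allfiledata) + 0.5))
    # Row name followed by the averaged values.
    print(columns[0], *averages)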

You can execute it as follows (assuming you put it in a file called avg.py):

python avg.py *.txt

The above command will average all the *.txt files in the current directory. HTH
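
A note on the rounding step: the script adds 0.5 and then truncates with `int()`, which rounds exact halves up and matches the expected `final_df` in the question. In Python 3 the built-in `round()` instead rounds exact halves to the nearest even integer, so swapping it in would change some cells. A small sketch of the difference, assuming non-negative values and using averages worked out by hand from the question's two sample files (row K00844):

```python
# Averages for row K00844 across 1.txt and 2.txt:
# (28+23)/2, (20+21)/2, (27+22)/2, (23+11)/2
means = [25.5, 20.5, 24.5, 17.0]

# Approach used in the script: add 0.5, then truncate (halves go up).
half_up_values = [int(m + 0.5) for m in means]

# Built-in round(): exact halves go to the nearest even integer.
builtin_values = [round(m) for m in means]

print(half_up_values)   # [26, 21, 25, 17]
print(builtin_values)   # [26, 20, 24, 17]
```

Only the first list agrees with the K00844 row of the requested output (26, 21, 25, 17).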
