
How to open multiple files from a folder using the names in a TSV file in Python?

I have a metadata.tsv file like this:

id   group
SRR01   1
SRR02   1
SRR04   2
SRR05   2

In a directory, I have the following files: SRR01_1.fq, SRR01_2.fq, SRR02_1.fq, SRR02_2.fq, SRR04_1.fq, SRR04_2.fq, SRR05_1.fq, SRR05_2.fq

Now I need to read the files by group and then write the results to a separate folder.

I have tried using pandas and glob.

Maybe something along these lines, written as a Snakemake workflow:

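import pandas

# Assumes metadata.tsv is tab-separated with the columns "id" and "group".
metadata = pandas.read_csv('metadata.tsv', sep='\t')

rule all:
    input:
        # One target per distinct group, e.g. 1.txt and 2.txt.
        expand('{group}.txt', group=metadata.group.unique())

rule one:
    input:
        # Ids belonging to the requested group. Snakemake treats these as file
        # paths, so map them to real file names (e.g. SRR01_1.fq) as needed.
        ids=lambda wc: metadata[metadata.group == int(wc.group)].id.tolist(),
    output:
        out='{group}.txt',
    shell:
        r"""
        echo {input.ids} > {output.out}
        """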

The lambda function used as input queries the metadata dataframe and returns the ids that belong to each {group}. Note that the group column is assumed to be of integer type here.
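For readers who are not using a workflow manager, the same group lookup can be sketched in plain Python. This is only a minimal sketch under stated assumptions: metadata.tsv is tab-separated with a header row of id and group, each id has paired files named {id}_1.fq and {id}_2.fq as in the question, and the function names, the results folder and the concatenation step are hypothetical placeholders for the real processing.

```python
import csv
from collections import defaultdict
from pathlib import Path


def files_by_group(metadata_path, fastq_dir="."):
    """Map each group label to the paired FASTQ paths of its sample ids."""
    groups = defaultdict(list)
    with open(metadata_path, newline="") as handle:
        # DictReader takes the column names (id, group) from the header row.
        for row in csv.DictReader(handle, delimiter="\t"):
            for read in (1, 2):
                # Assumed naming scheme: <id>_1.fq and <id>_2.fq.
                groups[row["group"]].append(
                    Path(fastq_dir) / f"{row['id']}_{read}.fq"
                )
    return dict(groups)


def concatenate_groups(metadata_path, fastq_dir=".", out_dir="results"):
    """Write one combined file per group into out_dir (placeholder step)."""
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    for group, paths in files_by_group(metadata_path, fastq_dir).items():
        target = out_root / f"group_{group}.fq"
        with open(target, "w") as out:
            for path in paths:
                out.write(path.read_text())
        written.append(target)
    return written
```

Calling concatenate_groups("metadata.tsv") would then create one file per group inside results. Note that the group labels are kept as strings here, since the csv module does not convert types.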

If you want to apply a function to every file in that directory, you could use the following code:

import os

# os.getcwd() is the current directory; replace it with your own directory.
for path, dirs, files in os.walk(os.getcwd()):
    for file in files:
        pass  # Your code here!

path -> the directory currently being walked (it does not include the file name; use os.path.join(path, file) to get the full path)

dirs -> a list of the subdirectories in path (not really useful for you)

files -> a list of the files in path

Note: walk descends into every subfolder of your directory, so if that folder contains more directories, keep in mind that their files will be included too.
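If the descent into subfolders is not wanted, one option is to empty the list of subdirectories in place, which stops os.walk from going any deeper. The sketch below shows this; the function name list_files and its suffix and recursive parameters are made up for illustration, and the .fq suffix is taken from the file names in the question.

```python
import os


def list_files(root, suffix=".fq", recursive=False):
    """Return the sorted paths under root whose names end with suffix."""
    matches = []
    for path, dirs, files in os.walk(root):
        if not recursive:
            # Emptying dirs in place tells os.walk not to descend further.
            dirs[:] = []
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(path, name))
    return sorted(matches)
```

For example, list_files(os.getcwd()) would only look at the files directly inside the current directory, while list_files(os.getcwd(), recursive=True) would also include the files in its subfolders.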
