I have a metadata.tsv file like this, for example:
id group
SRR01 1
SRR02 1
SRR04 2
SRR05 2
In a directory, I have the following files: SRR01_1.fq, SRR01_2.fq, SRR02_1.fq, SRR02_2.fq, SRR04_1.fq, SRR04_2.fq, SRR05_1.fq, SRR05_2.fq
Now I need to input the files by their groups and then write each group's result to a separate folder.
I have tried using pandas and glob.
Maybe something along these lines:
import pandas

# Load the sample sheet
metadata = pandas.read_csv('metadata.tsv', sep='\t')

rule all:
    input:
        expand('{group}.txt', group=metadata.group)

rule one:
    input:
        # Select the ids whose group matches the {group} wildcard
        ids=lambda wc: metadata[metadata.group == int(wc.group)].id,
    output:
        out='{group}.txt',
    shell:
        r"""
        echo {input.ids} > {output.out}
        """
The lambda function passed as input queries the metadata dataframe
and returns the ids corresponding to each {group}.
Note that the group column
is assumed to be of integer type here, which is why the wildcard is converted with int().
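The lambda in the rule returns bare sample ids, whereas the files in the question are named after the id plus a mate suffix. A minimal sketch of how a group's ids could be turned into those file names, assuming the `_1.fq`/`_2.fq` naming from the question; `fastq_for_group` is a hypothetical helper, not part of Snakemake or pandas, and the inline TSV string only stands in for metadata.tsv:

```python
import io
import pandas

# Stand-in for metadata.tsv from the question (tab-separated).
METADATA_TSV = "id\tgroup\nSRR01\t1\nSRR02\t1\nSRR04\t2\nSRR05\t2\n"
metadata = pandas.read_csv(io.StringIO(METADATA_TSV), sep="\t")


def fastq_for_group(metadata, group):
    """Return the paired FASTQ file names for every id in the given group."""
    # Wildcards arrive as strings, so compare on the string form of the column.
    ids = metadata.loc[metadata["group"].astype(str) == str(group), "id"]
    return [f"{sample}_{mate}.fq" for sample in ids for mate in (1, 2)]


print(fastq_for_group(metadata, "1"))
# ['SRR01_1.fq', 'SRR01_2.fq', 'SRR02_1.fq', 'SRR02_2.fq']
```

In the Snakefile this could be wired in as `ids=lambda wc: fastq_for_group(metadata, wc.group)`, which also removes the need for the int() conversion.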
If you're trying to apply a function to all the files in that directory, you could use the following code:
import os

for path, dirs, files in os.walk(os.getcwd()):  # or your directory
    for file in files:
        pass  # your code here
path -> the directory currently being visited (does not include the file name)
dirs -> a list of the subdirectories inside path (not really useful for you)
files -> a list of the file names in path
Note: os.walk descends into every subfolder of your directory, so if that folder contains more directories, keep in mind that their files will be included too.
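As a concrete instance of that loop, here is a small sketch that keeps only the .fq files and groups their paths by sample id. `collect_fastq` is a hypothetical helper name, and the `<id>_<mate>.fq` naming is assumed from the question:

```python
import os
from collections import defaultdict


def collect_fastq(root):
    """Map each sample id to the sorted paths of its .fq files under root."""
    by_sample = defaultdict(list)
    for path, dirs, files in os.walk(root):
        for file in files:
            if not file.endswith(".fq"):
                continue  # skip anything that is not a FASTQ file
            # "SRR01_1.fq" -> sample id "SRR01" (text before the last underscore)
            sample = file[: -len(".fq")].rsplit("_", 1)[0]
            by_sample[sample].append(os.path.join(path, file))
    return {sample: sorted(paths) for sample, paths in by_sample.items()}
```

Calling `collect_fastq(os.getcwd())` returns a dictionary such as `{'SRR01': [...], 'SRR02': [...]}`, with files from subfolders included because os.walk is recursive.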