Python regular expression to match strings and collect them into a dictionary
I have several files in a directory, and I want to match them against a list of strings and collect the matches in a dictionary.
The files in dir look like the following:
DB_ABC_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz
DB_ABC_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz
DB_DEF_S1_001_MM_R1.faq.gz
DB_DEF_S1_001_MM_R2.faq.gz
The list file contains part of each filename:
ABC
DEF
Here is what I tried:
import os
import re
dir = '/user/home/files'
list = '/user/home/list'
samp1 = {}
samp2 = {}
FH_sample = open(list, 'r')
for line in FH_sample:
    samp1[line.strip().split('\n')[0]] = []
    samp2[line.strip().split('\n')[0]] = []
FH_sample.close()
for file in os.listdir(dir):
    m1 = re.search('(.*)_R1', file)
    m2 = re.search('(.*)_R2', file)
    if m1 and m1.group(1) in samp1:
        samp1[m1.group(1)].append(file)
    if m2 and m2.group(1) in samp2:
        samp2[m2.group(1)].append(file)
I wanted the script above to find the matches from m1 and m2 and collect them in the dictionaries samp1 and samp2. However, the script does not find any matches inside the if statements, so no files are ever added to samp1 or samp2.
This is what the output should look like for samp1 and samp2:
{'ABC': ['DB_ABC_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz', 'DB_ABC_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz'], 'DEF': ['DB_DEF_S1_001_MM_R1.faq.gz', 'DB_DEF_S1_001_MM_R2.faq.gz']}
Any help would be greatly appreciated.
You probably don't need much of this code. You can simply check whether each substring from list occurs in a filename from dir.
The code below hard-codes the data as lists. You seem to have already read yours in, so it is simply a matter of replacing files with the file names you read from dir and replacing my_strings with the substrings from list (which you shouldn't use as a variable name, since it shadows Python's built-in list type).
files = ["BSSE_QGF_1987_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz",
         "BSSE_QGF_1967_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz",
         "BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R1_001_MM_1.faq.gz",
         "BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R2_001_MM_1.faq.gz"]
my_strings = ["MOHUA", "MSJLF"]
res = {s: [] for s in my_strings}
for k in my_strings:
    for file in files:
        if k in file:
            res[k].append(file)
print(res)
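As for why the regex version in the question collects nothing: (.*)_R1 captures everything before _R1, so group(1) is the whole filename prefix rather than the short ID used as the dictionary key. A minimal sketch of that, using one filename from the question (the variable names captured and matching_keys are only for illustration):

```python
import re

# One filename and the dictionary keys from the question.
filename = "DB_ABC_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz"
samp1 = {"ABC": [], "DEF": []}

# The greedy group captures everything before "_R1", not just the ID.
m1 = re.search(r"(.*)_R1", filename)
captured = m1.group(1)
print(captured)           # DB_ABC_2_T_bR_r1_v1_0_S1
print(captured in samp1)  # False, so the if branch never runs

# Testing whether a key occurs inside the captured text does find it.
matching_keys = [key for key in samp1 if key in captured]
print(matching_keys)      # ['ABC']
```

So the lookup has to go the other way round: check whether a key is contained in the filename, not whether the captured text is itself a key.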
You can pass the Python script a directory and provide id_list, then add the entries of id_list as dict keys and append each fastq file whose name contains the key:
import os
import sys
dir_path = sys.argv[1]
fastqs = []
for x in os.listdir(dir_path):
    if x.endswith(".faq.gz"):
        fastqs.append(x)
id_list = ['MOHUA', 'MSJLF']
sample_dict = dict((sample, []) for sample in id_list)
print(sample_dict)
for k in sample_dict:
    for z in fastqs:
        if k in z:
            sample_dict[k].append(z)
print(sample_dict)
To run:
python3.6 fq_finder.py /path/to/fastqs
Output from the above, showing what is going on:
{'MOHUA': [], 'MSJLF': []} # first print creates dict with empty list as vals for keys
{'MOHUA': ['BSSE_QGF_1987_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz', 'BSSE_QGF_1967_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz'], 'MSJLF': ['BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R2_001_MM_1.faq.gz', 'BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R1_001_MM_1.faq.gz']}
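If you do want the R1 and R2 files in separate dictionaries, as samp1 and samp2 in the question suggest, here is one possible sketch. It assumes that IDs and read labels appear as whole fields separated by "_" or ".", which holds for the example filenames but may not hold for yours. read_sample_ids and group_by_read are hypothetical helper names, not part of any library.

```python
import re


def read_sample_ids(list_path):
    """Read one sample ID per line from a text file, skipping blank lines."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def group_by_read(filenames, sample_ids):
    """Return {"R1": {id: [files]}, "R2": {id: [files]}}."""
    grouped = {read: {sid: [] for sid in sample_ids} for read in ("R1", "R2")}
    for name in sorted(filenames):
        # Assumption: fields in the filename are separated by "_" or ".".
        fields = re.split(r"[_.]", name)
        for read in ("R1", "R2"):
            if read not in fields:
                continue
            for sid in sample_ids:
                # Whole-field comparison, case-sensitive ("r1" is not "R1").
                if sid in fields:
                    grouped[read][sid].append(name)
    return grouped


# Filenames from the question; with real data you could pass
# os.listdir(...) and read_sample_ids(...) instead.
example_files = [
    "DB_ABC_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz",
    "DB_ABC_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz",
    "DB_DEF_S1_001_MM_R1.faq.gz",
    "DB_DEF_S1_001_MM_R2.faq.gz",
]
grouped = group_by_read(example_files, ["ABC", "DEF"])
samp1, samp2 = grouped["R1"], grouped["R2"]
print(samp1)
print(samp2)
```

Comparing whole fields instead of substrings means an ID such as ABC will not also pick up files for a sample named ABCD. If your IDs can span several fields, the plain substring test from the answers above is the simpler choice.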