Using Python 3.
I have to write a function that takes one argument (a string) and returns a dictionary, built from a text file, that maps the names of the sequences (keys) to the sequences (values). Both keys and values must be strings.
The text-file:
Read1 GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC
Read2 CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG
Read4 TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG
Read3 GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT
Read5 CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
Read6 TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT
I've come this far, but I think I am missing something and I don't know if my work here is correct. I've marked the lines (with #) where I'm in doubt whether they are correct or not.
def read_data(file_name):
    input_file=open(sequencing_reads.txt)
    #sequence_dict={}
    for line in input_file:
        #x=line.split(",")
    #return sequence_dict
    input_file.close()
I know it must return the dictionary with the following content:
{'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}
Can you help me fill out the gaps?
EDIT: I need to keep it simple so please no imports of packages and smart tricks :-)
EDIT 2:
I've tried this too:
with open('sequencing_reads.txt', 'r') as document:
    answer = {}
    for line in document:
        line = line.split()
        if not line:
            continue
        answer[line[0]] = line[1:]
print(answer)
The output is:
{'Read1': ['GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC'], 'Read2': ['CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG'], 'Read4': ['TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG'], 'Read3': ['GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT'], 'Read5': ['CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC'], 'Read6': ['TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT']}
How do I get rid of the "[ ]" around my sequences?
EDIT4:
def read_data(file_name):
    with open("sequencing_reads.txt", "r") as document:
        answer = {}
        for line in document:
            line = line.split()
            if not line:
                continue
            answer[line[0]] = line[1:]
        final_answer = {a:b[0] for a, b in answer.items()}
final_answer = read_data("sequencing_reads.txt")
print(final_answer)
prints:
None
You can try this:
import re
def read_data(file_name):
    data = open(file_name).read()
    # list() is needed in Python 3, where filter() returns an iterator that cannot be indexed
    keys = [list(filter(lambda x: bool(x), i))[0][1:-1] for i in re.findall(r"{(.*?)\:|(?<=,\n\s)(.*?)\:", data)]
    values = [list(filter(lambda x: bool(x), i))[0][1:-1] for i in re.findall(r'(?<=:\s)(.*?)(?=,\n)|(?<=\s)(.*?)(?=})', data)]
    final_data = {a:b for a, b in zip(keys, values)}
    return final_data
Output:
{'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC', 'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT', 'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG', 'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC', 'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG', 'Read6': "'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT"}
Edit:
import ast
def read_data(file_name):
    final_data = ast.literal_eval(open(file_name).read())
    return final_data
Edit 1: Regarding the removal of the brackets, just access the value by indexing:
final_answer = {a:b[0] for a, b in answer.items()}
print(final_answer)
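To illustrate where the brackets come from in the first place: split() returns a list, slicing that list with [1:] gives another list, while indexing it with [1] gives the string itself. A minimal sketch, assuming a line in the same "name sequence" layout as the question's file (the sequence is shortened here for illustration):

```python
# A sample line in the same "name sequence" layout as the question's file
# (the sequence is shortened; this is illustrative data, not the real read).
line = "Read1 GGCTCC\n"

parts = line.split()   # split on any whitespace; the trailing newline is dropped
sliced = parts[1:]     # slicing returns a list
indexed = parts[1]     # indexing returns the string itself

print(parts)    # ['Read1', 'GGCTCC']
print(sliced)   # ['GGCTCC']
print(indexed)  # GGCTCC
```

So storing line[1] instead of line[1:] should avoid the brackets without a second pass over the dictionary.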
If you are having issues printing the value returned from read_data, you can try this:
answer = read_data("the_file.txt")
print(answer)
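The None printed in the question's EDIT4 is most likely because that function never reaches a return statement; a Python function that ends without one returns None. A small sketch with two hypothetical helpers (build_without_return and build_with_return are made-up names for illustration) shows the difference:

```python
def build_without_return():
    # Hypothetical helper: builds a dict but never hands it back to the caller.
    result = {"Read1": "GGCTCC"}

def build_with_return():
    # Same body, but the dict is returned explicitly.
    result = {"Read1": "GGCTCC"}
    return result

print(build_without_return())  # None
print(build_with_return())     # {'Read1': 'GGCTCC'}
```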
Edit 3:
def read_data(file_name):
    with open(file_name, "r") as document:
        answer = {}
        for line in document:
            line = line.split()
            if line:
                answer[line[0]] = line[1:]
        return {a:b[0] for a, b in answer.items()}
print(read_data("sequencing_reads.txt"))
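A slightly shorter variant, with no imports, stores the sequence string directly so no second dictionary is needed. This is only a sketch: the file name example_reads.txt and the shortened sequences are made up so the example can run on its own, and it assumes each non-empty line holds a name followed by whitespace and one sequence.

```python
def read_data(file_name):
    sequence_dict = {}
    with open(file_name, "r") as document:
        for line in document:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or incomplete lines
                sequence_dict[parts[0]] = parts[1]
    return sequence_dict

# Write a tiny sample file (hypothetical name, shortened sequences).
with open("example_reads.txt", "w") as sample:
    sample.write("Read1 GGCTCC\nRead2 CTTTAC\n\n")

reads = read_data("example_reads.txt")
print(reads)  # {'Read1': 'GGCTCC', 'Read2': 'CTTTAC'}
```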
If your file "sequencing_reads.txt" is in JSON format, you can use the json module in the Python standard library to load its content into a dictionary quite easily.
import json
with open("sequencing_reads.txt") as f:
    sequence_dict = json.load(f)
First, if your file is in JSON format and spread over several lines, you can read it into a single line, for example like this:
def read_data(file_name):
    lines = open(file_name).readlines()
    merged_line = " ".join([line.strip() for line in lines])
    return merged_line  # hand the merged text back so it can be parsed
Second, json.loads requires double quotes around strings (e.g. {"a": "a"}). If the text uses single quotes, as in your example, json.loads will raise an error. You can handle that in either of these ways:
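# Option 1: use json.loads, but replace the single quotes with double quotes first
import json
merged_line = merged_line.replace("'", '"')
data = json.loads(merged_line)
# Option 2: use ast.literal_eval, which accepts single-quoted strings
import ast
data = ast.literal_eval(merged_line)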
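The difference between the two options can be checked on a small piece of text. This is only a sketch with made-up, shortened data in Python-dict notation; it assumes the text contains no apostrophes inside the values, since a blanket replace of quotes would corrupt those.

```python
import ast
import json

# Text in Python-dict notation with single quotes (shortened, illustrative data).
text = "{'Read1': 'GGCTCC',\n 'Read2': 'CTTTAC'}"

try:
    json.loads(text)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False  # JSON only accepts double-quoted strings

from_json = json.loads(text.replace("'", '"'))  # option 1
from_ast = ast.literal_eval(text)               # option 2

print(json_ok)                # False
print(from_json == from_ast)  # True
```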