Read text-file into program as dictionary
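Using Python 3.
I have to write a function that takes one argument (a string) and returns a dictionary built from a txt file that contains the names of the sequences (keys) and the sequences themselves (values). Both keys and values must be strings.
Text file: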
Read1 GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC
Read2 CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG
Read4 TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG
Read3 GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT
Read5 CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
Read6 TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT
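I've got this far, but I think I'm missing something, and I don't know whether what I've done here is right. I've marked the lines I'm unsure about (with #).
def read_data(file_name):
    input_file = open("sequencing_reads.txt")
    sequence_dict = {}          # not sure
    for line in input_file:
        x = line.split(",")     # not sure
    return sequence_dict        # not sure
    input_file.close()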
I know it has to return a dictionary containing the following:
{'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC',
'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG',
'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG',
'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT',
'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC',
'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT'}
Can you help me fill in the blanks?
Edit: I need to keep this simple, so please no imported packages or clever tricks :-)
Edit 2:
I also tried this:
with open('sequencing_reads.txt', 'r') as document:
    answer = {}
    for line in document:
        line = line.split()
        if not line:
            continue
        answer[line[0]] = line[1:]
print(answer)
The output is:
{'Read1': ['GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC'], 'Read2': ['CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG'], 'Read4': ['TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG'], 'Read3': ['GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT'], 'Read5': ['CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC'], 'Read6': ['TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT']}
How do I get rid of the "[]" around my sequences?
Edit 4:
def read_data(file_name):
    with open("sequencing_reads.txt", "r") as document:
        answer = {}
        for line in document:
            line = line.split()
            if not line:
                continue
            answer[line[0]] = line[1:]
        final_answer = {a:b[0] for a, b in answer.items()}
final_answer = read_data("sequencing_reads.txt")
print(final_answer)
Prints:
None
You can try this:
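import re

def read_data(file_name):
    # Assumes the file holds a dict literal such as {'Read1': '...', ...}
    with open(file_name) as f:
        data = f.read()
    # Keep the first non-empty group of each match and strip its quotes with [1:-1]
    # (Python 3's filter() returns an iterator, so next() replaces filter(...)[0])
    keys = [next(g for g in i if g)[1:-1] for i in re.findall(r"{(.*?)\:|(?<=,\n\s)(.*?)\:", data)]
    values = [next(g for g in i if g)[1:-1] for i in re.findall(r"(?<=:\s)(.*?)(?=,\n)|(?<=\s)(.*?)(?=})", data)]
    final_data = {a: b for a, b in zip(keys, values)}
    return final_data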
Output:
{'Read1': 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC', 'Read3': 'GTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGTCGTGAACACATCAGT', 'Read2': 'CTTTACCCGGAAGAGCGGGACGCTGCCCTGCGCGATTCCAGGCTCCCCACGGG', 'Read5': 'CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC', 'Read4': 'TGCGAGGGAAGTGAAGTATTTGACCCTTTACCCGGAAGAGCG', 'Read6': "'Read6': 'TGACAGTAGATCTCGTCCAGACCCCTAGCTGGTACGTCTTCAGTAGAAAATTGTTTTTTTCTTCCAAGAGGTCGGAGT"}
Edit:
import ast

def read_data(file_name):
    final_data = ast.literal_eval(open(file_name).read())
    return final_data
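The ast.literal_eval version only works if the file literally contains a Python dict like the expected output in the question, not the space-separated "Read1 GGCT..." lines of the question's text file. A small sketch showing both cases; parse_dict_literal is a hypothetical helper name and the sample strings are shortened, made-up sequences:

```python
import ast


def parse_dict_literal(text):
    """Hypothetical helper: evaluate text that holds a Python dict literal."""
    return ast.literal_eval(text)


# Text shaped like the expected output in the question (sequences shortened)
dict_text = "{'Read1': 'GGCTCCCC',\n 'Read2': 'CTTTACCC'}"
print(parse_dict_literal(dict_text))  # {'Read1': 'GGCTCCCC', 'Read2': 'CTTTACCC'}

# Text shaped like the question's input file is not a Python literal
plain_text = "Read1 GGCTCCCC\nRead2 CTTTACCC\n"
try:
    parse_dict_literal(plain_text)
except (ValueError, SyntaxError) as err:
    print("cannot parse:", type(err).__name__)
```

So for the file shown in the question, a line-by-line split is the better fit.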
Edit 1: To remove the brackets, just access the value by index:
final_answer = {a:b[0] for a, b in answer.items()}
print(final_answer)
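The brackets appear because line[1:] is a slice, and slicing a list always returns a list, whereas an index such as line[1] returns a single element (here, the sequence string). A quick illustration with a shortened line from the file:

```python
# A shortened line from the question's file, split on whitespace
parts = "Read1 GGCTCCCCACGG".split()

print(parts)      # ['Read1', 'GGCTCCCCACGG']
print(parts[1:])  # ['GGCTCCCCACGG']  (a slice is always a list)
print(parts[1])   # GGCTCCCCACGG      (an index returns the string itself)
```

Writing answer[line[0]] = line[1] inside the loop would therefore avoid the brackets in the first place.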
If you have trouble printing the value returned by read_data, you can try the following:
answer = read_data("the_file.txt")
print(answer)
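The None printed in Edit 4 comes from the function having no return statement: a Python function that never executes return gives back None, so building final_answer inside it is not enough. A minimal illustration (both function names are hypothetical):

```python
def build_without_return():
    reads = {"Read1": "GGCT"}  # built, but never handed back to the caller


def build_with_return():
    reads = {"Read1": "GGCT"}
    return reads  # the caller receives the dict


print(build_without_return())  # None
print(build_with_return())     # {'Read1': 'GGCT'}
```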
Edit 3:
def read_data(file_name):
    with open(file_name, "r") as document:
        answer = {}
        for line in document:
            line = line.split()
            if line:
                answer[line[0]] = line[1:]
        return {a:b[0] for a, b in answer.items()}

print(read_data("sequencing_reads.txt"))
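Since the question asks for no imports, here is a sketch of the same idea that unpacks each line directly into two strings, so no bracket removal is needed afterwards. It assumes every non-blank line has exactly two whitespace-separated fields (name and sequence); parse_reads is a hypothetical helper name, split out so the parsing can be tried on a plain list of lines:

```python
def parse_reads(lines):
    """Build {name: sequence} from lines like 'Read1 GGCT...'."""
    reads = {}
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # Assumes exactly two fields per line; otherwise this raises ValueError
        name, sequence = fields
        reads[name] = sequence
    return reads


def read_data(file_name):
    # A file object yields its lines one by one, just like a list of strings
    with open(file_name) as document:
        return parse_reads(document)


sample = ["Read1 GGCTCCCC\n", "\n", "Read2 CTTTACCC\n"]
print(parse_reads(sample))  # {'Read1': 'GGCTCCCC', 'Read2': 'CTTTACCC'}
```

With the question's file, print(read_data("sequencing_reads.txt")) should print the dictionary shown in the question.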
Your file "sequencing_reads.txt" is in JSON format. You can easily load its contents into a dictionary with the json module from the Python standard library.
import json

with open("sequencing_reads.txt") as f:
    sequence_dict = json.load(f)
First, if your file is in JSON format and spread over separate lines, you should read it into a single line, perhaps like this:
def read_data(file_name):
    with open(file_name) as f:
        lines = f.readlines()
    merged_line = " ".join([line.strip() for line in lines])
    return merged_line
Second, json.loads requires strings to be enclosed in double quotes (e.g. {"a":"a"}). If you use single quotes, as in your example, you may get an error. So you can do this:
# Option 1: use json.loads, but replace the single quotes first
import json
merged_line = merged_line.replace("'", '"')
data = json.loads(merged_line)

# Option 2: use ast
import ast
data = ast.literal_eval(merged_line)
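To see the quoting difference in isolation, here is a small sketch with a shortened, made-up single-quoted string. Note that the quote-swapping trick assumes no key or value contains an apostrophe or a double quote; ast.literal_eval does not need the swap at all.

```python
import ast
import json

single_quoted = "{'Read1': 'GGCTCCCC'}"

# JSON only accepts double-quoted strings, so this raises JSONDecodeError
try:
    json.loads(single_quoted)
except json.JSONDecodeError as err:
    print("json.loads failed:", err.msg)

# Option 1: swap the quotes, then parse as JSON
print(json.loads(single_quoted.replace("'", '"')))  # {'Read1': 'GGCTCCCC'}

# Option 2: parse as a Python literal, no quote swapping needed
print(ast.literal_eval(single_quoted))  # {'Read1': 'GGCTCCCC'}
```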