Converting a bytes-type file into a usable format in Python

Question

I need to read the table at the link below (an HTML page) into a dict() and then work with it. However, with the code below the table still looks messy, and I do not know where to start in order to turn it into a dictionary mapping each codon sequence (e.g. AGU) to its amino acid. Is there any way to make it look better, perhaps as a DataFrame? Any other suggestions are welcome. Please help. Thanks.

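import urllib.request

link = "http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606&aa=1&style=N"
f = urllib.request.urlopen(link)
myfile = f.read()    # raw bytes of the HTML page
s = myfile.decode()  # bytes -> str (UTF-8 by default)
s = s.strip(" ")     # strip() returns a new string, so keep the result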

Answer 1

If you look at the page http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606&aa=1&style=N you will notice that it contains not just the codon table you want but also a lot of HTML around it. To extract just the codons, the best way is probably to use BeautifulSoup:

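import urllib.request
from bs4 import BeautifulSoup

link = "http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606&aa=1&style=N"
f = urllib.request.urlopen(link)
myfile = f.read()
s = myfile.decode()
# The table sits inside a <pre> element; naming the parser avoids a bs4 warning
codons = BeautifulSoup(s, 'html.parser').find('pre').text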

You will probably want to process this string further to get the form you need: a dict, list, DataFrame, or whatever else. Assuming you just want a dict, since you mentioned a dictionary:

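import re
# Group 1 is the three-letter codon; group 2 is the field right after the
# single-character column that follows it. sorted() orders the pairs by codon.
codons_dict = { t[0]: t[1] for t in sorted(re.findall(r'(\w{3})\s+\w\s+(\S+)\s+\S+\s+[(]\d+[)]', codons)) }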
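
Since the question asks for a mapping from codon to amino acid, here is a minimal sketch of that variant. It runs on a small hand-written sample string rather than the live page. The sample layout (codon, one-letter amino acid code, fraction, frequency per thousand, count in parentheses) and its numbers are assumptions for illustration, so the pattern may need adjusting against the real text in codons. The names sample, pattern and codon_to_aa are hypothetical.

```python
import re

# Hand-written sample in an ASSUMED layout:
# codon, one-letter amino acid, fraction, per-thousand frequency, (count).
# The numbers are placeholders, not values taken from the page.
sample = """
UUU F 0.50 10.0 (1000)  UCU S 0.25 20.0 (2000)
UAU Y 0.40 30.0 (3000)  UGU C 0.60 40.0 (4000)
"""

# Capture the codon (group 1) and the single-character amino acid code
# (group 2); the remaining fields are matched but not captured.
pattern = re.compile(r'([ACGU]{3})\s+(\S)\s+\S+\s+\S+\s*\(\s*\d+\s*\)')

# findall() returns (codon, amino_acid) tuples, which dict() accepts directly.
codon_to_aa = dict(pattern.findall(sample))
print(codon_to_aa)
```

To try it on the real page, pass codons instead of sample. If a DataFrame is preferred, something like pd.DataFrame(list(codon_to_aa.items()), columns=["codon", "amino_acid"]) with pandas should work, though that is not tested here.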

Answer 1, 2020-04-22 09:34:44
