How do I split the first line of an input file and store it as a dictionary in Python?
The first line of my input file looks like this:
<doc id="12" url="http://en.wikipedia.org/wiki?curid=12" title="Anarchism">
I want to store its attributes as key-value pairs in Python, like this:
{'doc_id': 12, 'url': 'http://en.wikipedia.org/wiki?curid=12', 'title': 'Anarchism'}
Here is my code:
with open('wiki_00') as f:
    first_line = f.readline().rstrip()
first_line.split()[1:]
The output looks like this:
['id="12"',
'url="http://en.wikipedia.org/wiki?curid=12"',
'title="Anarchism">']
But I would like the quotes ("") and the angle brackets (<>) removed, and the id stored as an int.
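If you would rather keep the split()-based approach from the question, the list it produces can be turned into a dictionary directly. This is a minimal sketch: parts_to_dict is a hypothetical helper name, it uses the attribute name id as the key (rename it to doc_id if you prefer), and it inherits the assumption behind split() that no attribute value contains a space.

```python
def parts_to_dict(parts):
    """Turn ['id="12"', 'url="..."', 'title="Anarchism">'] into a dict.

    Hypothetical helper; assumes every part looks like key="value",
    with a stray '>' possibly left on the last one.
    """
    d = {}
    for part in parts:
        # split on the first '=' only, because the url value itself contains '='
        key, value = part.split('=', 1)
        # drop the trailing '>' left on the last attribute, then the quotes
        d[key] = value.rstrip('>').strip('"')
    # values are plain strings, so cast the id explicitly
    d['id'] = int(d['id'])
    return d


parts = ['id="12"',
         'url="http://en.wikipedia.org/wiki?curid=12"',
         'title="Anarchism">']
print(parts_to_dict(parts))
# {'id': 12, 'url': 'http://en.wikipedia.org/wiki?curid=12', 'title': 'Anarchism'}
```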
Don't use line[1:] to strip away the brackets. Use the strip method instead: line.strip(' <>') will remove any spaces and < or > characters from both ends of the line.
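One caveat: strip removes only the characters you pass it, in any combination, from each end. If the line still carries the newline returned by readline() (the question's code removes it with rstrip(), but a line read without that step would not), the '\n' stops strip(' <>') before it reaches the '>'. The sketch below assumes such an unstripped line and compares the two calls.

```python
# the header line as readline() returns it, trailing newline included
raw = '<doc id="12" url="http://en.wikipedia.org/wiki?curid=12" title="Anarchism">\n'

# '\n' is not in the set ' <>', so stripping from the right stops there
stripped_once = raw.strip(' <>')
# strip() removes all surrounding whitespace first, then strip('<>') removes the brackets
stripped_twice = raw.strip().strip('<>')

print(repr(stripped_once))   # still ends with '>\n'
print(repr(stripped_twice))  # 'doc id="12" url="..." title="Anarchism"'
```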
Something like this will do what I think you want. You may want to add error handling.
def turn_line_into_dict(line):
    # remove surrounding whitespace (including a trailing newline), then the angle brackets
    line = line.strip().strip('<>')
    # drop the tag name ("doc") that comes before the first space
    first_space_idx = line.find(' ')
    line_without_tag = line[first_space_idx+1:]
    # splitting on spaces assumes no attribute value contains a space
    attr_list = line_without_tag.split(' ')
    d = {}
    for attr_str in attr_list:
        key, value = attr_str.split('=', 1)  # split on the first '=' only, so an '=' in the url doesn't break this
        d[key] = value.strip('"\'')  # remove the surrounding quotes
    # the dict does not convert types; every value is still a str, so cast the id
    if 'id' in d:
        d['id'] = int(d['id'])
    return d
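Because the function above splits the attributes on single spaces, a title that itself contains a space (for example a hypothetical title="Red Army") would yield a fragment with no '=' and make the key, value unpacking fail. A regular expression that matches whole key="value" pairs avoids that. This is an alternative technique, not the approach from the answer above; it assumes every value is wrapped in double quotes and contains no double quote of its own, and doc_header_to_dict is a hypothetical name.

```python
import re

# matches key="value" pairs; assumes double-quoted values with no '"' inside
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def doc_header_to_dict(line):
    # findall with two groups returns a list of (key, value) tuples
    d = dict(ATTR_RE.findall(line))
    # regex captures are strings, so cast the id explicitly
    if 'id' in d:
        d['id'] = int(d['id'])
    return d


line = '<doc id="12" url="http://en.wikipedia.org/wiki?curid=12" title="Anarchism">'
print(doc_header_to_dict(line))
# {'id': 12, 'url': 'http://en.wikipedia.org/wiki?curid=12', 'title': 'Anarchism'}
```

Since the pattern ignores everything outside the key="value" pairs, the line can be passed in straight from readline(), with its <, > and newline still attached, and no stripping is needed.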