
Python Unicode string splitting / conversion to JSON

I have a bunch of Unicode strings and I am looking for the quickest way to extract the values from each string.

In [161]: data1 = u'NAME: abc\nSchool Name: CD\n________________\nENG: B   \nMat: B   '
In [162]: print data1
NAME: abc
School Name: CD
________________
ENG: B   
Mat: B 

Alternatively, is there a way to process it using json in Python?

If you are trying to get the values for NAME:, School Name:, etc., I would split the data and insert it into a dictionary. The code would look something like this:

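data = data1.split("\n")
info = {}

for d in data:
    # Skip lines without a colon, such as the underscore separator
    if ":" not in d:
        continue
    key, value = d.split(":", 1)
    info[key.strip()] = value.strip()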

Then you can look up values in info like so:

info["NAME"], info["School Name"]

and so on.

EDIT: no for loop

Depending on which field you are looking for, you could do this, where field is the label including its colon (for example "NAME:"):

field = "NAME:"
info = data1.split(field)[1].split("\n")[0].strip()
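
The one-liner above raises an IndexError when the field is missing from the string. A minimal sketch of a more forgiving lookup, assuming the label is passed without its colon; the name get_field and the default parameter are hypothetical and not part of the original answer:

```python
def get_field(text, field, default=None):
    """Return the text after 'field:' up to the end of that line."""
    # partition() never raises: sep is empty when the marker is absent.
    _, sep, rest = text.partition(field + ":")
    if not sep:
        return default
    # Keep only the remainder of that line and drop surrounding spaces.
    return rest.split("\n", 1)[0].strip()


data1 = u'NAME: abc\nSchool Name: CD\n________________\nENG: B   \nMat: B   '
print(get_field(data1, "School Name"))  # CD
print(get_field(data1, "Age", "n/a"))   # n/a
```

Like the one-liner, this matches the first occurrence of the label anywhere in the string, so asking for "Name" would find it inside "School Name:".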

Like @QuinnFTW, I would create a dict, but I prefer comprehensions to for loops. Once you have the data in a dict, you can convert it to JSON easily with json.dumps:

data1 = u'NAME: abc\nSchool Name: CD\n________________\nENG: B   \nMat: B   '

data1 = dict((item.strip()
              for item in line.split(':',1))
             for line in data1.splitlines()
             if ':' in line)

from pprint import pprint
pprint(data1)

import json
print json.dumps(data1)

Result:

{u'ENG': u'B', u'Mat': u'B', u'NAME': u'abc', u'School Name': u'CD'}
{"Mat": "B", "NAME": "abc", "School Name": "CD", "ENG": "B"}
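
The key order of the JSON line above depends on the dict's internal ordering. As a hedged sketch, starting from the parsed dict shown in the result, sort_keys=True gives reproducible output and json.loads reverses the conversion. The non-ASCII value in the last two lines is made up purely to illustrate ensure_ascii:

```python
import json

# The parsed dict from the result above.
record = {u'ENG': u'B', u'Mat': u'B', u'NAME': u'abc', u'School Name': u'CD'}

# sort_keys=True makes the output independent of dict ordering.
encoded = json.dumps(record, sort_keys=True)
print(encoded)

# json.loads turns the JSON text back into an equal dict.
decoded = json.loads(encoded)
print(decoded == record)

# By default non-ASCII characters are escaped; ensure_ascii=False keeps them.
escaped = json.dumps({u'NAME': u'\u00e9'})
literal = json.dumps({u'NAME': u'\u00e9'}, ensure_ascii=False)
```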

I have the following solution working now:

data1 = u'NAME: abc\nSchool Name: CD\n________________\nENG: B   \nMat: B   '

import re
from itertools import izip
# [...] is a character class, so one '_' is enough: split on runs of ':', newline or '_'
data2 = re.split(r'[:\n_]+', data1)
i = iter(data2)
ans = dict(izip(i, i))
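
itertools.izip exists only in Python 2, and the split above leaves the spaces around each value in place. A possible Python 3 variant, offered as a sketch rather than the author's method, uses the built-in zip and lets the pattern absorb surrounding whitespace. It assumes that neither keys nor values contain ':' or '_', otherwise the key/value pairing shifts:

```python
import re

data1 = u'NAME: abc\nSchool Name: CD\n________________\nENG: B   \nMat: B   '

# Split on runs of ':', newline or '_', swallowing whitespace on both sides.
# strip() first so the trailing spaces do not end up in the last value.
tokens = re.split(r'\s*[:\n_]+\s*', data1.strip())

# Pull two items at a time from one iterator: (key, value), (key, value), ...
it = iter(tokens)
ans = dict(zip(it, it))
print(ans)
```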
