
How to properly split a string to create a dictionary in Python?

I have two strings. The first one is

" TOP : Cotton + Embroidered ( 2 Mtr) \n BOTTOM : Cotton + Solid (2 Mtr) \n DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n TYPE : Un Stitched\n COLOUR : Multi Colour \n CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\nCountry of Origin: India"

and the second one is

"Top Fabric: Cotton Cambric + Top Length: 0-2.00\nBottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\nDupatta Fabric: Nazneen + Dupatta Length: 0-2.00\nLining Fabric: Cotton Cambric\nType: Un Stitched\nPattern: Printed\nMultipack: 3 Top\nCountry of Origin: India"

I need to create a Python dictionary from each of these two strings, using the text before each colon as the key.

For example, in the first string the keys would be

TOP, BOTTOM, DUPATTA, TYPE, COLOUR, CONTAINS, COUNTRY OF ORIGIN

and in the second one the keys would be

Top Fabric, Bottom Fabric, Top Length, Bottom Length, Dupatta Fabric, Dupatta Length, Lining Fabric, Type, Pattern, Multipack, Country of Origin

So far I have used

keys = ["Top Fabric","Bottom Fabric","Dupatta Fabric","Lining Fabric","Type","Pattern","Multipack","TOP ","BOTTOM ","  DUPATTA ","COLOUR ","CONTAINS ","TYPE ","Country"] 

pattern = re.compile('({})\s+'.format(':|'.join(keys))) 
newdict = dict(zip(*[(i.strip() for i in (pattern.split(desc.replace("*",""))) if i)]*2))

But it does not work on the first string, and on the second string it does not create every key and value.

You might use a regex pattern that captures the part before the colon in group 1 and the part after the colon in group 2.

Then assert that after group 2 there is either another part starting with a + followed by a :, or the end of the string.

Then create a dictionary, stripping surrounding whitespace from the group 1 and group 2 values.

(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)

The pattern matches:

  • (?:\s*\+\s*)? Optionally match a + sign between optional whitespace chars
  • ([^:]+) Capture group 1, match 1 or more chars other than :
  • \s*:\s* Match a : between optional whitespace chars
  • ([^:]+) Capture group 2, match 1 or more chars other than :
  • (?=\+[^:+]*:|$) Positive lookahead, assert that what follows is either a + and then a : (with no other + or : in between), or the end of the string
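To see what the lookahead buys you, here is a minimal sketch that applies the pattern above to two single lines taken from the question. The helper name `pairs` is made up for this illustration. A + only ends a value when a new `key:` follows it; otherwise it stays part of the value.

```python
import re

# Base pattern from the breakdown above (single-line form, no flags).
pattern = r"(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)"

def pairs(line):
    """Hypothetical helper: return the key/value pairs found in one line."""
    return {m.group(1).strip(): m.group(2).strip()
            for m in re.finditer(pattern, line)}

# The + is followed by "Top Length:", so it separates two pairs.
print(pairs("Top Fabric: Cotton Cambric + Top Length: 0-2.00"))
# {'Top Fabric': 'Cotton Cambric', 'Top Length': '0-2.00'}

# No colon follows the +, so it is kept inside the value.
print(pairs("TOP : Cotton + Embroidered ( 2 Mtr) "))
# {'TOP': 'Cotton + Embroidered ( 2 Mtr)'}
```

Because `[^:]` also matches newlines, this form is only meant for one line at a time; multi-line input needs the variant that excludes `\r\n` and uses `re.MULTILINE`.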


Example

import re
import pprint

pattern = r"(?:\s*\+\s*)?([^:\r\n]+)\s*:\s*([^:\r\n]+)\s*(?=\+[^:+\n]*:|$)"

s = ("TOP : Cotton + Embroidered ( 2 Mtr) \n"
            "BOTTOM : Cotton + Solid (2 Mtr) \n"
            "DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n"
            "TYPE : Un Stitched\n"
            "COLOUR : Multi Colour \n"
            "CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
            "Country of Origin: India\n\n"
            "Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
            "Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\n"
            "Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00\n"
            "Lining Fabric: Cotton Cambric\n"
            "Type: Un Stitched\n"
            "Pattern: Printed\n"
            "Multipack: 3 Top\n"
            "Country of Origin: India")

dictionary = {}
for m in re.finditer(pattern, s, re.MULTILINE):
    dictionary[m.group(1).strip()] = m.group(2).strip()
pprint.pprint(dictionary)

Output

{'BOTTOM': 'Cotton + Solid (2 Mtr)',
 'Bottom Fabric': 'Cotton Cambric',
 'Bottom Length': '0-2.00',
 'COLOUR': 'Multi Colour',
 'CONTAINS': '1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA',
 'Country of Origin': 'India',
 'DUPATTA': 'Chiffon + Lace Work ( 2 Mtr)',
 'Dupatta Fabric': 'Nazneen',
 'Dupatta Length': '0-2.00',
 'Lining Fabric': 'Cotton Cambric',
 'Multipack': '3 Top',
 'Pattern': 'Printed',
 'TOP': 'Cotton + Embroidered ( 2 Mtr)',
 'TYPE': 'Un Stitched',
 'Top Fabric': 'Cotton Cambric',
 'Top Length': '0-2.00',
 'Type': 'Un Stitched'}
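Note that the example merges both strings into one dictionary, so the repeated `Country of Origin` key appears only once. If you would rather keep one dictionary per string, as the question asks, a small wrapper along these lines should do. This is a sketch; `parse_spec`, `PAIR`, `s1` and `s2` are names chosen here for illustration.

```python
import re

# Same multi-line pattern as in the example, compiled once.
PAIR = re.compile(
    r"(?:\s*\+\s*)?([^:\r\n]+)\s*:\s*([^:\r\n]+)\s*(?=\+[^:+\n]*:|$)",
    re.MULTILINE,
)

def parse_spec(text):
    """Hypothetical helper: build one dict from one description string."""
    return {m.group(1).strip(): m.group(2).strip() for m in PAIR.finditer(text)}

s1 = (" TOP : Cotton + Embroidered ( 2 Mtr) \n BOTTOM : Cotton + Solid (2 Mtr) \n"
      " DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n TYPE : Un Stitched\n"
      " COLOUR : Multi Colour \n CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
      "Country of Origin: India")
s2 = ("Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
      "Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\n"
      "Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00\n"
      "Lining Fabric: Cotton Cambric\nType: Un Stitched\nPattern: Printed\n"
      "Multipack: 3 Top\nCountry of Origin: India")

d1 = parse_spec(s1)  # keys in order of appearance in s1
d2 = parse_spec(s2)  # keys in order of appearance in s2
print(list(d1))
print(list(d2))
```

Since Python 3.7 dictionaries preserve insertion order, so the keys come out in the order they appear in each string.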

You may try the dict comprehension below, where s1 represents one of your strings:

d={i.split(':')[0].strip(): i.split(':')[1].strip() for i in s1.split('\n')}

Edit: To make combining the dictionaries easier, you can define a function:

def f(s1):
    return {i.split(':')[0].strip(): i.split(':')[1].strip() for i in s1.split('\n')}
f('\n'.join([s1,s2])) # single dict from both strings
set(f(s1).keys()).intersection(f(s2).keys()) # common keys 

{'Country of Origin'} is the only key common to both dictionaries, and its value is India in both.
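One caveat with the comprehension: on a line such as `Top Fabric: Cotton Cambric + Top Length: 0-2.00` it stores `Cotton Cambric + Top Length` as the value and drops `0-2.00`, and a line without a colon raises an IndexError. A possible regex-free variant that treats a + as a separator only when the following piece contains a colon is sketched below; `parse_lines` is a name invented for this sketch, not part of the answer above.

```python
def parse_lines(text):
    """Hypothetical helper: split on newlines, then on +, keeping a +
    inside the value when the next piece has no colon of its own."""
    result = {}
    for line in text.splitlines():
        key = None  # last key seen on this line
        for piece in line.split("+"):
            if ":" in piece:
                # A new "key: value" pair starts here.
                key, _, value = piece.partition(":")
                key = key.strip()
                result[key] = value.strip()
            elif key is not None:
                # No colon: this piece continues the previous value.
                result[key] += " + " + piece.strip()
    return result

print(parse_lines("Top Fabric: Cotton Cambric + Top Length: 0-2.00"))
# {'Top Fabric': 'Cotton Cambric', 'Top Length': '0-2.00'}
print(parse_lines("TOP : Cotton + Embroidered ( 2 Mtr) "))
# {'TOP': 'Cotton + Embroidered ( 2 Mtr)'}
```

Lines without any colon, including blank ones, are skipped rather than raising an error. The sketch assumes values themselves never contain a colon.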
