
How to properly split a string to create a dictionary in Python?

I have two strings. The first one is

" TOP : Cotton + Embroidered ( 2 Mtr) \n BOTTOM : Cotton + Solid (2 Mtr) \n DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n TYPE : Un Stitched\n COLOUR : Multi Colour \n CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\nCountry of Origin: India"

and the second one is

"Top Fabric: Cotton Cambric + Top Length: 0-2.00\nBottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\nDupatta Fabric: Nazneen + Dupatta Length: 0-2.00\nLining Fabric: Cotton Cambric\nType: Un Stitched\nPattern: Printed\nMultipack: 3 Top\nCountry of Origin: India"

I need to create a Python dictionary from each of these two strings, using the text before each colon as the keys.

For example, in the first string the keys would be

TOP,BOTTOM,DUPATTA,TYPE,COLOUR,CONTAINS,COUNTRY OF ORIGIN

and in the second one the keys would be

Top Fabric,Bottom Fabric,Top Length,Bottom Length,Dupatta Fabric,Dupatta Length,Lining Fabric,Type,Pattern,Multipack,Country of Origin

So far I have used

keys = ["Top Fabric","Bottom Fabric","Dupatta Fabric","Lining Fabric","Type","Pattern","Multipack","TOP ","BOTTOM ","  DUPATTA ","COLOUR ","CONTAINS ","TYPE ","Country"] 

pattern = re.compile('({})\s+'.format(':|'.join(keys))) 
newdict = dict(zip(*[(i.strip() for i in (pattern.split(desc.replace("*",""))) if i)]*2))

but it does not work on the first string, and on the second string it does not create every key and value.

You might use a regex pattern that matches the part before the colon in group 1 and after the colon in group 2.

Then assert that after group 2, there is either another part starting with a + followed by : or the end of the string.

Then create a dictionary, stripping the group 1 and group 2 values.

(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)

The pattern matches:

  • (?:\s*\+\s*)? Optionally match a + sign surrounded by optional whitespace chars
  • ([^:]+) Capture group 1, match 1 or more chars other than :
  • \s*:\s* Match a : between optional whitespace chars
  • ([^:]+) Capture group 2, match 1 or more chars other than :
  • (?=\+[^:+]*:|$) Positive lookahead, assert that to the right is either a + followed by a : (with no other + or : in between), or the end of the string
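As a quick check of the groups described above, here is a minimal sketch that applies the pattern to two single lines taken from the question. The helper name pairs is made up for illustration and is not part of the original answer.

```python
import re

# Pattern exactly as given above; the input is a single line,
# so re.MULTILINE is not needed here.
pattern = re.compile(r"(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)")

def pairs(line):
    """Return stripped (key, value) tuples found in one line (hypothetical helper)."""
    return [(m.group(1).strip(), m.group(2).strip()) for m in pattern.finditer(line)]

# A "+" that is followed by another "key:" starts a new pair.
print(pairs("Top Fabric: Cotton Cambric + Top Length: 0-2.00"))

# A "+" that is not followed by a key stays inside the value.
print(pairs("TOP : Cotton + Embroidered ( 2 Mtr) "))
```

The first call should yield two pairs and the second a single pair, which is what the lookahead is for: it only ends a value at a + when a key and a colon follow.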


Example (here both groups also exclude \r and \n, so with re.MULTILINE each pair stays on its own line)

import re
import pprint

pattern = r"(?:\s*\+\s*)?([^:\r\n]+)\s*:\s*([^:\r\n]+)\s*(?=\+[^:+\n]*:|$)"

s = ("TOP : Cotton + Embroidered ( 2 Mtr) \n"
            "BOTTOM : Cotton + Solid (2 Mtr) \n"
            "DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n"
            "TYPE : Un Stitched\n"
            "COLOUR : Multi Colour \n"
            "CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
            "Country of Origin: India\n\n"
            "Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
            "Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\n"
            "Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00\n"
            "Lining Fabric: Cotton Cambric\n"
            "Type: Un Stitched\n"
            "Pattern: Printed\n"
            "Multipack: 3 Top\n"
            "Country of Origin: India")

dictionary = {}
for m in re.finditer(pattern, s, re.MULTILINE):
    dictionary[m.group(1).strip()] = m.group(2).strip()
pprint.pprint(dictionary)

Output

{'BOTTOM': 'Cotton + Solid (2 Mtr)',
 'Bottom Fabric': 'Cotton Cambric',
 'Bottom Length': '0-2.00',
 'COLOUR': 'Multi Colour',
 'CONTAINS': '1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA',
 'Country of Origin': 'India',
 'DUPATTA': 'Chiffon + Lace Work ( 2 Mtr)',
 'Dupatta Fabric': 'Nazneen',
 'Dupatta Length': '0-2.00',
 'Lining Fabric': 'Cotton Cambric',
 'Multipack': '3 Top',
 'Pattern': 'Printed',
 'TOP': 'Cotton + Embroidered ( 2 Mtr)',
 'TYPE': 'Un Stitched',
 'Top Fabric': 'Cotton Cambric',
 'Top Length': '0-2.00',
 'Type': 'Un Stitched'}
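If a separate dictionary per string is wanted instead of one merged result, the same loop can be wrapped in a function. This is only a sketch under that assumption: the name to_dict and the variables first and second are hypothetical, and the two strings are copied from the question.

```python
import re

# Same pattern as in the example above, compiled once with MULTILINE
# so that $ matches at the end of every line.
PATTERN = re.compile(
    r"(?:\s*\+\s*)?([^:\r\n]+)\s*:\s*([^:\r\n]+)\s*(?=\+[^:+\n]*:|$)",
    re.MULTILINE,
)

def to_dict(text):
    """Build a dict from the 'key: value' pairs in text (hypothetical helper)."""
    return {m.group(1).strip(): m.group(2).strip() for m in PATTERN.finditer(text)}

# The two strings from the question.
first = (" TOP : Cotton + Embroidered ( 2 Mtr) \n BOTTOM : Cotton + Solid (2 Mtr) \n"
         " DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n TYPE : Un Stitched\n"
         " COLOUR : Multi Colour \n CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
         "Country of Origin: India")
second = ("Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
          "Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\n"
          "Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00\n"
          "Lining Fabric: Cotton Cambric\nType: Un Stitched\nPattern: Printed\n"
          "Multipack: 3 Top\nCountry of Origin: India")

d1 = to_dict(first)
d2 = to_dict(second)
print(sorted(d1))
print(sorted(d2))
```

Keeping the strings apart avoids one dictionary silently overwriting a key that appears in both inputs, such as Country of Origin.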

You may try the dict comprehension below, where s1 represents one of your strings:

d={i.split(':')[0].strip(): i.split(':')[1].strip() for i in s1.split('\n')}

Edit: to make combining the dicts easier, you can define a function:

def f(s1):
    return {i.split(':')[0].strip(): i.split(':')[1].strip() for i in s1.split('\n')}
f('\n'.join([s1,s2])) # single dict from both strings
set(f(s1).keys()).intersection(f(s2).keys()) # common keys 

{'Country of Origin'} is the only key common to both dicts, and its value is India in both.
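One caveat with indexing the result of split(':'): element [1] holds only the text between the first and second colon, so a line such as "Top Fabric: Cotton Cambric + Top Length: 0-2.00" yields the value "Cotton Cambric + Top Length" and never creates a "Top Length" key, and an empty line raises IndexError. Below is a hedged sketch of a split-based variant that tries to handle both cases. The function name parse is hypothetical, and it assumes a + only separates pairs when the piece after it contains a colon.

```python
def parse(text):
    """Split-based parser (hypothetical helper, not the f defined above)."""
    result = {}
    for line in text.splitlines():
        last_key = None
        for part in line.split('+'):
            key, colon, value = part.partition(':')
            if colon:
                # This piece contains a colon, so it is a new "key: value" pair.
                last_key = key.strip()
                result[last_key] = value.strip()
            elif last_key is not None and part.strip():
                # No colon: the "+" belonged to the previous value, so re-attach it.
                result[last_key] += ' + ' + part.strip()
    return result

sample = ("TOP : Cotton + Embroidered ( 2 Mtr) \n"
          "\n"
          "Top Fabric: Cotton Cambric + Top Length: 0-2.00")
print(parse(sample))
```

Note that re-attached values get a single space on each side of the +, so spacing around + in the original text is normalized.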
