
Format a multiline string with Python

I need to format the multi-line string shown below in Python. I've tried many approaches, but none of them worked out.

AMAZON
IPHONE: 700
SAMSUNG: 600

=============

WALMART
IPHONE: 699

===========

ALIBABA
SONY: 500

The data above lists online stores and, for each store, the price of a mobile phone by brand. I need to add these to a database, so it should look like this:

-------------------
AMAZON | IPHONE | 700
-------------------
AMAZON | SAMSUNG | 600
-------------------
WALMART | IPHONE | 699
-------------------
ALIBABA | SONY | 500
-------------------

I need to format the above text and store it in a database table.

What have I tried? I tried to split the lines and build a dictionary, something close to JSON, but it didn't work out: it only picks up one line. If there is an easier approach, please share it. Please help me with this!

I made some assumptions:

  • the vendor name is always before the products
  • at least === as separator between vendor entries
  • empty lines can be ignored

Working code:

text = """
AMAZON
IPHONE: 700
SAMSUNG: 600

=============

WALMART
IPHONE: 699

===========

ALIBABA
SONY: 500
"""

new_entry = True  # True when the next non-empty line is a vendor name
vendor = None
print("-------------------")
for line in text.split("\n"):
    line = line.strip()
    if not line:
        # empty lines can be ignored
        continue
    elif "===" in line:
        # separator: the next non-empty line starts a new vendor entry
        new_entry = True
    elif new_entry:
        # assuming the first line of an entry is always the vendor name
        vendor = line
        new_entry = False
    else:
        # product line such as "IPHONE: 700"; split on the first colon only
        name, price = line.split(":", 1)
        print("{} | {} | {}".format(vendor, name.strip(), price.strip()))
        print("-------------------")

Output is:

-------------------
AMAZON | IPHONE | 700
-------------------
AMAZON | SAMSUNG | 600
-------------------
WALMART | IPHONE | 699
-------------------
ALIBABA | SONY | 500
-------------------

Alternative approach: the vendor name could also be detected as any text line that does not contain a colon.
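A minimal sketch of that alternative, assuming that every product line contains a colon and that separator lines consist only of = characters. The function name parse_by_colon and the variable names are made up for illustration:

```python
def parse_by_colon(text):
    """Return a list of (vendor, product, price) tuples."""
    rows = []
    vendor = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and separator lines made only of '=' characters.
        if not line or set(line) == {"="}:
            continue
        if ":" in line:
            # A line with a colon is a product belonging to the current vendor.
            product, price = line.split(":", 1)
            rows.append((vendor, product.strip(), price.strip()))
        else:
            # A text line without a colon is taken to be a vendor name.
            vendor = line
    return rows


sample = """
AMAZON
IPHONE: 700
SAMSUNG: 600

=============

WALMART
IPHONE: 699

===========

ALIBABA
SONY: 500
"""

rows = parse_by_colon(sample)
for vendor, product, price in rows:
    print("{} | {} | {}".format(vendor, product, price))
```

With this variant no new_entry flag is needed, but a vendor name that itself contains a colon would be misread as a product line.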

The answer submitted by @scito is adequate, but I am adding mine just in case. You can use a regex; the following is a working example:

strng = """
AMAZON
IPHONE: 700
SAMSUNG: 600

=============

WALMART
IPHONE: 699

===========

ALIBABA
SONY: 500

======
"""

import re

multistrng = strng.split("\n")  # split the text into lines on \n

market_re = re.compile(r"([a-zA-Z]+)")  # regex to find the marketplace name

phone_re = re.compile(r"([a-zA-Z]+):\s(\d+)")  # regex to find a phone and its price

js = []  # list to hold all data found

for line in multistrng:
    phone = phone_re.findall(line)
    if phone:  # line contains a phone and its price
        # add the phone to the most recently found marketplace
        # (assumes a marketplace line always comes before its phones)
        js[-1].append(phone[0])
        continue
    market = market_re.findall(line)
    if market:  # line contains a marketplace name
        js.append([market[0]])
    # empty lines and separator lines match neither regex and are ignored

# now you have the data in a structured form; you can print it or add it to the database

for market in js:
    for product in market[1:]:
        print("---------------------")
        print("{} | {} | {}".format(market[0], product[0], product[1]))

print("---------------------")

Output:

---------------------
AMAZON | IPHONE | 700
---------------------
AMAZON | SAMSUNG | 600
---------------------
WALMART | IPHONE | 699
---------------------
ALIBABA | SONY | 500
---------------------

The data is stored in the js list. If you iterate over js, the first element of each sub-list is the marketplace, and the remaining elements are the products for that marketplace.

[['AMAZON', ('IPHONE', '700'), ('SAMSUNG', '600')], ['WALMART', ('IPHONE', '699')], ['ALIBABA', ('SONY', '500')]]
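Since the goal is to store the rows in a database table, here is a rough sketch of that last step using Python's built-in sqlite3 module. The question does not say which database is used, so the in-memory SQLite database, the table name prices, and the column names are all assumptions made for illustration:

```python
import sqlite3

# The structure produced by the regex example above.
js = [
    ["AMAZON", ("IPHONE", "700"), ("SAMSUNG", "600")],
    ["WALMART", ("IPHONE", "699")],
    ["ALIBABA", ("SONY", "500")],
]

# Flatten into one (vendor, product, price) tuple per row.
records = [
    (market[0], name, int(price))
    for market in js
    for name, price in market[1:]
]

# Hypothetical schema: an in-memory database with a single "prices" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (vendor TEXT, product TEXT, price INTEGER)")
# Parameterized insert, one row per record.
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", records)
conn.commit()

stored = conn.execute(
    "SELECT vendor, product, price FROM prices ORDER BY rowid"
).fetchall()
for row in stored:
    print("{} | {} | {}".format(*row))
conn.close()
```

For a real database, replace ":memory:" with a file path, or use the driver for your database; the placeholder syntax may differ between drivers.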


 