Is there a function that removes delimiters from a list composed of strings and other lists?

I have a row that looks like this:

Alain,David,43,"['Cinema:ABC', 'Cafe:Evasion', 'Hotel:Hotel Du Parc', 'Cafe:Casa del gelato']","['Notebook', 'Cigarette électronique', 'Livre:Roman']","['Matin:8h-10h', 'Apres-midi:12h-15h']","['Politique']"

I tried to remove the delimiters ([, ], ", ') to obtain something like the line below, so that I can calculate the similarity between rows later:

Alain,David,43,Cinema:ABC, Cafe:Evasion, Hotel:Hotel Du Parc, Cafe:Casa del gelato,Notebook, Cigarette électronique, Livre:Roman,Matin:8h-10h, Apres-midi:12h-15h,Politique

But it failed! Any idea?

I assume you have a list, not a string:

row = ['Alain','David',43,"['Cinema:ABC', 'Cafe:Evasion', 'Hotel:Hotel Du Parc', 'Cafe:Casa del gelato']","['Notebook', 'Cigarette électronique', 'Livre:Roman']","['Matin:8h-10h', 'Apres-midi:12h-15h']","['Politique']"]

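Some columns contain a list stored as a string, so you have to convert the string back to a list. You can use eval() to convert such a string into a Python list. Note that eval() executes whatever code the string contains, so for data you do not control ast.literal_eval() is the safer choice.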

result = []

for item in row:
    if isinstance(item, str) and item.startswith('['):
        result += eval(item)
    else:
        result.append(item)

print(result)    
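
The same loop can be sketched with ast.literal_eval() instead of eval(). This is only an illustration: the function name flatten_row is hypothetical, and the row is shortened from the one in the question.

```python
import ast

def flatten_row(row):
    """Expand string-encoded lists in a row into individual items."""
    result = []
    for item in row:
        if isinstance(item, str) and item.startswith('['):
            # literal_eval only parses Python literals; it does not run code
            result.extend(ast.literal_eval(item))
        else:
            result.append(item)
    return result

# Shortened version of the row from the question
row = ['Alain', 'David', 43, "['Cinema:ABC', 'Cafe:Evasion']", "['Politique']"]
flat = flatten_row(row)
print(flat)
# ['Alain', 'David', 43, 'Cinema:ABC', 'Cafe:Evasion', 'Politique']

# Join into a single comma-separated line, as asked in the question
print(','.join(str(x) for x in flat))
# Alain,David,43,Cinema:ABC,Cafe:Evasion,Politique
```

Non-string items such as 43 are kept as they are, so they need str() before joining.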

EDIT:

You generate the rows with:

file.writerow([
   random.choice(Prenoms),
   random.choice(Noms),
   random.randint(17,65),
   random.sample(Lfreq,4)
])

But random.sample(Lfreq,4) returns a list, and csv writes that list into a single column as its string representation. You have to write its items as separate columns:

data = random.sample(Lfreq,4)

file.writerow([
    random.choice(Prenoms), 
    random.choice(Noms),
    random.randint(17,65), 
    data[0], 
    data[1], 
    data[2], 
    data[3]
])

or extend the list using extend() or +=:

data = [random.choice(Prenoms), random.choice(Noms), random.randint(17,65)]

#data.extend(random.sample(Lfreq,4))
data += random.sample(Lfreq,4)

file.writerow(data)
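
A self-contained sketch of this approach, to check that each sampled item ends up in its own column. The contents of Prenoms, Noms and Lfreq are not shown in the question, so the values below are made up, and an in-memory buffer stands in for the real file.

```python
import csv
import io
import random

# Hypothetical sample data; the real lists are not shown in the question
Prenoms = ['Alain', 'Marie']
Noms = ['David', 'Martin']
Lfreq = ['Cinema:ABC', 'Cafe:Evasion', 'Hotel:Hotel Du Parc',
         'Cafe:Casa del gelato', 'Bar:Central']

buffer = io.StringIO()        # stands in for the real output file
writer = csv.writer(buffer)   # plays the role of `file` in the code above

data = [random.choice(Prenoms), random.choice(Noms), random.randint(17, 65)]
data += random.sample(Lfreq, 4)   # adds four separate items, not one list
writer.writerow(data)

# One line with seven plain fields and no brackets or nested quotes
print(buffer.getvalue())
```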

Here is a function that solves this, using a regular expression to pull the quoted items out of each string:

# -*- coding: utf-8 -*-

import re

def plain_array_from_array_with_subarrays_as_strings(array):
    response = []
    for el in array:
        if isinstance(el, str):
            # Collect every single-quoted item, e.g. 'Cinema:ABC'
            sub_els = re.findall(r"'([^']+)'", el)
            if sub_els:
                response.extend(sub_els)
            else:
                # Plain string without quoted items: keep it unchanged
                response.append(el)
        else:
            # Numbers and other non-string values are kept unchanged
            response.append(el)
    return response

r = [
    "Alain",
    "David",
    43,
    "['Cinema:ABC', 'Cafe:Evasion', 'Hotel:Hotel Du Parc', 'Cafe:Casa del gelato']",
    "['Notebook', 'Cigarette électronique', 'Livre:Roman']",
    "['Matin:8h-10h', 'Apres-midi:12h-15h']",
    "['Politique']"
]    
print(plain_array_from_array_with_subarrays_as_strings(r))

Output:

['Alain',
 'David',
 43,
 'Cinema:ABC',
 'Cafe:Evasion',
 'Hotel:Hotel Du Parc',
 'Cafe:Casa del gelato',
 'Notebook',
 'Cigarette électronique',
 'Livre:Roman',
 'Matin:8h-10h',
 'Apres-midi:12h-15h',
 'Politique']
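
If the row comes straight from the CSV file rather than from a Python list, the csv module can split the quoted fields first. The following is a rough sketch under that assumption, with a shortened line and ast.literal_eval() used for the conversion. Every value read from CSV is a string, including the age.

```python
import ast
import csv
import io

# One line as it would appear in the file (shortened from the question)
line = '''Alain,David,43,"['Cinema:ABC', 'Cafe:Evasion']","['Politique']"'''

flat = []
for fields in csv.reader(io.StringIO(line)):
    for field in fields:
        if field.startswith('['):
            # Field holds a list written as text: parse it and add its items
            flat.extend(ast.literal_eval(field))
        else:
            flat.append(field)

print(','.join(flat))
# Alain,David,43,Cinema:ABC,Cafe:Evasion,Politique
```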


 