How to split a DataFrame column into a list of tuples

Question

I found some answers online, but I have no experience with regular expressions, which I believe is what is needed here. If there is another way, that would be even better.

I have a complex column in my dataframe that needs to be split on any of the characters ',', ';', '(', ')' or ':'.

Example string:

(36%) (litopenaaus varmrn ), une chapelure (25%) [vmaaî fmur, water,) sel, soja 0i), sucre, levure), eau. î farine de whca, amidon de mais, sart, cre. regulators (450, 500, stg). soybean [containing an antioxidant (300)]. sucre, powder of gariic, levure, th ci nœ (412). contient des crevettes"

It should be split into a list containing the following:

["36%", "litopenaaus varmrn", "une chapelure (25%)", ["vmaaî fmur", "water", "sel", "soja 0i", "sucre", "levure"], "eau. î farine de whca", "amidon de mais", "sart", "cre. regulators ["(450, 500, stg)"]. soybean [containing an antioxidant (300)]. sucre", "powder of gariic", "levure"," th ci nœ (412). contient des crevettes"]

The code I have written to do this looks like this, but nothing happened:

import re

delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))

df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))

Answer 1

By doing

delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))

df['splited'] = df.ingredient.apply(lambda row: ' '.join((re.split(regexPattern, str(row)))))

you actually split the string (re.split) and then joined the resulting parts back together with a space character (' '.join). If you need a list of parts rather than a single new string, simply do not join them, i.e.

df['splited'] = df.ingredient.apply(lambda row: re.split(regexPattern, str(row)))
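Note that re.split keeps empty strings and surrounding whitespace wherever two delimiters sit next to each other, as in ") (" or "),". The following is a minimal sketch of one way to tidy that up, not something from the original answer. The helper name split_ingredients and the sample string are made up for illustration, and it assumes that blank pieces are not wanted.

```python
import re

delimiters = ",", ":", "(", ")", ";"
regexPattern = '|'.join(map(re.escape, delimiters))


def split_ingredients(text):
    """Hypothetical helper: split on any delimiter and drop blank pieces."""
    # re.split returns a list; no join, so the parts stay separate.
    parts = re.split(regexPattern, str(text))
    # Strip surrounding whitespace and skip pieces that are empty afterwards.
    return [part.strip() for part in parts if part.strip()]


# Made-up sample string, much shorter than the one in the question.
sample = "(36%) (shrimp), breadcrumbs: flour, water; salt"
print(split_ingredients(sample))
```

With the dataframe from the question, this could be applied as df['splited'] = df.ingredient.apply(split_ingredients), which stores one list per row.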
