
How do I combine the lists in a DataFrame column into a single list?

Some context: I have some data that I'm running text analysis on. I have just tokenized it, and I want to combine all the lists in a DataFrame column into one list for further processing.

My df looks like this:

import pandas as pd
df = pd.DataFrame({'title': ['issue regarding app', 'graphics should be better'], 'text': [["'app'", "'load'", "'slowly'"], ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]]})

I want to merge all the lists in the 'text' column into one list, and also remove the opening and closing single quotes from each token.

Something like this:

lst = ['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']

Thank you for all your help!

You can do this with apply and a lambda.

apply runs a function on each element of the 'text' column (here, removing the quotes from every token in a list), while sum, started from an empty list, concatenates the resulting lists:

lst = sum(df["text"].apply(lambda x: [i.replace("'", "") for i in x]), [])

Output:

['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']
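
One caveat: sum(..., []) builds a new list at every step, so it can get slow on a column with many rows. As a minimal sketch of an alternative, assuming the same df as in the question, itertools.chain.from_iterable from the standard library flattens the column in a single pass. The variable name token is only illustrative.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'title': ['issue regarding app', 'graphics should be better'],
    'text': [["'app'", "'load'", "'slowly'"],
             ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]],
})

# chain.from_iterable walks every inner list in order without building
# intermediate lists; the quotes are removed from each token on the way.
lst = [token.replace("'", "") for token in chain.from_iterable(df["text"])]
print(lst)
```

The result is the same list as the sum version produces.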

If you want to remove several characters at once, such as "'" and "a", translate is more efficient than chaining replace calls:

trans = str.maketrans("", "", "'a")
lst = sum(df["text"].apply(lambda x: [i.translate(trans) for i in x]), [])
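
To see what that translation table does, here is a small sketch on a plain list of tokens (the tokens list is just sample data taken from the question). Note that it deletes every "a" in each token, not only the quotes.

```python
# The third argument of str.maketrans lists the characters to delete.
trans = str.maketrans("", "", "'a")

tokens = ["'app'", "'load'", "'interface'"]
cleaned = [t.translate(trans) for t in tokens]
print(cleaned)  # ['pp', 'lod', 'interfce']
```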

Use a simple list comprehension:

out = [x.strip("'") for l in df['text'] for x in l]

Output:

['app', 'load', 'slowly', 'interface', 'need', 'to', 'look', 'nicer']
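
Note that strip("'") and replace("'", "") are not interchangeable when a token contains an apostrophe in the middle. A minimal sketch, using a hypothetical token that is not in the question's data:

```python
token = "'don't'"  # hypothetical token with an inner apostrophe

# strip only removes the character from both ends of the string.
stripped = token.strip("'")
# replace removes every occurrence, including the inner apostrophe.
replaced = token.replace("'", "")

print(stripped)  # don't
print(replaced)  # dont
```

For the data in the question both give the same result, since the quotes only appear at the ends.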

We can also iterate through each list in the Series, collect the items using append(), and finally use concat() to convert them to a list. This yields the same output as above.
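
The code for this answer is not included here, so the following is only a sketch of the loop idea, assuming the same df as in the question. It uses list.extend in place of the append()/concat() calls mentioned, which is a substitution and not necessarily what the answer's author wrote.

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['issue regarding app', 'graphics should be better'],
    'text': [["'app'", "'load'", "'slowly'"],
             ["'interface'", "'need'", "'to'", "'look'", "'nicer'"]],
})

out = []
for tokens in df["text"]:
    # Add the cleaned tokens of this row to the end of the running list.
    out.extend(t.strip("'") for t in tokens)

print(out)
```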


 