
Column stored as a list: how can I split it into columns in pandas (Python)?

Assume the "Tags" column is stored as shown below. How can I split it into multiple columns, or combine all the values into one list?

Desired result: everything combined into a single list, with duplicates filtered out.

"Tags"
['Saudi', 'law', 'Saudi Arabia', 'rules']
['Hindi', 'Tamil', 'imposition', 'cbse', 'neet', 'Tamil Nadu', 'India']
['Stephen', 'Hawkins', 'Tamil', 'predictions', 'future', 'science', 'scientist', 'top 5', 'five']
['Bigg Boss', 'Tamil', 'Kamal', 'big', 'boss']
['Mary', 'real', 'story', 'Tamil', 'history']
['football', 'Tamil', 'FIFA', '2018', 'world cup', 'MG', 'top', '10', 'ten']
['India', 'Tamil', 'poor', 'rich', 'money', 'MG', 'why', 'Indians']

Try:

df["Tags"].explode().unique()

Or:

import numpy as np

# .sum() concatenates the row lists; np.unique returns the sorted unique values
np.unique(df["Tags"].sum())

Edit:

If the lists are stored as strings, you may need to parse them first:

import ast
df["Tags"].apply(ast.literal_eval).explode().unique()
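
If it is unclear whether the column holds real lists or their string representations, both cases can be handled in one pass. The following is a minimal sketch under assumptions: the sample DataFrame and the helper name `to_list` are hypothetical and only illustrate the idea.

```python
import ast

import pandas as pd

# Hypothetical sample: some rows hold real lists, one holds a list saved as a string.
df = pd.DataFrame({
    "Tags": [
        ["Saudi", "law", "Saudi Arabia", "rules"],
        "['Hindi', 'Tamil', 'imposition']",  # string form, e.g. after a CSV round trip
        ["Bigg Boss", "Tamil", "Kamal"],
    ]
})


def to_list(value):
    """Parse string-encoded lists; pass real lists through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value


# explode() puts one tag per row; unique() keeps the first occurrence of each tag.
unique_tags = df["Tags"].apply(to_list).explode().unique().tolist()
print(unique_tags)
```

Here `unique()` returns the tags in order of first appearance, so the result is deterministic.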

If you need a list without duplicates, flatten the values into a set with a generator expression; this is a good choice if performance is important:

L = list(set(y for x in df['Tags'] for y in x))

If the lists may have been saved as strings, use:

import ast

L = list(set(y for x in df['Tags'] for y in ast.literal_eval(x)))

print(L)
['FIFA', 'Mary', 'world cup', 'rich', 'story', 'Tamil', 'rules', 'neet', 'money', 'Kamal', 'Hindi', 'big', 'cbse', 'imposition', 'football', 'MG', 'history', 'predictions', 'why', 'Tamil Nadu', 'top 5', 'ten', '10', 'Bigg Boss', 'India', 'Stephen', 'top', 'poor', 'law', 'Saudi', 'real', 'Indians', 'future', 'boss', 'five', '2018', 'scientist', 'Saudi Arabia', 'science', 'Hawkins']
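
A set does not keep insertion order, which is why the printed list is shuffled. If the first-seen order of the tags matters, one possible alternative (not part of the original answer) is `dict.fromkeys`. This is a minimal sketch; the `rows` variable is a hypothetical stand-in for the values of `df['Tags']`.

```python
from itertools import chain

# Hypothetical sample rows; in the question these would come from df['Tags'].
rows = [
    ["Saudi", "law", "Saudi Arabia", "rules"],
    ["Hindi", "Tamil", "imposition"],
    ["Bigg Boss", "Tamil", "Kamal"],
]

# Dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while keeping first-seen order.
ordered_unique = list(dict.fromkeys(chain.from_iterable(rows)))
print(ordered_unique)
# ['Saudi', 'law', 'Saudi Arabia', 'rules', 'Hindi', 'Tamil', 'imposition', 'Bigg Boss', 'Kamal']
```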

You could flatten the lists and use set():

out = []
for lst in df['Tags'].tolist():
    out.extend(lst)

out = list(set(out))

Output:

['cbse', '2018', 'future', 'India', '10', 'Indians', 'money', 
'Hindi', 'rules', 'poor', 'Kamal', 'neet', 'top 5', 'world cup', 
'five', 'law', 'ten', 'Stephen', 'Tamil', 'Mary', 'Bigg Boss', 
'top', 'scientist', 'boss', 'Saudi Arabia', 'big', 'real', 'story', 
'why', 'Hawkins', 'predictions', 'football', 'rich', 'science', 
'imposition', 'Saudi', 'FIFA', 'history', 'Tamil Nadu', 'MG']

Using the same code on the lists below (iterating over lsts instead of df['Tags'].tolist()):

lsts = [
    ['thamizh', 'kannada', 'karnataka', 'bangalore', 'mysore',
     'bengaluru', 'Bengaluru', 'malayalam', 'kerala', 'chennai', 'yash',
     'kgf', 'songs', 'kannada songs', 'news', 'today'],
    ['songs', 'kannada songs', 'news', 'today'],
    ['mysore', 'bengaluru', 'Bengaluru', 'malayalam'],
]

Output:

['today', 'songs', 'malayalam', 'bangalore', 'karnataka', 'kerala', 
'bengaluru', 'mysore', 'kgf', 'Bengaluru', 'chennai', 'yash', 
'thamizh', 'kannada', 'news', 'kannada songs']
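
The question also asks about splitting the lists into multiple columns, which the answers here do not cover. One way this could be done is to build a new DataFrame from the row lists. This is a minimal sketch, assuming the column holds real Python lists; the sample data and the `tag_0`, `tag_1`, ... column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample with rows of different lengths.
df = pd.DataFrame({
    "Tags": [
        ["Saudi", "law", "Saudi Arabia", "rules"],
        ["Bigg Boss", "Tamil", "Kamal"],
    ]
})

# One column per list position; shorter rows are padded with missing values.
wide = pd.DataFrame(df["Tags"].tolist(), index=df.index)
wide.columns = [f"tag_{i}" for i in range(wide.shape[1])]
print(wide)
```

The number of columns equals the length of the longest list, so rows with fewer tags end with missing values.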
