
Column stored as a list: how can I split it into columns in pandas (Python)?

Assume the "Tags" column is stored as shown below. How can I split it into multiple columns, or combine it into one list?

Desired result: all tags combined into a single list, with duplicates filtered out.

"Tags"
['Saudi', 'law', 'Saudi Arabia', 'rules']
['Hindi', 'Tamil', 'imposition', 'cbse', 'neet', 'Tamil Nadu', 'India']
['Stephen', 'Hawkins', 'Tamil', 'predictions', 'future', 'science', 'scientist', 'top 5', 'five']
['Bigg Boss', 'Tamil', 'Kamal', 'big', 'boss']
['Mary', 'real', 'story', 'Tamil', 'history']
['football', 'Tamil', 'FIFA', '2018', 'world cup', 'MG', 'top', '10', 'ten']
['India', 'Tamil', 'poor', 'rich', 'money', 'MG', 'why', 'Indians']

Try:

df["Tags"].explode().unique()
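
As a minimal sketch of what that one-liner does, here is a small self-contained run. The two-row frame is a hypothetical sample modeled on the question's data, not the asker's actual DataFrame.

```python
import pandas as pd

# Hypothetical two-row frame modeled on the question's "Tags" column.
df = pd.DataFrame({
    "Tags": [
        ["Saudi", "law", "Saudi Arabia", "rules"],
        ["Hindi", "Tamil", "law"],
    ]
})

# explode() gives each list element its own row; unique() then drops
# repeats while keeping first-seen order.
tags = df["Tags"].explode().unique().tolist()
print(tags)
# ['Saudi', 'law', 'Saudi Arabia', 'rules', 'Hindi', 'Tamil']
```

The trailing .tolist() is only there to turn the NumPy array returned by unique() into a plain Python list.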

Or:

import numpy as np
np.unique(df["Tags"].sum())

Edit:

If the lists are stored as strings, you may need:

import ast
df["Tags"].apply(ast.literal_eval).explode().unique()
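
To illustrate when the literal_eval step matters, here is a hedged sketch. It assumes each cell holds the string representation of a list (which commonly happens after saving to and reloading from CSV); the sample frame is hypothetical.

```python
import ast
import pandas as pd

# Hypothetical frame where each list was saved as its string repr.
df = pd.DataFrame({
    "Tags": [
        "['Saudi', 'law']",
        "['Tamil', 'law', 'India']",
    ]
})

# Parse each string back into a real Python list, then explode and dedupe.
parsed = df["Tags"].apply(ast.literal_eval)
tags = parsed.explode().unique().tolist()
print(tags)
# ['Saudi', 'law', 'Tamil', 'India']
```

Without the parsing step, explode() would leave each string untouched, because a string is not treated as a list-like value.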

If you need a list without duplicates and performance is important, pass a generator expression to set:

L = list(set(y for x in df['Tags'] for y in x))

If the lists may be saved as strings, use:

import ast

L = list(set(y for x in df['Tags'] for y in ast.literal_eval(x)))

print(L)
['FIFA', 'Mary', 'world cup', 'rich', 'story', 'Tamil', 'rules', 'neet', 'money', 'Kamal', 'Hindi', 'big', 'cbse', 'imposition', 'football', 'MG', 'history', 'predictions', 'why', 'Tamil Nadu', 'top 5', 'ten', '10', 'Bigg Boss', 'India', 'Stephen', 'top', 'poor', 'law', 'Saudi', 'real', 'Indians', 'future', 'boss', 'five', '2018', 'scientist', 'Saudi Arabia', 'science', 'Hawkins']
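
The set-based result comes back in arbitrary order. If first-seen order matters, one alternative (a swapped-in technique, not part of the answer above) is dict.fromkeys. The sketch below uses a hypothetical tag_lists variable as a stand-in for df['Tags'].

```python
from itertools import chain

# Hypothetical stand-in for df['Tags']: a sequence of tag lists.
tag_lists = [
    ["Saudi", "law", "rules"],
    ["Tamil", "law", "India"],
    ["India", "Tamil", "money"],
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this dedupes while preserving first appearance.
ordered = list(dict.fromkeys(chain.from_iterable(tag_lists)))
print(ordered)
# ['Saudi', 'law', 'rules', 'Tamil', 'India', 'money']
```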

You could flatten the lists and use set():

out = []
for lst in df['Tags'].tolist():
    out.extend(lst)

out = list(set(out))

Output (order may vary, since sets are unordered):

['cbse', '2018', 'future', 'India', '10', 'Indians', 'money', 
'Hindi', 'rules', 'poor', 'Kamal', 'neet', 'top 5', 'world cup', 
'five', 'law', 'ten', 'Stephen', 'Tamil', 'Mary', 'Bigg Boss', 
'top', 'scientist', 'boss', 'Saudi Arabia', 'big', 'real', 'story', 
'why', 'Hawkins', 'predictions', 'football', 'rich', 'science', 
'imposition', 'Saudi', 'FIFA', 'history', 'Tamil Nadu', 'MG']

Using the same code for the lists below:

lsts = [['thamizh', 'kannada', 'karnataka', 'bangalore', 'mysore', 
'bengaluru', 'Bengaluru', 'malayalam', 'kerala', 'chennai', 'yash',
 'kgf', 'songs', 'kannada songs', 'news', 'today'], 
 ['songs', 'kannada songs', 'news', 'today'], 
['mysore', 'bengaluru', 'Bengaluru', 'malayalam',]]

Output:

['today', 'songs', 'malayalam', 'bangalore', 'karnataka', 'kerala', 
'bengaluru', 'mysore', 'kgf', 'Bengaluru', 'chennai', 'yash', 
'thamizh', 'kannada', 'news', 'kannada songs']
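
The question also asks about splitting the lists into multiple columns. A minimal sketch of one way to do that, assuming the cells hold real lists: the sample frame and the "tag_" prefix are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical frame with lists of different lengths.
df = pd.DataFrame({
    "Tags": [
        ["Saudi", "law", "Saudi Arabia", "rules"],
        ["Bigg Boss", "Tamil"],
    ]
})

# One column per list position; shorter rows are padded with missing values.
wide = pd.DataFrame(df["Tags"].tolist(), index=df.index).add_prefix("tag_")
print(wide)
```

Passing index=df.index keeps the new columns aligned with the original rows, so they can be joined back with df.join(wide) if needed.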


 