
Column stored as list; how can I split it into columns in pandas (Python)?

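Assume the "Tags" column is stored as shown below. How can I split it into multiple columns, or combine it into a single list?

Desired: merge everything into one list and filter out the duplicates.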

"Tags"
['Saudi', 'law', 'Saudi Arabia', 'rules']
['Hindi', 'Tamil', 'imposition', 'cbse', 'neet', 'Tamil Nadu', 'India']
['Stephen', 'Hawkins', 'Tamil', 'predictions', 'future', 'science', 'scientist', 'top 5', 'five']
['Bigg Boss', 'Tamil', 'Kamal', 'big', 'boss']
['Mary', 'real', 'story', 'Tamil', 'history']
['football', 'Tamil', 'FIFA', '2018', 'world cup', 'MG', 'top', '10', 'ten']
['India', 'Tamil', 'poor', 'rich', 'money', 'MG', 'why', 'Indians']

Try:

df["Tags"].explode().unique()

Or (with numpy imported as np):

np.unique(df["Tags"].sum())
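To see how the two one-liners above differ, here is a minimal sketch on a small made-up frame (the sample rows are an assumption, shortened from the question). It assumes each cell of "Tags" already holds a real Python list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each cell of "Tags" is a real Python list.
df = pd.DataFrame({
    "Tags": [
        ["Saudi", "law", "Saudi Arabia", "rules"],
        ["Hindi", "Tamil", "India"],
        ["India", "Tamil", "poor", "rich"],
    ]
})

# explode() puts every list element on its own row;
# unique() then keeps the first occurrence of each value, in order of appearance.
unique_tags = df["Tags"].explode().unique().tolist()

# Series.sum() concatenates the lists; np.unique() returns the unique values sorted.
sorted_tags = np.unique(df["Tags"].sum()).tolist()

print(unique_tags)
print(sorted_tags)
```

Both give the same set of tags; the first keeps the order in which tags first appear, the second sorts them. Concatenating with sum() builds a new list at every step, so explode() is probably the better choice for a large column.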

Edit:

Maybe you need this, if the lists are stored as strings:

import ast
df["Tags"].apply(ast.literal_eval).explode().unique()
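Whether the ast.literal_eval step is needed depends on what the cells actually contain. A small sketch of checking first, on a made-up frame whose lists were saved as text (for example after a CSV round trip, which is an assumption about how the data got that way):

```python
import ast
import pandas as pd

# Hypothetical sample: the lists are stored as strings, not as list objects.
df = pd.DataFrame({"Tags": ["['Saudi', 'law']", "['law', 'Tamil']"]})

# Inspect the first cell to decide whether parsing is needed at all.
if isinstance(df["Tags"].iloc[0], str):
    # literal_eval safely parses a Python literal; it does not execute code.
    df["Tags"] = df["Tags"].apply(ast.literal_eval)

unique_tags = df["Tags"].explode().unique().tolist()
print(unique_tags)
```

If the cells are already lists, calling ast.literal_eval on them raises an error, which is why the type check comes first.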

If you need a list without duplicates and performance matters, use a comprehension with set:

L = list(set(y for x in df['Tags'] for y in x))

If the lists may be stored as strings, use:

import ast

L = list(set(y for x in df['Tags'] for y in ast.literal_eval(x)))

print (L)
['FIFA', 'Mary', 'world cup', 'rich', 'story', 'Tamil', 'rules', 'neet', 'money', 'Kamal', 'Hindi', 'big', 'cbse', 'imposition', 'football', 'MG', 'history', 'predictions', 'why', 'Tamil Nadu', 'top 5', 'ten', '10', 'Bigg Boss', 'India', 'Stephen', 'top', 'poor', 'law', 'Saudi', 'real', 'Indians', 'future', 'boss', 'five', '2018', 'scientist', 'Saudi Arabia', 'science', 'Hawkins']
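A set does not keep insertion order, so the order of the printed list is arbitrary. If the order of first appearance matters, one alternative (not part of the answer above) is dict.fromkeys, since dictionaries preserve insertion order. A minimal sketch on made-up rows:

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: each cell of "Tags" is a real Python list.
df = pd.DataFrame({"Tags": [["Tamil", "India", "MG"], ["India", "Tamil", "poor"]]})

# Flatten lazily, then de-duplicate; dict keys keep the order they were first seen in.
flat = chain.from_iterable(df["Tags"])
L = list(dict.fromkeys(flat))

print(L)
```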

You can flatten the lists and use set():

out = []
for lst in df['Tags'].tolist():
    out.extend(lst)

out = list(set(out))

Output:

['cbse', '2018', 'future', 'India', '10', 'Indians', 'money', 
'Hindi', 'rules', 'poor', 'Kamal', 'neet', 'top 5', 'world cup', 
'five', 'law', 'ten', 'Stephen', 'Tamil', 'Mary', 'Bigg Boss', 
'top', 'scientist', 'boss', 'Saudi Arabia', 'big', 'real', 'story', 
'why', 'Hawkins', 'predictions', 'football', 'rich', 'science', 
'imposition', 'Saudi', 'FIFA', 'history', 'Tamil Nadu', 'MG']

For the following lists, using the same code:

lsts = [['thamizh', 'kannada', 'karnataka', 'bangalore', 'mysore', 
'bengaluru', 'Bengaluru', 'malayalam', 'kerala', 'chennai', 'yash',
 'kgf', 'songs', 'kannada songs', 'news', 'today'], 
 ['songs', 'kannada songs', 'news', 'today'], 
['mysore', 'bengaluru', 'Bengaluru', 'malayalam',]]

Output:

['today', 'songs', 'malayalam', 'bangalore', 'karnataka', 'kerala', 
'bengaluru', 'mysore', 'kgf', 'Bengaluru', 'chennai', 'yash', 
'thamizh', 'kannada', 'news', 'kannada songs']
