Python Pandas merge 3 columns of lists into a single column
I have 3 columns with keywords that have been derived through different algorithms.
The data is something like this:
product desc keywords1 keywords2 keywords3
productX, "blah blah", [iot, internet, cloud], [cloud, internet, energy management], [internet of things, cloud, internet]
How do I merge the 3 keyword columns into a single one and also remove any duplicates? For example, the keyword "cloud" should only be stored once.
Use set(). Note that a set does not preserve order, so the order of the merged keywords may vary:
import pandas as pd
df = pd.DataFrame({'c1':[['a', 'c']], 'c2':[['a', 'd']]})
df['c3'] = (df['c1'] + df['c2']).apply(set).apply(list)
df
c1 c2 c3
0 [a, c] [a, d] [d, a, c]
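Applied to the three keyword columns from the question, the same idea might look like the sketch below. The column names follow the question's sample data; `kw_cols` and the output column `keywords` are names made up for this illustration. It swaps `set` for `dict.fromkeys`, which also removes duplicates but keeps the first-seen order, on the assumption that a stable order is preferable.

```python
from itertools import chain

import pandas as pd

# Sample row taken from the question.
df = pd.DataFrame({
    "product": ["productX"],
    "desc": ["blah blah"],
    "keywords1": [["iot", "internet", "cloud"]],
    "keywords2": [["cloud", "internet", "energy management"]],
    "keywords3": [["internet of things", "cloud", "internet"]],
})

kw_cols = ["keywords1", "keywords2", "keywords3"]  # hypothetical helper name

# For each row, chain the lists together and drop duplicates.
# dict.fromkeys keeps the first occurrence of each keyword, in order.
df["keywords"] = [
    list(dict.fromkeys(chain.from_iterable(lists)))
    for lists in zip(*(df[c] for c in kw_cols))
]

print(df["keywords"].iloc[0])
# ['iot', 'internet', 'cloud', 'energy management', 'internet of things']
```

If the order does not matter, `list(set(...))` in place of `list(dict.fromkeys(...))` gives the same keywords in arbitrary order.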
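You could apply a function to the data frame that does set intersection across the three columns. Keep in mind that an intersection keeps only the keywords present in all three columns; to merge the columns and drop duplicates, replace & with | (set union).
df['updatedKeywords'] = df.apply(lambda row: set(row['keyword1']) & set(row['keyword2']) & set(row['keyword3']), axis=1)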
If you had a lot of columns to intersect, you could extend it:
columnsToIntersect = ['keyword' + str(i) for i in range(1, numberOfKeywordColumns + 1)]
df['updatedKeywords'] = df.apply(lambda row: set.intersection(*[set(row[x]) for x in columnsToIntersect]), axis=1)
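To see how the choice of set operation plays out on the question's sample row, here is a small sketch that runs both `set.intersection` and `set.union` over a list of columns. The column names `keyword1` to `keyword3` follow this answer's naming, and the output columns `common` and `merged` are made up for the example; the results are sorted only to make the output deterministic.

```python
import pandas as pd

# Sample row from the question, using this answer's column naming.
df = pd.DataFrame({
    "keyword1": [["iot", "internet", "cloud"]],
    "keyword2": [["cloud", "internet", "energy management"]],
    "keyword3": [["internet of things", "cloud", "internet"]],
})
cols = ["keyword1", "keyword2", "keyword3"]

# Intersection: only keywords that appear in every column.
df["common"] = df.apply(
    lambda row: sorted(set.intersection(*(set(row[c]) for c in cols))),
    axis=1,
)

# Union: every keyword from any column, each stored once.
df["merged"] = df.apply(
    lambda row: sorted(set.union(*(set(row[c]) for c in cols))),
    axis=1,
)

print(df["common"].iloc[0])  # ['cloud', 'internet']
print(df["merged"].iloc[0])
# ['cloud', 'energy management', 'internet', 'internet of things', 'iot']
```

The union is the variant that matches "merge and remove duplicates"; the intersection is useful if you only want keywords that all algorithms agree on.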
Finally, you could also use pandas.DataFrame.aggregate, though it may be overkill for this sort of task.