
Python Pandas: merge 3 columns of lists into a single column

I have 3 columns of keywords, each derived by a different algorithm.

The data looks something like this:

product, desc, keywords1, keywords2, keywords3

productX, "blah blah", [iot, internet, cloud], [cloud, internet, energy management], [internet of things, cloud, internet]

How do I merge the 3 keyword columns into a single column and remove any duplicates? For example, the keyword "cloud" should be stored only once.

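Use set(). Adding two list columns concatenates the lists row by row, and converting to a set drops the duplicates. Note that a set does not preserve the original keyword order.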

import pandas as pd

df = pd.DataFrame({'c1':[['a', 'c']], 'c2':[['a', 'd']]})
df['c3'] = (df['c1'] + df['c2']).apply(set).apply(list)

df
    c1      c2      c3
0   [a, c]  [a, d]  [d, a, c]
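
The same idea can be sketched for the three keyword columns from the question. This is only an illustration: the sample row and the column names (keywords1 to keywords3, keywords) are assumed from the question, and dict.fromkeys is swapped in for set() so that the first occurrence of each keyword keeps its position.

```python
import pandas as pd

# Sample row modelled on the question; the column names are assumptions.
df = pd.DataFrame({
    "product": ["productX"],
    "keywords1": [["iot", "internet", "cloud"]],
    "keywords2": [["cloud", "internet", "energy management"]],
    "keywords3": [["internet of things", "cloud", "internet"]],
})

# Adding list columns concatenates the lists row by row.
combined = df["keywords1"] + df["keywords2"] + df["keywords3"]

# dict.fromkeys keeps the first occurrence of each keyword, in order.
df["keywords"] = combined.apply(lambda kws: list(dict.fromkeys(kws)))

print(df["keywords"].iloc[0])
# ['iot', 'internet', 'cloud', 'energy management', 'internet of things']
```

If the order of the keywords does not matter, apply(set).apply(list) as in the answer above is shorter.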

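You could apply a function to each row of the data frame that combines the three columns as sets. The example here uses set intersection (&), which keeps only the keywords that appear in all three columns. To merge the columns and drop duplicates, as the question asks, use set union (|) in place of &.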

df['updatedKeywords'] = df.apply(lambda row: set(row['keywords1']) & set(row['keywords2']) & set(row['keywords3']), axis=1)

If you had a lot of columns to combine, you could extend it:

numberOfKeywordColumns = 3  # keywords1 .. keywords3
columnsToIntersect = ['keywords' + str(i) for i in range(1, numberOfKeywordColumns + 1)]
df['updatedKeywords'] = df.apply(lambda row: set.intersection(*[set(row[x]) for x in columnsToIntersect]), axis=1)
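
For comparison, here is a minimal sketch of the union variant next to the intersection, using a sample row modelled on the question. The names keyword_columns, merge_keywords, merged and common are hypothetical, and the results are sorted only to make the output order predictable.

```python
import pandas as pd

# Sample row modelled on the question; the column names are assumptions.
df = pd.DataFrame({
    "keywords1": [["iot", "internet", "cloud"]],
    "keywords2": [["cloud", "internet", "energy management"]],
    "keywords3": [["internet of things", "cloud", "internet"]],
})

keyword_columns = ["keywords1", "keywords2", "keywords3"]

def merge_keywords(row):
    # Union: every keyword that appears in at least one column, once.
    return sorted(set().union(*(row[col] for col in keyword_columns)))

def common_keywords(row):
    # Intersection: only the keywords that appear in every column.
    return sorted(set.intersection(*(set(row[col]) for col in keyword_columns)))

df["merged"] = df.apply(merge_keywords, axis=1)
df["common"] = df.apply(common_keywords, axis=1)

print(df["merged"].iloc[0])
# ['cloud', 'energy management', 'internet', 'internet of things', 'iot']
print(df["common"].iloc[0])
# ['cloud', 'internet']
```

The union gives the merged, de-duplicated list the question asks for, while the intersection drops any keyword that is missing from at least one column.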

Finally, you could also use pandas.DataFrame.aggregate, though it may be overkill for this sort of task.
