Python Pandas: merge 3 columns of lists into one column

Question

I have 3 columns of keywords that were derived through different algorithms.

The data looks something like this:

product, desc, keywords1, keywords2, keywords3

productX, "blah blah", [iot, internet, cloud], [cloud, internet, energy management], [internet of things, cloud, internet]

How do I merge the 3 keyword columns into a single one and also remove any duplicates? For example, the keyword "cloud" should only be stored once.

Answer 1

Use set():

import pandas as pd

df = pd.DataFrame({'c1':[['a', 'c']], 'c2':[['a', 'd']]})
df['c3'] = (df['c1'] + df['c2']).apply(set).apply(list)

df

    c1      c2      c3
0   [a, c]  [a, d]  [d, a, c]
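Since set() does not preserve order, the resulting list can come out in any order (as in `[d, a, c]` above). If first-seen order matters, a minimal sketch along the same lines could use dict.fromkeys instead. The sample row and the column names `keywords1` to `keywords3` are taken from the question's header and are assumptions about the real data; the output column name `keywords` is made up for illustration.

```python
import pandas as pd

# Sample row modelled on the question; column names are assumed from its header.
df = pd.DataFrame({
    "product": ["productX"],
    "keywords1": [["iot", "internet", "cloud"]],
    "keywords2": [["cloud", "internet", "energy management"]],
    "keywords3": [["internet of things", "cloud", "internet"]],
})

# Adding list-valued columns concatenates the lists row by row.
combined = df["keywords1"] + df["keywords2"] + df["keywords3"]

# dict.fromkeys drops duplicates while keeping first-seen order.
df["keywords"] = combined.apply(lambda kws: list(dict.fromkeys(kws)))

print(df["keywords"].iloc[0])
# ['iot', 'internet', 'cloud', 'energy management', 'internet of things']
```

Note that this only removes exact duplicates; "iot" and "internet of things" are still treated as different keywords.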

Answer 2

You could apply a function to the data frame that does set intersection across the three columns.

df['updatedKeywords'] = df.apply(lambda row: set(row['keywords1']) & set(row['keywords2']) & set(row['keywords3']), axis=1)

If you had a lot of columns to intersect, you could extend it:

columnsToIntersect = ['keywords' + str(i) for i in range(1, numberOfKeywordColumns + 1)]
df['updatedKeywords'] = df.apply(lambda row: set.intersection(*[set(row[x]) for x in columnsToIntersect]), axis=1)
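One caveat worth spelling out: an intersection keeps only the keywords that appear in every column, whereas the question asks to merge the columns and drop duplicates, which is a union. The sketch below puts the two side by side on the question's sample row. The column names `keywords1` to `keywords3` are assumed from the question, and `commonKeywords` / `allKeywords` are hypothetical output names; results are sorted only to make the output deterministic.

```python
import pandas as pd

df = pd.DataFrame({
    "keywords1": [["iot", "internet", "cloud"]],
    "keywords2": [["cloud", "internet", "energy management"]],
    "keywords3": [["internet of things", "cloud", "internet"]],
})

columns_to_merge = ["keywords1", "keywords2", "keywords3"]


def common_keywords(row):
    # Intersection: only keywords present in every column.
    return sorted(set.intersection(*[set(row[c]) for c in columns_to_merge]))


def all_keywords(row):
    # Union: every keyword from any column, each stored once.
    return sorted(set().union(*[row[c] for c in columns_to_merge]))


# Apply row-wise over just the keyword columns.
common = df[columns_to_merge].apply(common_keywords, axis=1)
merged = df[columns_to_merge].apply(all_keywords, axis=1)
df["commonKeywords"] = common
df["allKeywords"] = merged

print(df["commonKeywords"].iloc[0])
# ['cloud', 'internet']
print(df["allKeywords"].iloc[0])
# ['cloud', 'energy management', 'internet', 'internet of things', 'iot']
```

For the question as asked, the union variant is probably the one that matches the intent.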

Finally, you could also use pandas.DataFrame.aggregate, though it may be overkill for this sort of task.

Answer 1: 2 votes, accepted, 2021-01-13 22:09:54

Answer 2: 0 votes, 2021-01-13 22:16:38
