Pandas - Create a new column with values from another column based on str contains

Question

I have two DataFrames: one with multiple columns and another with just one. I need to join them based on a partial string match against a column. Example:

df1

| Name     |       Classification       |
| -------- | -------------------------- |
| A        | Transport/Bicycle/Mountain |
| B        | Transport/City/Bus         |
| C        | Transport/Taxi/City        |
| D        | Transport/City/Uber        |
| E        | Transport/Mountain/Jeep    |

df2

| Category |
| -------- | 
| Mountain |
| City     |

As you can see, the position of the category within the Classification column is not fixed.

Desired Output

| Name     |       Classification       | Category  |
| -------- | -------------------------- |-----------|
| A        | Transport/Bicycle/Mountain | Mountain  |
| B        | Transport/City/Bus         | City      |
| C        | Transport/Taxi/City        | City      |
| D        | Transport/City/Uber        | City      |
| E        | Transport/Mountain/Jeep    | Mountain  |

I'm stuck on this. Any ideas?

Many thanks in advance.

Answer 1

This implementation does the trick:

def get_cat(val):
    # Return the first category from df2 that occurs in the classification string
    for cat in df2['Category']:
        if cat in val:
            return cat
    return None  # no category matched

df1['Category'] = df1['Classification'].apply(get_cat)

Note: as @Justin Ezequiel pointed out in the comments, you haven't specified what to do when both Mountain and City appear in the same Classification. The current implementation uses the first Category (in df2 order) that matches.
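To make the first-match behaviour concrete, here is a minimal, self-contained sketch. It assumes df1 and df2 are built exactly as in the question's tables; the string `both` is a hypothetical value that is not in the original data and is only there to show which category wins when two of them match.

```python
import pandas as pd

# Frames rebuilt from the tables in the question
df1 = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Classification": [
        "Transport/Bicycle/Mountain",
        "Transport/City/Bus",
        "Transport/Taxi/City",
        "Transport/City/Uber",
        "Transport/Mountain/Jeep",
    ],
})
df2 = pd.DataFrame({"Category": ["Mountain", "City"]})


def get_cat(val):
    # Walk the categories in df2 order and return the first one found in val
    for cat in df2["Category"]:
        if cat in val:
            return cat
    return None  # no category occurs in val


df1["Category"] = df1["Classification"].apply(get_cat)
print(df1)

# Hypothetical classification containing both categories (not in the question)
both = "Transport/City/Mountain"
print(get_cat(both))  # Mountain, because it is listed first in df2
```

If a different tie-break is wanted, reordering the rows of df2 is the simplest lever, since the loop follows that order.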

Answer 2

You can try this:

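dff = {"ne": []}

for x in df1["Classification"]:
    # Look for the first category from df2 contained in this classification
    for a in df2["Category"]:
        if a in x:
            dff["ne"].append(a)
            break
    else:
        # Keep the list aligned with df1 when nothing matches
        dff["ne"].append(None)
df1["Category"] = dff["ne"]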

df1 will then look like your desired output.
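As an alternative to the explicit loops, the same lookup could probably be done with pandas string methods. The sketch below is not taken from either answer: it swaps in `Series.str.extract` with a regex alternation built from df2, and assumes the frames are built as in the question. The name `pattern` is only illustrative.

```python
import re

import pandas as pd

# Frames rebuilt from the tables in the question
df1 = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Classification": [
        "Transport/Bicycle/Mountain",
        "Transport/City/Bus",
        "Transport/Taxi/City",
        "Transport/City/Uber",
        "Transport/Mountain/Jeep",
    ],
})
df2 = pd.DataFrame({"Category": ["Mountain", "City"]})

# Build one capture group such as "(Mountain|City)";
# re.escape protects categories that contain regex metacharacters
pattern = "(" + "|".join(re.escape(c) for c in df2["Category"]) + ")"

# expand=False returns a Series for a single capture group;
# rows without any match get NaN
df1["Category"] = df1["Classification"].str.extract(pattern, expand=False)
print(df1)
```

One difference worth keeping in mind: the regex returns the category that appears leftmost in the Classification string, whereas the loop-based versions return the first category in df2 order, so the two approaches can disagree when a row contains both.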
