Pandas - Create new column w/values from another column based on str contains

I have two DataFrames: one with multiple columns and another with just one. I need to join them based on a partial string match against one of the columns. Example:

df1

| Name     |       Classification       |
| -------- | -------------------------- |
| A        | Transport/Bicycle/Mountain |
| B        | Transport/City/Bus         |
| C        | Transport/Taxi/City        |
| D        | Transport/City/Uber        |
| E        | Transport/Mountain/Jeep    |

df2

| Category |
| -------- | 
| Mountain |
| City     | 

As you can see, the order of the parts in the Classification column is not well defined, so the category can appear at any position.

Desired output

| Name     |       Classification       | Category  |
| -------- | -------------------------- |-----------|
| A        | Transport/Bicycle/Mountain | Mountain  |
| B        | Transport/City/Bus         | City      |
| C        | Transport/Taxi/City        | City      |
| D        | Transport/City/Uber        | City      |
| E        | Transport/Mountain/Jeep    | Mountain  |

I'm stuck on this. Any ideas?

Many thanks in advance.

This implementation does the trick:

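def get_cat(val):
    # Return the first category from df2 that occurs as a substring of val
    for cat in df2['Category']:
        if cat in val:
            return cat
    return None  # no category matched

# Apply row by row to df1 (the frame that holds the Classification column)
df1['Category'] = df1['Classification'].apply(get_cat)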

Note: as @Justin Ezequiel pointed out in the comments, you haven't specified what to do when both Mountain and City exist in the Classification. The current implementation uses the first Category (in df2 order) that matches.
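If you would rather see every match than silently keep the first one, a minimal sketch could look like the following. The row "F" with both categories and the helper name `get_all_cats` are hypothetical, added only to illustrate the ambiguous case; they are not part of the original data.

```python
import pandas as pd

# Hypothetical sample: row "F" contains both categories.
df1 = pd.DataFrame({
    "Name": ["A", "F"],
    "Classification": ["Transport/Bicycle/Mountain", "Transport/City/Mountain"],
})
df2 = pd.DataFrame({"Category": ["Mountain", "City"]})

def get_all_cats(val):
    # Keep every category that occurs as a substring of val, in df2 order.
    return [cat for cat in df2["Category"] if cat in val]

df1["Categories"] = df1["Classification"].apply(get_all_cats)
print(df1)
```

Each cell of the new column then holds a list, so rows with more than one entry are easy to spot and handle explicitly.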

You can try this:

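dff = {"ne": []}

for x in df1["Classification"]:
    # First category from df2 found in this string, or None,
    # so the list always has the same length as df1
    dff["ne"].append(next((a for a in df2["Category"] if a in x), None))
df1["Category"] = dff["ne"]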

df1 will then look like your desired output.
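For larger frames, the explicit loops can be swapped for a vectorized string method. The sketch below is an alternative rather than what either answer proposes: it builds one regex from df2 and uses `Series.str.extract`. Note one behavioural difference: when several categories occur in the same string, the regex returns the one that appears leftmost in the string, not the one listed first in df2.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Classification": [
        "Transport/Bicycle/Mountain",
        "Transport/City/Bus",
        "Transport/Taxi/City",
        "Transport/City/Uber",
        "Transport/Mountain/Jeep",
    ],
})
df2 = pd.DataFrame({"Category": ["Mountain", "City"]})

# Build a single alternation pattern such as "(Mountain|City)".
# re.escape guards against categories that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(c) for c in df2["Category"]) + ")"

# expand=False returns a Series; rows without any match become NaN.
df1["Category"] = df1["Classification"].str.extract(pattern, expand=False)
print(df1)
```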

