Pandas - Create new column with values from another column based on str contains
I have two DataFrames: one with multiple columns and the other with just one. I need to join them based on a partial string match against one column. Example:
df1
| Name | Classification |
| -------- | -------------------------- |
| A | Transport/Bicycle/Mountain |
| B | Transport/City/Bus |
| C | Transport/Taxi/City |
| D | Transport/City/Uber |
| E | Transport/Mountain/Jeep |
df2
| Category |
| -------- |
| Mountain |
| City |
As you can see, the order of the parts in the Classification column is not well defined.
Desired output
| Name | Classification | Category |
| -------- | -------------------------- |-----------|
| A | Transport/Bicycle/Mountain | Mountain |
| B | Transport/City/Bus | City |
| C | Transport/Taxi/City | City |
| D | Transport/City/Uber | City |
| E | Transport/Mountain/Jeep | Mountain |
I'm stuck on this. Any ideas?
Many thanks in advance.
This implementation does the trick:

    def get_cat(val):
        # Return the first category from df2 that occurs in the classification string
        for cat in df2['Category']:
            if cat in val:
                return cat
        # No category matched
        return None

    df1['Category'] = df1['Classification'].apply(get_cat)
Note: as @Justin Ezequiel pointed out in the comments, you haven't specified what to do when both Mountain and City appear in the same Classification. The current implementation returns the first Category, in df2 order, that matches.
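If a row can contain more than one category, one option is to keep every match instead of only the first. The following is a minimal sketch, not part of the original answer: the helper `get_all_cats`, the `Categories` column, and rows F and G are hypothetical, and matching whole `/`-separated segments rather than substrings is an assumption about the data.

```python
import pandas as pd

df2 = pd.DataFrame({"Category": ["Mountain", "City"]})

# Hypothetical rows: F contains both categories, G contains neither
df1 = pd.DataFrame({
    "Name": ["A", "F", "G"],
    "Classification": [
        "Transport/Bicycle/Mountain",
        "Transport/City/Mountain",
        "Transport/Boat",
    ],
})

def get_all_cats(val):
    # Split on "/" so only whole path segments count as matches
    parts = set(val.split("/"))
    # Preserve df2 order in the returned list
    return [cat for cat in df2["Category"] if cat in parts]

df1["Categories"] = df1["Classification"].apply(get_all_cats)
print(df1)
```

Each row then holds a list, so an empty list signals "no match" and a list with two entries flags the ambiguous case that still needs a decision.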
You can try this:

    dff = {"ne": []}
    for x in df1["Classification"]:
        # Keep the first df2 category contained in x, or None if nothing matches
        dff["ne"].append(next((a for a in df2["Category"] if a in x), None))
    df1["Category"] = dff["ne"]

df1 will look like your desired output.
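For larger frames, the same lookup can probably be done without an explicit Python loop by building a regular expression from df2 and using `Series.str.extract`. This is a sketch under assumptions rather than the approach from either answer: it returns the leftmost match within each string (not the first in df2 order), which gives the same result for the sample data because no row contains both categories.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Classification": [
        "Transport/Bicycle/Mountain",
        "Transport/City/Bus",
        "Transport/Taxi/City",
        "Transport/City/Uber",
        "Transport/Mountain/Jeep",
    ],
})
df2 = pd.DataFrame({"Category": ["Mountain", "City"]})

# Build one capture group such as "(Mountain|City)"; re.escape guards
# against categories that contain regex metacharacters
pattern = "(" + "|".join(re.escape(c) for c in df2["Category"]) + ")"

# expand=False returns a Series; rows without a match become NaN
df1["Category"] = df1["Classification"].str.extract(pattern, expand=False)
print(df1)
```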