

How do I check if dataframe column contains a string from another dataframe column and return adjacent cell in python pandas?

I have 2 dataframes: one containing a column of strings (df = data) which I need to categorise, and the other containing possible categories and search terms (df = categories). I would like to add a column to the "data" dataframe which returns a category based on the search terms. For example:

data:

**RepairName**
A/C is not cold
flat tyre is c
the tyre needs a repair on left side
the aircon is not cold

categories:

**Category**      **SearchTerm**
A/C               aircon
A/C               A/C
Tyre              repair
Tyre              flat

DESIRED RESULT data:

**RepairName**                        **Category**
A/C is not cold                         A/C
flat tyre is c                          Tyre
the tyre needs a repair on left side    Tyre
the aircon is not cold                  A/C

I have tried the following lambda function with apply, and a list comprehension. I am not sure if my column references are in the correct place:

data['Category'] = data['RepairName'].apply(lambda x: categories['Category'] if categories['SearchTerm'] in x else "")
data['Category'] = [categories['Category'] if categories['SearchTerm'] in data['RepairName'] else 0]

but I keep getting the error message:

TypeError: 'in <string>' requires string as left operand, not Series
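The error arises because `categories['SearchTerm'] in x` puts a whole Series on the left of the string `in` operator, which only accepts a single string. A minimal sketch of a fix, assuming the first matching search term should decide the category, loops over the (SearchTerm, Category) pairs inside the function that `apply` calls. The helper name `first_category` is hypothetical, and the case-insensitive comparison is an added assumption.

```python
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

def first_category(text, cats=categories):
    # Test one search term (a plain str) at a time, so `in` gets a string
    for term, category in zip(cats["SearchTerm"], cats["Category"]):
        if term.lower() in text.lower():
            return category
    return ""  # no search term found in this row

data["Category"] = data["RepairName"].apply(first_category)
print(data)
```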

This returns True/False for whether a category exists based on SearchTerm; however, I have not been able to return the category associated with the search term:

data['containName']=data['RepairName'].str.contains('|'.join(categories['SearchTerm']),case=False)
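That `str.contains` line already builds an alternation of every search term. Swapping `contains` for `str.extract` returns which term matched instead of True/False, and the matched term can then be mapped to its category. A sketch, assuming matching should ignore case and that search terms may contain regex metacharacters (hence `re.escape`); `term_to_cat` and `pattern` are illustrative names.

```python
import re
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

# Lower-case lookup from search term to category
term_to_cat = dict(zip(categories["SearchTerm"].str.lower(), categories["Category"]))

# One capturing group holding every escaped term, e.g. (aircon|a/c|repair|flat)
pattern = "(" + "|".join(map(re.escape, term_to_cat)) + ")"

# str.extract returns the matched text (NaN when nothing matches);
# lower-case it so it lines up with the dictionary keys
matched = data["RepairName"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
data["Category"] = matched.str.lower().map(term_to_cat)
print(data)
```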

These two sometimes work, but not all the time (perhaps because some of my search terms are more than one word?):

data['Category'] = [
    next((c for c, k in categories.values if k in s), None) for s in data['RepairName']] 

d = dict(zip(categories['SearchTerm'], categories['Category']))
data['CategoryCheck'] = [next((d[y] for y in x.split() if y in d), None) for x in data['RepairName']]
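The suspicion about multi-word terms fits the second variant: it looks up whole whitespace-separated tokens, so a term containing a space can never equal a single token. The first variant tests substrings, which handles spaces, but Python's `in` is case-sensitive. A small demonstration, using `"air con"` as an assumed multi-word term that is not in the original data:

```python
d = {"air con": "A/C", "flat": "Tyre"}
text = "the air con is not cold"

tokens = text.split()
print(tokens)  # ['the', 'air', 'con', 'is', 'not', 'cold']

# Token lookup: "air con" never appears as a single token
by_token = next((d[t] for t in tokens if t in d), None)

# Substring lookup: multi-word terms are found inside the full string
by_substring = next((cat for term, cat in d.items() if term in text), None)

print(by_token, by_substring)  # None A/C

# Substring tests are case-sensitive, so mixed-case data can still miss
print("aircon" in "Aircon is not cold")  # False
```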

We do str.findall, then map:

s=df.RepairName.str.findall('|'.join(cat.SearchTerm.tolist())).str[0].\
    map(cat.set_index('SearchTerm').Category)
0     A/C
1    Tyre
2    Tyre
3     A/C
Name: RepairName, dtype: object
df['Category']=s
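The answer's `df` and `cat` appear to correspond to the question's `data` and `categories`. Below is a self-contained sketch of the same findall-then-map idea using the question's names, with two assumed additions: terms are regex-escaped, and they are sorted longest-first so that when two terms could match at the same position, the longer one is tried first. Matching stays case-sensitive. The row "brakes squeak" is invented here to show that a row with no match comes out as NaN.

```python
import re
import pandas as pd

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
    "brakes squeak",  # hypothetical row with no matching term
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

# Longest terms first: regex alternation tries alternatives left to right
terms = sorted(categories["SearchTerm"], key=len, reverse=True)
pattern = "|".join(re.escape(t) for t in terms)

# Series indexed by SearchTerm, used as the lookup table for map
lookup = categories.set_index("SearchTerm")["Category"]

# findall gives a list of matches per row; .str[0] takes the first one
# (NaN for an empty list), and map translates it into a category
data["Category"] = data["RepairName"].str.findall(pattern).str[0].map(lookup)
print(data)
```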

This worked once I had ensured all my columns were lower case (I also removed hyphens and brackets for good measure):

print("All lowercase")
data = data.apply(lambda x: x.astype(str).str.lower())
categories = categories.apply(lambda x: x.astype(str).str.lower())

print("Remove double spacing")
data = data.replace(r'\s+', ' ', regex=True)

print('Remove hyphens')
data["RepairName"] = data["RepairName"].str.replace('-', '', regex=False)

print('Remove brackets')
# regex=False: a bare '(' or ')' is not a valid regular expression
data["RepairName"] = data["RepairName"].str.replace('(', '', regex=False)
data["RepairName"] = data["RepairName"].str.replace(')', '', regex=False)

data['Category'] = [
    next((c for c, k in categories.values if k in s), None) for s in data['RepairName']]
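The clean-up steps above can also be folded into one helper applied to both the repair text and the search terms, so the two sides are always normalised the same way. This is a sketch under assumptions: `normalise` is a hypothetical helper, and unlike the original it leaves the `Category` column untouched, so the output keeps labels such as "A/C" rather than "a/c".

```python
import pandas as pd

def normalise(s: pd.Series) -> pd.Series:
    """Lower-case, drop hyphens and brackets, collapse runs of whitespace."""
    return (
        s.astype(str)
        .str.lower()
        .str.replace(r"[-()]", "", regex=True)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )

data = pd.DataFrame({"RepairName": [
    "A/C is not cold",
    "flat tyre is c",
    "the tyre needs a repair on left side",
    "the aircon is not cold",
]})
categories = pd.DataFrame({
    "Category": ["A/C", "A/C", "Tyre", "Tyre"],
    "SearchTerm": ["aircon", "A/C", "repair", "flat"],
})

# Normalise only the text being matched; keep the original Category labels
clean = normalise(data["RepairName"])
terms = normalise(categories["SearchTerm"])

# First search term found as a substring decides the category (None if none)
data["Category"] = [
    next((cat for term, cat in zip(terms, categories["Category"]) if term in text), None)
    for text in clean
]
print(data)
```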
