Efficient way to replace values in column if part of those values are in predefined lists in pandas

So I've actually solved this, but the way I'm doing it may not be the most efficient.

For a column in my database, Industry, I want to replace values. If a value contains the word "tech", "technology" or something similar, I want to replace that value with just the word "Technology".

I've followed the basic algorithm below using apply: it loops through a predefined list (e.g. science), checks whether any of its values appear in the current Industry cell, and replaces the cell if one does.

It then does the same for the next list. I only have two lists so far, but I'll likely have over a dozen once I'm finished.

def industry_convert(row):
    
    science = ["research", "science", "scientific", "scientist", "academia", "education", "academic"]
    tech = ["technology", "tech", "software"]

    for v in science:
        if v.lower() in row.Industry.lower():
            row.Industry = "Research, Science, & Education"
            
    for v in tech:
        if v.lower() in row.Industry.lower():
            row.Industry = "Technology"
            
    return row

df = df.apply(industry_convert, axis = 1)

I'm just wondering if this is the best way to do this, or if there is a more pythonic or pandas way of doing it?

EDIT:

This is what some of the Industry column looks like:

Industry
Research Scientist
Science: Education
Tech
Technical Assistance
Technology
Medical
Hospitality

This is what it would look like after applying the code:

Industry            
Research, Science, & Education
Research, Science, & Education
Technology
Technology
Technology
Medical
Hospitality
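
For anyone who wants to try an answer against this data, the two tables above can be rebuilt as a small reproducible example. This is only a sketch: the helper name matches_expected is made up for illustration and is not part of the original question.

```python
import pandas as pd

# Sample rows copied from the Industry column shown above.
df = pd.DataFrame({
    "Industry": [
        "Research Scientist",
        "Science: Education",
        "Tech",
        "Technical Assistance",
        "Technology",
        "Medical",
        "Hospitality",
    ]
})

# Desired result, copied from the "after" table above.
expected = [
    "Research, Science, & Education",
    "Research, Science, & Education",
    "Technology",
    "Technology",
    "Technology",
    "Medical",
    "Hospitality",
]


def matches_expected(series):
    """Hypothetical helper: True if a converted column equals the desired values."""
    return series.tolist() == expected
```

Any candidate conversion can then be checked by passing the converted Industry column to matches_expected.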

Tell me if this works. I updated the for loop in your function:

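# Lowercase the keyword lists once, outside the function
science = [s.lower() for s in ["research", "science", "scientific", "scientist", "academia", "education", "academic"]]
tech = [s.lower() for s in ["technology", "tech", "software"]]

def industry_convert(row):
    # Lowercase the cell once per row
    industry = row.Industry.lower()

    # Substring match, as in the question; an exact membership test
    # (industry in science) would miss values like "Research Scientist"
    if any(v in industry for v in science):
        row.Industry = "Research, Science, & Education"
    # Check the tech list here, not science a second time
    elif any(v in industry for v in tech):
        row.Industry = "Technology"

    return row

df = df.apply(industry_convert, axis = 1)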

I lowercase the lists only once, outside the function, so they are not recomputed for every row. Hope it works. Happy coding ^-^

Personally, I would use str.contains and .loc to assign new values.

This will typically be much faster than looping over each row individually to check, which is an anti-pattern with regard to the pandas API.

science = ["research", "science", "scientific", "scientist", "academia", "education", "academic"]
tech = ["technology", "tech", "software"]

# Rows whose Industry contains any science keyword (case-insensitive regex alternation)
df.loc[df['Industry'].str.contains('|'.join(science), regex=True, case=False),
       'industry_new'] = "Research, Science, & Education"

# Rows containing any tech keyword; assigned last, so tech wins if both lists match
df.loc[df['Industry'].str.contains('|'.join(tech), regex=True, case=False),
       'industry_new'] = "Technology"

# Rows that matched neither list keep their original value
df['industry_new'] = df['industry_new'].fillna(df['Industry'])

print(df)

               Industry                    industry_new
0    Research Scientist  Research, Science, & Education
1    Science: Education  Research, Science, & Education
2                  Tech                      Technology
3  Technical Assistance                      Technology
4            Technology                      Technology
5               Medical                         Medical
6           Hospitality                     Hospitality
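
Since the question expects a dozen or more lists eventually, the same str.contains idea can be driven from a single mapping instead of one .loc statement per list. The following is a minimal sketch, not a definitive implementation: the names CATEGORIES and normalize_industry are made up for illustration, and it assumes that later entries in the mapping should override earlier ones, as in the two assignments above.

```python
import re

import pandas as pd

# Hypothetical mapping: replacement label -> keywords that trigger it.
# Later entries override earlier ones when a value matches both.
CATEGORIES = {
    "Research, Science, & Education": [
        "research", "science", "scientific", "scientist",
        "academia", "education", "academic",
    ],
    "Technology": ["technology", "tech", "software"],
}


def normalize_industry(industry, categories=CATEGORIES):
    """Return a copy of `industry` with keyword matches replaced by their label."""
    result = industry.copy()
    for label, keywords in categories.items():
        # re.escape guards against keywords that contain regex metacharacters.
        pattern = "|".join(re.escape(k) for k in keywords)
        # Match against the original values; na=False treats missing
        # values as non-matches so the mask is purely boolean.
        mask = industry.str.contains(pattern, case=False, regex=True, na=False)
        result[mask] = label
    return result


df = pd.DataFrame({
    "Industry": [
        "Research Scientist", "Science: Education", "Tech",
        "Technical Assistance", "Technology", "Medical", "Hospitality",
    ]
})
df["industry_new"] = normalize_industry(df["Industry"])
print(df)
```

On the sample data this should give the same industry_new column as shown above. Adding a new category then only means adding one entry to the mapping.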
