
How to create columns in a pandas df with .apply and a user-defined function

I'm trying to create several columns in a pandas DataFrame at once, where each column name is a key in a dictionary and the column value is 1 if any of the terms listed under that key are present, and 0 otherwise.

My DataFrame has 3 columns: jp_ref, jp_title, and jp_description. Essentially, I'm searching each jp_description for the terms assigned to a key and populating that key's column with 1 or 0 depending on whether any of those terms appears in the jp_description.


import pandas as pd

jp_title = ['software developer', 'operations analyst', 'it project manager']
jp_ref = ['j01', 'j02', 'j03']
jp_description = ['software developer with java and sql experience', 'operations analyst with ms in operations research, statistics or related field. sql experience desired.', 'it project manager with javascript working knowledge']
myDict = {'jp_title': jp_title, 'jp_ref': jp_ref, 'jp_description': jp_description}
data = pd.DataFrame(myDict)

technologies = {
    'java': ['java', 'jdbc', 'jms', 'jconsole', 'jprobe', 'jax', 'jax-rs', 'kotlin', 'jdk'],
    'javascript': ['javascript', 'js', 'node', 'node.js', 'mustache.js', 'handlebar.js', 'express', 'angular',
                   'angular.js', 'react.js', 'angularjs', 'jquery', 'backbone.js', 'd3'],
    'sql': ['sql', 'mysql', 'sqlite', 't-sql', 'postgre', 'postgresql', 'db', 'etl'],
}

def term_search(doc,tech):
    for term in technologies[tech]:
        if term in doc:
            return 1
        else:
            return 0

for tech in technologies:
    data[tech] = data.apply(term_search(data['jp_description'],tech))

I received the following error but don't understand it:

TypeError: ("'int' object is not callable", 'occurred at index jp_ref')

Your logic is wrong: the loop returns 0 or 1 on its first iteration, so the jp_description value is only ever compared with the first term in the list, never with the complete list.
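The early return can be seen in isolation, without pandas. The following is a minimal sketch: the term list is shortened from the question's `technologies` dict, and the names `term_search_early_return` and `term_search_full_scan` are hypothetical, introduced only for this comparison.

```python
# Shortened term list, copied from the question's `technologies` dict.
technologies = {'java': ['java', 'jdbc', 'jms']}

def term_search_early_return(doc, tech):
    # Mirrors the question's loop: the else branch returns during the
    # first iteration, so only the first term is ever tested.
    for term in technologies[tech]:
        if term in doc:
            return 1
        else:
            return 0

def term_search_full_scan(doc, tech):
    # Hypothetical variant: return 0 only after every term has been tried.
    for term in technologies[tech]:
        if term in doc:
            return 1
    return 0

doc = 'developer with jdbc experience'
early = term_search_early_return(doc, 'java')  # stops after testing 'java'
full = term_search_full_scan(doc, 'java')      # goes on to test 'jdbc'
print(early, full)
```

Note that `term in doc` on a string is a substring test, so it would also find 'java' inside 'javascript'. That is one reason to compare whole words instead.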

Instead, split the jp_description into words and intersect them with the terms for that key in the technologies dict. If the intersection is non-empty, at least one term is present as a whole word, so return 1; otherwise return 0.

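def term_search(doc, tech):
    # Split the description into space-separated words
    words = doc.split(" ")
    # Terms for this technology that also appear as whole words in the description
    common_elem = set(words).intersection(technologies[tech])
    return 1 if common_elem else 0

# Apply to the jp_description column only: one new 0/1 column per technology
for tech in technologies:
    data[tech] = data['jp_description'].apply(lambda x: term_search(x, tech))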

Output:

             jp_title  jp_ref          jp_description  java  javascript  sql
0  software developer     j01  software developer ...     1           0    1
1  operations analyst     j02  operations analyst ...     0           0    1
2  it project manager     j03  it project manager ...     0           1    0


