How to create columns in pandas df with .apply and user defined function
I'm trying to create several columns in a pandas DataFrame at once. Each column name is a key in a dictionary, and the function should return 1 if any of the values corresponding to that key are present.
My DataFrame has 3 columns: jp_ref, jp_title, and jp_description. Essentially, I'm searching each jp_description for the words assigned to a key and filling that key's column with 1s and 0s depending on whether any of those words appear in the jp_description.
import pandas as pd
jp_title = ['software developer', 'operations analyst', 'it project manager']
jp_ref = ['j01', 'j02', 'j03']
jp_description = [
    'software developer with java and sql experience',
    'operations analyst with ms in operations research, statistics or related field. sql experience desired.',
    'it project manager with javascript working knowledge',
]
myDict = {'jp_title': jp_title, 'jp_ref': jp_ref, 'jp_description': jp_description}
data = pd.DataFrame(myDict)
technologies = {
    'java': ['java', 'jdbc', 'jms', 'jconsole', 'jprobe', 'jax', 'jax-rs', 'kotlin', 'jdk'],
    'javascript': ['javascript', 'js', 'node', 'node.js', 'mustache.js', 'handlebar.js', 'express', 'angular',
                   'angular.js', 'react.js', 'angularjs', 'jquery', 'backbone.js', 'd3'],
    'sql': ['sql', 'mysql', 'sqlite', 't-sql', 'postgre', 'postgresql', 'db', 'etl'],
}
def term_search(doc,tech):
    for term in technologies[tech]:
        if term in doc:
            return 1
        else:
            return 0

for tech in technologies:
    data[tech] = data.apply(term_search(data['jp_description'],tech))
I received the following error but don't understand it:
TypeError: ("'int' object is not callable", 'occurred at index jp_ref')
Your logic is wrong: the loop returns 0 or 1 on its first iteration, so the jp_description value is never compared against the complete list of terms.
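To make the early return concrete, here is a minimal sketch in plain Python. The short `terms` list and the function names `first_term_only` and `any_term` are hypothetical and exist only for this illustration. `any_term` is one possible way to test every term, not the fix used in this answer.

```python
# Hypothetical, shortened term list used only for this illustration.
terms = ["java", "jdbc", "sql"]

def first_term_only(doc):
    # Mirrors the original loop: both branches return, so the loop
    # never gets past the first term.
    for term in terms:
        if term in doc:
            return 1
        else:
            return 0

def any_term(doc):
    # Checks every term before settling on 0.
    return int(any(term in doc for term in terms))

doc = "analyst with sql experience"
print(first_term_only(doc))  # 0: only "java" was tested
print(any_term(doc))         # 1: "sql" was found
```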
Instead, split the jp_description into words and intersect them with the terms for that technology. If the intersection is non-empty, one of the terms is present, so return 1; otherwise return 0.
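def term_search(doc, tech):
    # Split the description into words and intersect them with the term list
    doc = doc.split(" ")
    common_elem = list(set(doc).intersection(technologies[tech]))
    if len(common_elem) > 0:
        return 1
    return 0

for tech in technologies:
    # Apply to the jp_description column only, passing a function rather than its result
    data[tech] = data['jp_description'].apply(lambda x: term_search(x, tech))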
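On the TypeError itself: in the original attempt, term_search(data['jp_description'], tech) is evaluated before apply runs. apply is therefore handed the returned integer and fails when it tries to call it. The loop above passes a function instead. A minimal illustration, using a hypothetical stand-in for term_search and no pandas at all:

```python
def term_search(doc, tech):
    # Stand-in for the real function; always reports "not found".
    return 0

# Parentheses call the function right away, leaving just an int.
result = term_search("some description", "java")

# A lambda defers the call until it is given a value.
wrapper = lambda doc: term_search(doc, "java")

print(callable(result))   # False: an int cannot be called on each row
print(callable(wrapper))  # True: this is the kind of object apply expects
```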
Output of the loop on the sample data:
   jp_title            jp_ref  jp_description          java  javascript  sql
0  software developer  j01     software developer....  1     0           1
1  operations analyst  j02     operations analyst ..   0     0           1
2  it project manager  j03     it project manager...   0     1           0
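One property of the split-based version worth noting: it matches whole tokens, which is why row 2 gets java = 0 even though "javascript" contains "java". The flip side is that a term followed directly by punctuation, such as "sql.", would not match. A possible variation, assuming a simple punctuation strip is acceptable (the shortened term lists here are hypothetical):

```python
# Shortened, hypothetical term lists; the real dict has many more entries.
technologies = {
    "java": ["java", "jdk"],
    "sql": ["sql", "mysql"],
}

def term_search(doc, tech):
    # Split on whitespace, then strip surrounding punctuation from each
    # token so that "sql." is compared as "sql".
    tokens = {tok.strip(".,;:()") for tok in doc.split()}
    return int(bool(tokens.intersection(technologies[tech])))

print(term_search("experience with sql.", "sql"))           # 1
print(term_search("javascript working knowledge", "java"))  # 0
```

This version can be passed to data['jp_description'].apply in the same way as the function above.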