Python loop through nested series with different length
I am trying to write a simple program that assigns codes to courses by reference to a list of keywords.
So far I can handle a keyword list in which every row holds exactly 2 keyword patterns:
import pandas as pd

# The list of keywords, with the length of each row fixed to 2
keyword = pd.DataFrame({
    'code': ['001', '002', '003'],
    'keyword': [
        ['edu|teach', 'primary sch|secondary sch|junior sch|preliminary sch'],  # length = 2
        ['elderly|disabled|special', 'care'],  # length = 2
        ['digital|social media', 'marketing']]  # length = 2
})
# The list of educational programmes to which codes are to be assigned
course = pd.DataFrame({
    'course':
        ['certificate in digital marketing',
         'certificate in elderly care',
         'diploma in primary school education',
         'bachelor in traditional chinese medicine',
         'master of law']
})
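# Generate the shortlist of coded courses
courseresult = pd.DataFrame()
for i in range(0, len(keyword['keyword'])):
    # Keep the courses that match both keyword patterns of row i
    courseshortlist = course[
        (course.course.str.contains(keyword['keyword'][i][0]) & course.course.str.contains(keyword['keyword'][i][1]))
    ].copy()  # copy so that adding a column does not raise SettingWithCopyWarning
    courseshortlist['autocode'] = keyword['code'][i]
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    courseresult = pd.concat([courseresult, courseshortlist])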
However, I am not sure how to write the loop for a keyword list whose rows have variable length, like this:
keyword_variable = pd.DataFrame({
    'code': ['001', '002', '003', '004', '005'],
    'keyword': [
        ['law'],  # length = 1
        ['edu|teach', 'primary sch|secondary sch|junior sch|preliminary sch'],  # length = 2
        ['elderly|disabled|special', 'care'],  # length = 2
        ['digital|social media', 'marketing'],  # length = 2
        ['traditional', 'chinese', 'medicine']  # length = 3
    ]
})
Update: I just got what I want with some ugly and clumsy try/except code:
courseresult = pd.DataFrame()
for i in range(0, len(keyword_variable['keyword'])):
    try:
        # Try 3 keyword patterns first; fall back to 2, then 1, on IndexError
        condition0 = course.course.str.contains(keyword_variable['keyword'][i][0])
        condition1 = course.course.str.contains(keyword_variable['keyword'][i][1])
        condition2 = course.course.str.contains(keyword_variable['keyword'][i][2])
        condition = condition0 & condition1 & condition2
    except IndexError:
        try:
            condition0 = course.course.str.contains(keyword_variable['keyword'][i][0])
            condition1 = course.course.str.contains(keyword_variable['keyword'][i][1])
            condition = condition0 & condition1
        except IndexError:
            condition = course.course.str.contains(keyword_variable['keyword'][i][0])
    courseshortlist = course[(condition)].copy()  # copy to avoid SettingWithCopyWarning
    courseshortlist['autocode'] = keyword_variable['code'][i]
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    courseresult = pd.concat([courseresult, courseshortlist])
courseresult
Out[1]:
                                     course autocode
4                             master of law      001
2       diploma in primary school education      002
1               certificate in elderly care      003
0          certificate in digital marketing      004
3  bachelor in traditional chinese medicine      005
But I am sure there must be a better way to do this. Thanks a lot!
Assuming you don't really need the result to be in a separate DataFrame:
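for i in range(0, len(keyword_variable['keyword'])):
    # Start from an all-True mask and AND in one condition per keyword pattern,
    # so a row may hold any number of patterns
    condition = pd.Series([True] * len(course), index=course.index)
    for k in keyword_variable['keyword'][i]:
        condition = condition & course.course.str.contains(k)
    # Write the code straight into the matching rows of course
    course.loc[condition, 'autocode'] = keyword_variable['code'][i]
print(course)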
If you do need a new copy, just create the copy first and apply the same solution to it.
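The copy-first variant could look roughly like the sketch below. The helper name `assign_codes` and its parameters are hypothetical, chosen only for illustration, and the sketch assumes each keyword entry is a regular expression accepted by `Series.str.contains`. Courses that match no keyword row keep `None` in `autocode`, and if a course matches several rows, the last matching row wins.

```python
import pandas as pd


def assign_codes(course, keyword_table):
    """Return a coded copy of `course`; the input frame is left untouched.

    Hypothetical helper for illustration, not part of pandas.
    """
    result = course.copy()
    result['autocode'] = None  # courses with no match keep None
    for code, patterns in zip(keyword_table['code'], keyword_table['keyword']):
        # All-True mask aligned to the frame's own index
        condition = pd.Series(True, index=result.index)
        for pattern in patterns:
            # A course qualifies only if it matches every pattern in the row
            condition = condition & result['course'].str.contains(pattern)
        result.loc[condition, 'autocode'] = code
    return result


keyword_variable = pd.DataFrame({
    'code': ['001', '002', '003', '004', '005'],
    'keyword': [
        ['law'],
        ['edu|teach', 'primary sch|secondary sch|junior sch|preliminary sch'],
        ['elderly|disabled|special', 'care'],
        ['digital|social media', 'marketing'],
        ['traditional', 'chinese', 'medicine'],
    ],
})
course = pd.DataFrame({
    'course': [
        'certificate in digital marketing',
        'certificate in elderly care',
        'diploma in primary school education',
        'bachelor in traditional chinese medicine',
        'master of law',
    ]
})

coded = assign_codes(course, keyword_variable)
print(coded)
```

Iterating over `zip(code, keyword)` avoids positional lookups such as `keyword_variable['keyword'][i]`, so the loop does not depend on the keyword table having a default integer index.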