
Python: looping through nested series of different lengths

I am trying to write a simple program that assigns codes to courses by matching them against a list of keywords.

So far I can handle a keyword list in which every row holds exactly 2 keyword patterns:

import pandas as pd

# The keyword list, with every row fixed at length 2
keyword = pd.DataFrame({
        'code':['001','002','003'], 
        'keyword': [
                ['edu|teach','primary sch|secondary sch|junior sch|preliminary sch'],  # length = 2
                ['elderly|disabled|special','care'],        # length = 2
                ['digital|social media','marketing']]       # length = 2
            })

# The list of educational programmes to which codes are to be assigned
course = pd.DataFrame({
        'course': 
            ['certificate in digital marketing',
             'certificate in elderly care',
             'diploma in primary school education',
             'bachelor in traditional chinese medicine',
             'master of law']
            })

# To generate shortlist of coded courses

courseresult = pd.DataFrame()
for i in range(0,len(keyword['keyword'])):
    courseshortlist = course[
            (course.course.str.contains(keyword['keyword'][i][0]) & course.course.str.contains(keyword['keyword'][i][1])) 
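           ].copy()  # copy so the new column is not written to a slice of course
    courseshortlist['autocode'] = keyword['code'][i]
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    courseresult = pd.concat([courseresult, courseshortlist])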

However, I am not sure how to write the loop when the rows of the keyword list vary in length, like this:

keyword_variable = pd.DataFrame({
        'code':['001','002','003','004','005'], 
        'keyword': [
                ['law'],                                # length = 1
                ['edu|teach','primary sch|secondary sch|junior sch|preliminary sch'], # length = 2
                ['elderly|disabled|special','care'],  # length = 2
                ['digital|social media','marketing'], # length = 2
                ['traditional','chinese','medicine']  # length = 3
                ] 
            })

Update: I got the result I want with some ugly, clumsy try/except code:

courseresult = pd.DataFrame()
for i in range(0,len(keyword_variable['keyword'])):
    try: 
        condition0 = course.course.str.contains(keyword_variable['keyword'][i][0])
        condition1 = course.course.str.contains(keyword_variable['keyword'][i][1])
        condition2 = course.course.str.contains(keyword_variable['keyword'][i][2])
        condition = condition0 & condition1 & condition2
    except IndexError: 
        try: 
            condition0 = course.course.str.contains(keyword_variable['keyword'][i][0])
            condition1 = course.course.str.contains(keyword_variable['keyword'][i][1])
            condition = condition0 & condition1 
        except IndexError: 
            condition = course.course.str.contains(keyword_variable['keyword'][i][0])
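    courseshortlist = course[(condition)].copy()  # copy so the new column is not written to a slice of course
    courseshortlist['autocode'] = keyword_variable['code'][i]
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    courseresult = pd.concat([courseresult, courseshortlist])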

courseresult
Out[1]: 
                                     course autocode
4                             master of law      001
2       diploma in primary school education      002
1               certificate in elderly care      003
0          certificate in digital marketing      004
3  bachelor in traditional chinese medicine      005

But I am sure there must be a better way to do this. Thanks a lot!

Assuming you don't really need the result to be in a separate DataFrame:

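for i in range(0,len(keyword_variable['keyword'])):
    # Start from an all-True mask that shares course's index
    condition = pd.Series(True, index=course.index)
    for k in keyword_variable['keyword'][i]:
        # Every pattern in the row must match, so AND the masks together
        condition = condition & course.course.str.contains(k)
    course.loc[condition, 'autocode'] = keyword_variable['code'][i]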

print(course)
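
The inner loop is a fold over one row's patterns with `&`, which is why the row length no longer matters. As a rough sketch of the same idea, the fold can also be written with `functools.reduce`. The helper name `match_all` is made up for this illustration and is not part of pandas; the course data is copied from the question.

```python
from functools import reduce
import operator

import pandas as pd

course = pd.DataFrame({
    'course': ['certificate in digital marketing',
               'certificate in elderly care',
               'diploma in primary school education',
               'bachelor in traditional chinese medicine',
               'master of law']
})


def match_all(texts, patterns):
    """Hypothetical helper: True where a text matches every pattern."""
    # One boolean Series per pattern, all sharing the index of texts
    masks = [texts.str.contains(p) for p in patterns]
    # AND the masks pairwise; a single mask is returned unchanged
    return reduce(operator.and_, masks)


# Works the same for a row of length 1 and a row of length 3
mask_law = match_all(course['course'], ['law'])
mask_tcm = match_all(course['course'], ['traditional', 'chinese', 'medicine'])
print(course[mask_tcm])
```

Note that `reduce` without an initial value raises a `TypeError` on an empty pattern list, so this sketch assumes every keyword row holds at least one pattern.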

If you do need a new copy, just create a copy first and apply the same solution to it.
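
A minimal sketch of the copy-first variant, assuming the data from the question. The function name `code_courses` is invented for this example; the body uses the same mask-building loop on a copy, so `course` itself is left unchanged.

```python
import pandas as pd


def code_courses(course, keyword_table):
    """Hypothetical helper: return a coded copy of course."""
    coded = course.copy()
    coded['autocode'] = None  # rows that match no keyword row stay None
    for code, patterns in zip(keyword_table['code'], keyword_table['keyword']):
        # All-True mask aligned with the copy's index
        condition = pd.Series(True, index=coded.index)
        for pattern in patterns:
            # Every pattern in the row must match
            condition &= coded['course'].str.contains(pattern)
        coded.loc[condition, 'autocode'] = code
    return coded


course = pd.DataFrame({
    'course': ['certificate in digital marketing',
               'certificate in elderly care',
               'diploma in primary school education',
               'bachelor in traditional chinese medicine',
               'master of law']
})

keyword_variable = pd.DataFrame({
    'code': ['001', '002', '003', '004', '005'],
    'keyword': [
        ['law'],
        ['edu|teach', 'primary sch|secondary sch|junior sch|preliminary sch'],
        ['elderly|disabled|special', 'care'],
        ['digital|social media', 'marketing'],
        ['traditional', 'chinese', 'medicine'],
    ]
})

courseresult = code_courses(course, keyword_variable)
print(courseresult)
```

Two differences from the loop in the question are worth keeping in mind: the rows stay in their original order rather than being grouped by code, and a course that matches several keyword rows keeps only the last matching code instead of appearing once per match.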

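Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.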

 