How to split a string into column names with Python/pandas?

Do you know how to solve this in Python? I would like to have a DataFrame with the data arranged in the correct columns.

Thanks in advance!

Here is an example of a string from the DataFrame:

' Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard '

Preferred result:

type          from     to        function                                   organization           
current       2015     present    Director Marketing & Indirect Channels    Ricoh Nederland 
current       2010     present    Owner & Consultant                        Basketball Center
old           2012     2015       Director Marketing & Business Development Ricoh
school        1988     1992       Marketing                                 Harvard                           

Current df:

Name             Data
Michael Jordan   ' Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard '

Here is a solution I wrote for this problem:

import pandas as pd
beautiful_data = 'Huidigefuncties Michael Jordan 2015 - present Director Marketing & Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 Director Marketing & Business Development, Ricoh Opleiding Michael Jordan 1988 - 1992 Marketing , Harvard'
main_dict = {'type':[], 'from':[], 'to':[], 'function':[], 'organization': []}
# split() without arguments also ignores leading/trailing whitespace in the raw string
data = beautiful_data.split()
i = 0
huidi_index = data.index('Huidigefuncties')
loopbaan_index = data.index('Loopbaan')
ople_index = data.index('Opleiding')
# print(data)
while i < len(data):
    if data[i] == 'Huidigefuncties':
        line = ' '.join(data[i + 1: loopbaan_index])
        i = loopbaan_index
        print(line)
        type_data = 'current'
    elif data[i] == 'Loopbaan':
        line = ' '.join(data[i + 1: ople_index])
        i = ople_index
        print(line)
        type_data = 'old'
    elif data[i] == 'Opleiding':
        line = ' '.join(data[i+1: ])
        i = len(data)
        print(line)
        type_data = 'school'
    else:
        i += 1
    # Split on ' - ' (with spaces) so hyphenated names such as 'Business-Expertise' stay intact
    data_line = line.split(' - ')
    if len(data_line) == 2:
        print(type_data)
        main_dict['type'].append(type_data)
        from_data = data_line[0].strip().split(' ')[-1]
        print(from_data)
        main_dict['from'].append(from_data)
        to_data = data_line[1].strip().split(' ')[0]
        print(to_data)
        main_dict['to'].append(to_data)
        # Take the text before the comma, drop the leading 'to' token, and strip stray spaces
        function_data = data_line[1].split(',')[0].strip().split(' ', 1)[1].strip()
        print(function_data)
        main_dict['function'].append(function_data)
        organization_data = data_line[1].split(',')[-1].strip()
        print(organization_data)
        main_dict['organization'].append(organization_data)

    elif len(data_line) > 2:
        j = 0
        while j < len(data_line):
            register_data = data_line[j:j+2]
            if len(register_data) > 1:
                if len(register_data[0].split(' ')) > 1 and len(register_data[1].split(' ')) > 1: 
                    if j == 0:
                        print(register_data)
                        print('----------')
                        print(type_data)
                        main_dict['type'].append(type_data)
                        from_data = register_data[0].strip().split(' ')[-1]
                        print(from_data)
                        main_dict['from'].append(from_data)
                        to_data = register_data[1].strip().split(' ')[0]
                        print(to_data)
                        main_dict['to'].append(to_data)
                        function_org = register_data[1].strip().split(',')
                        function_data = ' '.join(function_org[0].split(' ')[1:])
                        print(function_data)
                        main_dict['function'].append(function_data)
                        org_data = ' '.join(function_org[1].split(' ')[:-1]).strip()
                        print(org_data)
                        main_dict['organization'].append(org_data)
                        print('-----------')
                    else:
                        print('-----------')
                        print(register_data)
                        print(type_data)
                        main_dict['type'].append(type_data)
                        from_data = register_data[0].strip().split(' ')[-1]
                        print(from_data)
                        main_dict['from'].append(from_data)
                        to_data = register_data[1].strip().split(' ')[0]
                        print(to_data)
                        main_dict['to'].append(to_data)
                        function_org = register_data[1].strip().split(',')
                        function_data = ' '.join(function_org[0].split(' ')[1:])
                        print(function_data)
                        main_dict['function'].append(function_data)
                        org_data = ' '.join(function_org[1].split(' ')).strip()
                        print(org_data)
                        main_dict['organization'].append(org_data)
                        print('-----------')
            j += 1

df = pd.DataFrame(main_dict)

Tested.
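
As a shorter alternative, the same parsing could probably be done with a regular expression. The following is only a sketch, assuming that every entry has the shape "<year> - <year or present> <function>, <organization>" and that the three headings from the sample (Huidigefuncties, Loopbaan, Opleiding) are the only ones that occur. The names parse_profile, SECTION_TYPES and ENTRY_RE are made up for this illustration.

```python
import re

import pandas as pd

# Headings seen in the sample string, mapped to the desired 'type' labels.
SECTION_TYPES = {
    "Huidigefuncties": "current",
    "Loopbaan": "old",
    "Opleiding": "school",
}

# Assumed entry shape: "<from> - <to> <function>, <organization>".
# The organization ends right before the next "<year> - " or at the end of the section.
ENTRY_RE = re.compile(
    r"(?P<start>\d{4}) - (?P<end>\d{4}|present) "
    r"(?P<function>.+?)\s*,\s*"
    r"(?P<organization>.+?)(?= \d{4} - |$)"
)


def parse_profile(text):
    """Turn one raw profile string into a DataFrame with one row per entry."""
    rows = []
    # A capturing group makes re.split keep the headings:
    # [prefix, heading1, body1, heading2, body2, ...]
    parts = re.split(r"\b(Huidigefuncties|Loopbaan|Opleiding)\b", text)
    for heading, body in zip(parts[1::2], parts[2::2]):
        # The person's name in front of the first year is skipped by the search.
        for match in ENTRY_RE.finditer(body.strip()):
            rows.append({
                "type": SECTION_TYPES[heading],
                "from": match.group("start"),
                "to": match.group("end"),
                "function": match.group("function"),
                "organization": match.group("organization"),
            })
    return pd.DataFrame(
        rows, columns=["type", "from", "to", "function", "organization"]
    )


sample = (
    " Huidigefuncties Michael Jordan 2015 - present Director Marketing & "
    "Indirect Channels, Ricoh Nederland 2010 - present Basketball Center, "
    "Center for Business-Expertise Loopbaan Michael Jordan 2012 - 2015 "
    "Director Marketing & Business Development, Ricoh Opleiding Michael "
    "Jordan 1988 - 1992 Marketing , Harvard "
)

result = parse_profile(sample)
print(result)
```

For the second entry this gives "Basketball Center" as the function and "Center for Business-Expertise" as the organization, because "Owner & Consultant" from the preferred result does not appear in the sample string. To process the whole Data column, one option would be to call parse_profile on each value and combine the results with pd.concat.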
