Python csv: Split column to columns and then to rows by delimiter

I have a column in a CSV file that contains each person's details in this format:

+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  Team  |                                                                                                Members                                                                                                 |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Team 1 | OK-10:Jason:Jones:ID No:00000000:male:my notes                                                                                                                                                         |
| Team 2 | OK-10:Mike:James:ID No:00000001:male:my notes OZ-09:John:Rick:ID No:00000002:male:my notes                                                                                                             |
| Team 3 | OK-08:Michael:Knight:ID No:00000004:male:my notes2 OK-09:Helen:Rick:ID No:00000005:female:my notes3 OZ-10:Jane:James:ID No:00000034:female:my notes23 OK-09:Mary:Jane:ID No:00000023:female:my notes46 |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Actual CSV format:

"Team", "Members"
 Team 1, OK-10:Jason:Jones:ID No:00000000:male:my notes
 Team 2, OK-10:Mike:James:ID No:00000001:male:my notes OZ-09:John:Rick:ID No:00000002:male:my notes
 Team 3, OK-08:Michael:Knight:ID No:00000004:male:my notes2 OK-09:Helen:Rick:ID No:00000005:female:my notes3 OZ-10:Jane:James:ID No:00000034:female:my notes23 OK-09:Mary:Jane:ID No:00000023:female:my notes46
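Note that this "actual" format has a space after each comma. With the default dialect the second header is read as ` "Members"` (leading space and literal quotes), so `row['Members']` would raise `KeyError`. A minimal sketch, assuming the file really looks like the sample above, that passes `skipinitialspace=True` to `csv.DictReader` and strips any remaining padding; the inline `raw` string is a stand-in for `test.csv`:

```python
import csv
import io

# Sample in the "actual CSV format" from the question: a space after each
# comma and a leading space on every data line.
raw = ('"Team", "Members"\n'
       ' Team 1, OK-10:Jason:Jones:ID No:00000000:male:my notes\n'
       ' Team 2, OK-10:Mike:James:ID No:00000001:male:my notes'
       ' OZ-09:John:Rick:ID No:00000002:male:my notes\n')

# skipinitialspace=True ignores the spaces that follow each delimiter, so
# "Members" is parsed as a quoted field and becomes the plain name 'Members'.
reader = csv.DictReader(io.StringIO(raw), skipinitialspace=True)
print(reader.fieldnames)

rows = []
for row in reader:
    # .strip() removes any leftover leading/trailing padding from each value.
    rows.append({key: value.strip() for key, value in row.items()})

for row in rows:
    print(row['Team'], '->', row['Members'])
```

With a real file, `io.StringIO(raw)` would be replaced by `open('test.csv', newline='')`.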

I want to split them into a new CSV file like this:

+-------+-------------+-------------+----------------+------------------+---------------+---------------+--------------+
| Team  | Member_Rank | Member_Name | Member_Surname | Member_ID_Method | Member_ID_Num | Member_Gender | Member_Notes |
+-------+-------------+-------------+----------------+------------------+---------------+---------------+--------------+
| Team1 | OK-10       | Jason       | Jones          | ID No            |      00000000 | male          | my notes     |
| Team2 | OK-10       | Mike        | James          | ID No            |      00000001 | male          | my notes     |
| Team2 | OZ-09       | John        | Rick           | ID No            |      00000002 | male          | my notes     |
+-------+-------------+-------------+----------------+------------------+---------------+---------------+--------------+

Splitting details:

Split row delimiter: ' O&-', where & can only be 'K' or 'Z'.

Split column delimiter: ':'. The number of columns in the new CSV file is fixed.

(One team can contain many members; there is no upper limit.)
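Taken literally, these two rules can also be applied without one large regex. A minimal sketch (the helper name `split_members` is hypothetical): `re.split` with a lookahead splits on a space followed by `OK-`/`OZ-` without consuming the rank, and `str.split(':', 6)` cuts each member into the seven fixed columns, assuming that only the last (notes) column may contain extra colons:

```python
import re

# Row delimiter: a single space followed by "OK-" or "OZ-".
# The lookahead (?=...) is zero-width, so the rank stays at the
# start of the next piece instead of being consumed by the split.
ROW_DELIMITER = re.compile(r' (?=O[KZ]-)')


def split_members(members):
    """Split one Members cell into a list of 7-item rows (hypothetical helper)."""
    rows = []
    for member in ROW_DELIMITER.split(members.strip()):
        # maxsplit=6 yields at most 7 columns; any further ':' stays in the notes.
        columns = member.split(':', 6)
        if len(columns) != 7:
            raise ValueError(f'unexpected member format: {member!r}')
        rows.append(columns)
    return rows


cell = ('OK-10:Mike:James:ID No:00000001:male:my notes '
        'OZ-09:John:Rick:ID No:00000002:male:my notes')
for columns in split_members(cell):
    print(columns)
```

Unlike a character-class regex, this accepts any characters inside a field, but a note that itself contains ` OK-` or ` OZ-` would be split in the wrong place.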

UPDATE

Using this code provided by @Adirio, I get only the last member from fields that contain multiple members:

import csv
import re


members_split_regex = re.compile(r'(O[KZ]-\d+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)(?= O[KZ]|$)')

with open('test.csv') as input_file, open('output_csv.csv', 'w', newline='') as output_file:
    csv_reader = csv.DictReader(input_file)
    fieldnames = csv_reader.fieldnames.copy()
    fieldnames.remove('Members')
    csv_writer = csv.DictWriter(output_file, extrasaction='ignore', fieldnames=fieldnames + ['Member_Rank', 'Member_Name', 'Member_Surname', 'Member_ID_Method', 'Member_ID_Num', 'Member_Gender', 'Member_Notes'])
    csv_writer.writeheader()
    for row in csv_reader:
        for member_tuple in members_split_regex.findall(row['Members']):
            member_dict = {}
            (
                member_dict['Member_Rank'],
                member_dict['Member_Name'],
                member_dict['Member_Surname'],
                member_dict['Member_ID_Method'],
                member_dict['Member_ID_Num'],
                member_dict['Member_Gender'],
                member_dict['Member_Notes']
            ) = member_tuple
            print(row['Members'])
            print(member_tuple)
            member_dict.update(row)
            csv_writer.writerow(member_dict)

Printed results:

row['Members'] ->

OK-1:name1:sunrmae2:ID No:id1233123:male:note12 OK-10:name2:sunrame2:Passport No:asda3243242:female:note2 OZ-1:nma3:surname3:Passport No:asd213131:other:note 56

print(member_tuple) ->

('OZ-1', 'nma3', 'surname3', 'Passport No', 'asd213131', 'other', 'note 56')
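The real file isn't shown, so this is only one possible cause, offered as an assumption: if the members inside a cell are separated by something other than a plain space (for example a line break inside a quoted cell, or a tab), the lookahead `(?= O[KZ]|$)` fails after every member except the last, because without `re.MULTILINE` the `$` only matches at the very end of the string. That would produce exactly the symptom above. A sketch comparing the question's regex with a whitespace-tolerant lookahead:

```python
import re

# Regex from the question: a member must be followed by exactly " O[KZ]"
# or by the end of the string.
original = re.compile(
    r'(O[KZ]-\d+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)'
    r':([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)(?= O[KZ]|$)'
)

# Same seven groups; the lookahead now accepts any whitespace run
# (spaces, tabs, line breaks) before the next "OK-"/"OZ-" rank,
# or optional trailing whitespace at the end of the cell.
tolerant = re.compile(
    r'(O[KZ]-\d+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)'
    r':([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)(?=\s+O[KZ]-|\s*$)'
)

# Hypothetical cell whose members are separated by line breaks.
members = ('OK-1:name1:sunrmae2:ID No:id1233123:male:note12\n'
           'OK-10:name2:sunrame2:Passport No:asda3243242:female:note2\n'
           'OZ-1:nma3:surname3:Passport No:asd213131:other:note 56')

print(len(original.findall(members)))  # 1 (only the last member)
print(len(tolerant.findall(members)))  # 3
```

Printing `repr(row['Members'])` instead of `row['Members']` would show whether the cell really contains `\n` or `\t` characters.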

Assuming this input CSV:

Team,Members
Team 1,OK-10:Jason:Jones:ID No:00000000:male:my notes
Team 2,OK-10:Mike:James:ID No:00000001:male:my notes OZ-09:John:Rick:ID No:00000002:male:my notes
Team 3,OK-08:Michael:Knight:ID No:00000004:male:my notes2 OK-09:Helen:Rick:ID No:00000005:female:my notes3 OZ-10:Jane:James:ID No:00000034:female:my notes23 OK-09:Mary:Jane:ID No:00000023:female:my notes46

This can be achieved with a regex, csv.DictReader and csv.DictWriter:

import csv
import re

output = []

members_split_regex = re.compile(r'(O[KZ]-\d+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)(?= O[KZ]|$)')

with open('test.csv') as f:
    csv_reader = csv.DictReader(f)
    for row in csv_reader:
        team = row['Team']
        members = row['Members']
        split_members = members_split_regex.findall(members)
        for member in split_members:
            (member_rank, member_name, member_surname, member_id_method,
             member_id_num, member_gender, member_notes) = member

            output.append({'Team': team, 'Member_Rank': member_rank, 'Member_Name': member_name,
                           'Member_Surname': member_surname, 'Member_ID_Method': member_id_method,
                           'Member_ID_Num': member_id_num, 'Member_Gender': member_gender,
                           'Member_Notes': member_notes})

with open('output_csv', 'w', newline='') as f:
    csv_writer = csv.DictWriter(f, fieldnames=['Team', 'Member_Rank', 'Member_Name', 'Member_Surname', 'Member_ID_Method', 'Member_ID_Num', 'Member_Gender', 'Member_Notes'])
    csv_writer.writeheader()
    csv_writer.writerows(output)

The output file is:

Team,Member_Rank,Member_Name,Member_Surname,Member_ID_Method,Member_ID_Num,Member_Gender,Member_Notes
Team 1,OK-10,Jason,Jones,ID No,00000000,male,my notes
Team 2,OK-10,Mike,James,ID No,00000001,male,my notes
Team 2,OZ-09,John,Rick,ID No,00000002,male,my notes
Team 3,OK-08,Michael,Knight,ID No,00000004,male,my notes2
Team 3,OK-09,Helen,Rick,ID No,00000005,female,my notes3
Team 3,OZ-10,Jane,James,ID No,00000034,female,my notes23
Team 3,OK-09,Mary,Jane,ID No,00000023,female,my notes46

Based on @DeepSpace's answer, but with a fixed regex and the new requirements added:

import csv
import re


members_split_regex = re.compile(r'(O[KZ]-\d+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)(?= O[KZ]|$)')

with open('test.csv') as input_file, open('output_csv', 'w', newline='') as output_file:
    csv_reader = csv.DictReader(input_file)
    fieldnames = csv_reader.fieldnames.copy()
    fieldnames.remove('Members')
    csv_writer = csv.DictWriter(output_file, extrasaction='ignore', fieldnames=fieldnames + ['Member_Rank', 'Member_Name', 'Member_Surname', 'Member_ID_Method', 'Member_ID_Num', 'Member_Gender', 'Member_Notes'])
    csv_writer.writeheader()
    for row in csv_reader:
        for member_tuple in members_split_regex.findall(row['Members']):
            member_dict = {}
            (
                member_dict['Member_Rank'],
                member_dict['Member_Name'],
                member_dict['Member_Surname'],
                member_dict['Member_ID_Method'],
                member_dict['Member_ID_Num'],
                member_dict['Member_Gender'],
                member_dict['Member_Notes']
            ) = member_tuple
            member_dict.update(row)
            csv_writer.writerow(member_dict)

The main difference is that I'm removing the "Members" column from the writer's fieldnames, and `extrasaction='ignore'` makes the writer skip the leftover "Members" key, so the whole input row can be used to update our new dictionary. This way we copy not only the "Team" column but every other column that is not "Members". To do so, the reader's fieldnames are copied, the "Members" item is removed, and the new member fields are appended to form the writer's fieldnames.
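To see why the leftover "Members" key does no harm, here is a small self-contained sketch of the `update(row)` plus `extrasaction='ignore'` combination. The extra "Coach" column and the shortened field list are made up for illustration, and `io.StringIO` stands in for the output file:

```python
import csv
import io

# A row as csv.DictReader might return it; 'Coach' is a hypothetical extra column.
row = {'Team': 'Team 2', 'Coach': 'Smith',
       'Members': 'OK-10:Mike:James:ID No:00000001:male:my notes'}

# Reader fieldnames minus 'Members', plus (here, only two of) the new member columns.
fieldnames = ['Team', 'Coach', 'Member_Rank', 'Member_Name']

member_dict = {'Member_Rank': 'OK-10', 'Member_Name': 'Mike'}
member_dict.update(row)  # copies Team and Coach, but also the leftover 'Members' key

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction='ignore')
writer.writeheader()
writer.writerow(member_dict)  # 'Members' is not a fieldname, so it is skipped
print(out.getvalue())

# With the default extrasaction='raise', the same row is rejected instead.
error_message = None
try:
    csv.DictWriter(io.StringIO(), fieldnames=fieldnames).writerow(member_dict)
except ValueError as exc:
    error_message = str(exc)
print('extrasaction="raise":', error_message)
```

Any column added to the input file later is carried through automatically, because it ends up in both the row and the writer's fieldnames.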

The regex used doesn't hardcode any field value: it allows spaces in names and surnames, capital Os in the notes, and ID fields that are not just 8-digit numbers.
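A quick check of those claims against a made-up cell (spaces in the name and surname, a note starting with a capital O, an alphanumeric ID). The greedy last group first grabs `Out of office OZ`, then backtracks until the lookahead sees ` OZ`, so the note is kept whole:

```python
import re

# Regex from the answer above.
members_split_regex = re.compile(
    r'(O[KZ]-\d+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)'
    r':([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+):([a-zA-Z0-9 ]+)(?= O[KZ]|$)'
)

# Made-up cell: spaces in name and surname, a capital O in the notes,
# and an ID that is not an 8-digit number.
cell = ('OK-10:Mary Ann:Van Dyke:Passport No:AB123456:female:Out of office '
        'OZ-09:John:Rick:ID No:00000002:male:my notes')

for member_tuple in members_split_regex.findall(cell):
    print(member_tuple)
```

One limitation to keep in mind: a field containing a character outside `[a-zA-Z0-9 ]`, such as `.` or `é`, makes that whole member silently disappear from the results rather than raising an error.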

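Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.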

 
© 2020-2024 STACKOOM.COM