
How to parse a delimited column of a CSV file and make its parts columns of the CSV file in Python

I have a CSV file with the following format and am looking for an easier way to parse the delimited content of col3 and turn its parts into columns of the CSV file using Python.

I'm new to this. A simple looping approach should work, but I would like to know whether there is an easier and faster way to implement it.

From:

col1,col2,col3,col4
1,"David","Job=Sales Manager;Hobby=reading;Sex=Male","31"
2,"Mary","Job=Nurse;Hobby=hiking;Sex=Female","23"

To:

col1,col2,Job,Hobby,Sex,col4
1,"David","Sales Manager","reading","Male","31"
2,"Mary","Nurse","hiking","Female","23"

You can use the pandas library, which makes working with tabular data fairly easy:

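import pandas as pd

df = pd.read_csv("xxx.csv")

# Turn each "key=value;key=value" string into a Series indexed by key.
# split("=", 1) keeps any further "=" characters inside the value.
parsed = df["col3"].apply(
    lambda s: pd.Series(dict(p.split("=", 1) for p in s.split(";")))
)

# Drop col3 and attach the parsed columns (they end up after col4).
new_df = pd.concat([df.drop("col3", axis=1), parsed], axis=1)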


To write the result out as a .csv file, call to_csv(); pass index=False to leave out the row index: new_df.to_csv("newXXX.csv", index=False)
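The pandas snippet above places the new columns after col4, whereas the target in the question puts them where col3 was. As a rough sketch of one way to keep that position using only the standard csv module: the function name expand_column and its parameters are hypothetical, and the code assumes every row carries the same keys in the same order and that every item contains the key/value separator.

```python
import csv
import io


def expand_column(src, dst, column="col3", pair_sep=";", kv_sep="="):
    """Copy CSV rows from src to dst, replacing `column` with one column per key."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = [name.strip() for name in next(reader)]
    idx = header.index(column)
    wrote_header = False
    for row in reader:
        # "Job=Nurse;Hobby=hiking" -> [["Job", "Nurse"], ["Hobby", "hiking"]]
        pairs = [item.split(kv_sep, 1) for item in row[idx].split(pair_sep)]
        if not wrote_header:
            # Assumption: the first data row decides the new column names.
            keys = [key for key, _ in pairs]
            writer.writerow(header[:idx] + keys + header[idx + 1:])
            wrote_header = True
        values = [value for _, value in pairs]
        writer.writerow(row[:idx] + values + row[idx + 1:])


sample = '''col1,col2,col3,col4
1,"David","Job=Sales Manager;Hobby=reading;Sex=Male","31"
2,"Mary","Job=Nurse;Hobby=hiking;Sex=Female","23"
'''
out = io.StringIO()
expand_column(io.StringIO(sample), out)
print(out.getvalue())
```

With the writer's default settings, quotes are emitted only where a field needs them, so the values come out unquoted; the data itself is the same as in the target. For real files, open both with newline="" as the csv documentation recommends.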

This is a simple class-based approach with a parse function and an output function.

class Person:

    def __init__(self, string):
        self.attributes = {}
        # Naive split: assumes no field contains a comma and leaves the quotes in place.
        data = string.split(",")
        self.attributes["id"] = data[0]
        self.attributes["name"] = data[1]
        self.attributes["age"] = data[3]

        self.parse_data(data[2])

    def parse_data(self, data):
        # Each "key=value" pair of the delimited column becomes its own attribute.
        for attr in data.split(";"):
            key, value = attr.split("=", 1)
            self.attributes[key] = value

    def return_data(self):
        return ','.join(self.attributes.values())

# Named "text" rather than "input" to avoid shadowing the built-in.
text = '''1,"David","Job=Sales Manager;Hobby=reading;Sex=Male","31"
2,"Mary","Job=Nurse;Hobby=hiking;Sex=Female","23"'''

people = []

for line in text.split("\n"):
    person = Person(line)
    people.append(person)

print(','.join(people[0].attributes.keys())) # print the keys

for person in people:
    print(person.return_data()) # print the data

This is lightweight and relatively easy to use; I left reading and writing the CSV files out of it. It returns the columns in a consistent format. You will notice, however, some punctuation (stray quotation marks) that hasn't been taken care of. That can easily be fixed too.
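The leftover quotation marks come from splitting the line on commas by hand. One possible fix, shown here only as a sketch, is to let csv.reader parse each line so the quoting is handled for you. The helper name parse_person is hypothetical, and the four-column layout (id, name, details, age) is assumed from the sample data.

```python
import csv


def parse_person(line):
    """Parse one CSV line into a flat dict of attributes."""
    # csv.reader handles the quoting, so '"Mary"' comes back as 'Mary'.
    person_id, name, details, age = next(csv.reader([line]))
    attributes = {"id": person_id, "name": name}
    for item in details.split(";"):
        key, value = item.split("=", 1)
        attributes[key] = value
    attributes["age"] = age
    return attributes


person = parse_person('2,"Mary","Job=Nurse;Hobby=hiking;Sex=Female","23"')
print(",".join(person.keys()))    # column names
print(",".join(person.values()))  # row values, without stray quotes
```

Because the age is added after the parsed pairs, the keys also come out in the order the question asks for.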

Let me know if this approach works for you.
