
Converting a somewhat organized text file to csv using python?

I need to convert a text file to CSV and organize it by columns. However, the data in my text file is laid out in rows, and the file is 715 pages long. Below is an example of what the text file looks like:

This is an example of what my data looks like, but in reality there is a lot more information per person and there are thousands of entries.

[Image: sample of the text file]

Basically, each entry is separated by a line of dashes ("-------"). However, the data for each entry spans multiple lines. For example, there will be a dashed line, then name and age on the next line, then salary on the line after, and then another dashed line to signify the start of a new entry.

Is there a way to work around this weird layout and end up with a CSV with columns such as name, age, occupation, salary, etc.? I'd be using Python. I was thinking, would it be possible to split the text on the dashed line that precedes each new entry? I'm not sure how to go about this though, and I am very much a beginner. Or, if Python is not the best way to do it, what is?

You can use itertools.groupby to create subiterators that alternate between runs of dashed lines and runs of non-dashed lines. The non-dash groups are just the blocks of text, one per entry. Assuming that the fields within a line are separated by at least one tab character or by two or more whitespace characters, a regular expression can carve them up.

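import itertools
import re

def get_my_data(filename):
    """Return one list of cells per dash-separated block in the file."""
    data = []
    with open(filename) as fileobj:
        # groupby alternates between runs of dashed lines and runs of other lines
        for is_dash, block_iter in itertools.groupby(
                fileobj, lambda line: line.startswith("------")):
            if not is_dash:
                # flatten every line of the block into a single row
                row = []
                for line in block_iter:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines inside a block
                    cols = [cell.strip() for cell in
                        re.split(r"\t+|\s{2,}", line)]
                    row.extend(cols)
                if row:
                    data.append(row)
    return data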
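
Once the blocks have been collected into rows, the standard csv module can write them out. The sketch below is only an illustration under assumptions: the sample text, the column names, and the helper names parse_blocks and rows_to_csv are made up here, since the real layout was only shown in an image. It runs the same groupby idea on an in-memory sample and writes the result to a string buffer.

```python
import csv
import io
import itertools
import re

# Hypothetical sample; the real file's exact layout is not known.
SAMPLE = """\
------------------------------
John Smith      34
Engineer        55000
------------------------------
Jane Doe        29
Teacher         48000
------------------------------
"""


def parse_blocks(lines):
    """Turn each run of non-dashed lines into one flat list of cells."""
    rows = []
    for is_dash, block in itertools.groupby(
            lines, lambda line: line.startswith("------")):
        if is_dash:
            continue  # separator lines carry no data
        row = []
        for line in block:
            line = line.strip()
            if line:
                # split on tabs or on runs of two or more whitespace characters
                row.extend(cell.strip()
                           for cell in re.split(r"\t+|\s{2,}", line))
        if row:
            rows.append(row)
    return rows


def rows_to_csv(rows, header, out):
    """Write a header line followed by the data rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


rows = parse_blocks(io.StringIO(SAMPLE))
buffer = io.StringIO()
# Assumed column order; adjust to match the fields in the real file.
rows_to_csv(rows, ["name", "age", "occupation", "salary"], buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file opened with open(path, "w", newline="") as the out argument; the csv documentation recommends newline="" so that line endings are not doubled on Windows. If some entries have missing fields, the rows will have different lengths and would need padding or field-specific parsing before the columns line up.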
