Read a csv file with multiple data sections into an addressable structure

I have made a csv file which looks like this:

[Image: screenshot of the CSV file]

Now, in my Python file, I want to take only the data from the place column of the food section, which is:

a
b
c
d
e

Then I want to take only the taste column from the drink section, and so on.

My question is: how do I build a database that has "fields" (i.e. food/drink), and within each field address the specific cells I described?

This question is pretty wide open, so I will show two possible ways to parse this data into a structure that can be accessed in the manner you described.


Solution #1

This code uses somewhat more advanced Python and libraries. It wraps a csv reader in a generator so that the multiple sections of the data can be read efficiently. Each section is then placed into its own pandas.DataFrame, and the data frames are stored in a dict keyed by section name.

The data can be accessed like:

ratings['food']['taste']

This gives a pandas.Series. A regular Python list can be obtained with:

list(ratings['food']['taste'])

Code to read the data into pandas DataFrames using a generator:

import csv
import pandas as pd

def csv_record_reader(csv_reader):
    """ Read a csv reader iterator until a blank line is found. """
    prev_row_blank = True
    for row in csv_reader:
        # treat both ',,' rows and completely empty lines as blank
        row_blank = (not row or row[0] == '')
        if not row_blank:
            yield row
            prev_row_blank = False
        elif not prev_row_blank:
            return

# my_csv_data is any iterable of CSV lines (see Test Data below)
ratings = {}
ratings_reader = csv.reader(my_csv_data)
while True:
    # the category name (food, drink, ...) is a one-row section of its own
    category_row = list(csv_record_reader(ratings_reader))
    if len(category_row) == 0:
        # no more sections in the input
        break
    category = category_row[0][0]

    # get the generator for the data section
    data_generator = csv_record_reader(ratings_reader)

    # first row of data is the column names
    columns = next(data_generator)

    # use the rest of the data to build a data frame
    ratings[category] = pd.DataFrame(data_generator, columns=columns)
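
The reason the generator can be called once per section is that every call pulls from the same underlying csv reader, so each new generator resumes where the previous one stopped. A minimal sketch of that behaviour, using only the standard library; `read_section` is a hypothetical, condensed stand-in for `csv_record_reader`, and the sample lines are made up for illustration:

```python
import csv

def read_section(reader):
    """Yield rows until the first blank row that follows some data."""
    seen_data = False
    for row in reader:
        blank = not row or row[0] == ''
        if not blank:
            seen_data = True
            yield row
        elif seen_data:
            # a blank row after data ends this section
            return

# illustrative input: a category row, a separator, a table, a separator, a category row
lines = ["food,,", ",,", "place,taste,day", "a,good,1", "b,good,2", ",,", "drink,,"]
reader = csv.reader(lines)

# each call consumes rows from the shared reader and stops at a section boundary
first = list(read_section(reader))
second = list(read_section(reader))
third = list(read_section(reader))
```

Here `first` holds only the `food` category row, `second` holds the header plus the two data rows, and `third` holds the `drink` category row.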

Solution #2

Here is a solution that reads the data into a dict of plain Python lists. The data can be accessed with something like:

ratings['food']['taste']

Code to read CSV to dict:

import csv
from collections import namedtuple

ratings_reader = csv.reader(my_csv_data)
ratings = {}
need_category = True
need_header = True
for row in ratings_reader:
    # treat both ',,' rows and completely empty lines as blank
    if not row or row[0] == '':
        if not (need_category or need_header):
            # this is the end of a data set
            need_category = True
            need_header = True

    elif need_category:
        # read the category (food, drink, ...)
        category = ratings[row[0]] = dict(rows=[])
        need_category = False

    elif need_header:
        # read the header (place, taste, ...)
        for key in row:
            category[key] = []
        DataEnum = namedtuple('DataEnum', row)
        need_header = False

    else:
        # read a row of data
        row_data = DataEnum(*row)
        category['rows'].append(row_data)
        # also store each value under its column name
        for k, v in row_data._asdict().items():
            category[k].append(v)
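
To make the shape of the result concrete, here is a hand-built dict in the same layout the loop above produces, reduced to a two-row food section. The values are written out by hand for illustration only, not produced by running the parser. Note that the csv module does no type conversion, so the day values are strings until converted.

```python
from collections import namedtuple

# same record type the parser derives from the header row
DataEnum = namedtuple('DataEnum', ['place', 'taste', 'day'])

# hand-written stand-in for ratings after parsing a two-row food section
food = {
    'rows': [DataEnum('a', 'good', '1'), DataEnum('b', 'good', '2')],
    'place': ['a', 'b'],
    'taste': ['good', 'good'],
    'day': ['1', '2'],
}
ratings = {'food': food}

# column access: every value in the place column of the food section
places = ratings['food']['place']

# row access: one record with named fields
first_row = ratings['food']['rows'][0]

# values are strings; convert explicitly where numbers are needed
days = [int(d) for d in ratings['food']['day']]
```

So each section is addressable both by column (`ratings['food']['place']`) and by row (`ratings['food']['rows'][0].taste`).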

Test Data:

my_csv_data = [x.strip() for x in """
    food,,
    ,,
    place,taste,day
    a,good,1
    b,good,2
    c,awesome,3
    d,nice,4
    e,ok,5
    ,,
    ,,
    ,,
    drink,,
    ,,
    place,taste,day
    a,good,1
    b,good,2
    c,awesome,3
    d,nice,4
    e,ok,5
""".split('\n')[1:-1]]

To read the data from a file:

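# Python 3: open in text mode with newline='' (the 'rb' mode only works on Python 2)
with open('ratings_file.csv', newline='') as ratings_file:
    ratings_reader = csv.reader(ratings_file)
    # run the parsing code here, while the file is still open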
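
As a rough sketch of the full round trip on Python 3, the following writes a small file in the same layout to a temporary directory and reads it back. The temporary directory and the sample contents are assumptions made for illustration; the point is that the reader must be consumed inside the `with` block, because it reads lazily from the open file.

```python
import csv
import os
import tempfile

# illustrative contents: one category row, a separator, a header and two data rows
sample = "food,,\n,,\nplace,taste,day\na,good,1\nb,good,2\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ratings_file.csv')
    with open(path, 'w', newline='') as out_file:
        out_file.write(sample)

    # newline='' lets the csv module handle line endings itself
    with open(path, newline='') as ratings_file:
        ratings_reader = csv.reader(ratings_file)
        # consume the reader before the file is closed
        rows = list(ratings_reader)
```

In real use, `rows` (or the reader itself, inside the `with` block) would be passed to either of the parsing solutions above in place of `my_csv_data`.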
