
Create a dictionary using the row number in a csv file [Python]

I have a CSV file containing survey data on 60 participants. The first column is the participant's number, and each row holds all the data collected from that participant. It looks something like:

Participant number: 1, Gender: Female, Level of study: Postgrad

I would like to create a dictionary where the key is the Participant Number and the value is the whole of the row with all the data, to have something like this:

{1: Female, Postgrad, American, Yes, No, No, Yes, Yes, No...} and so on. I am still a newbie and so far this is what I tried:

with open('surveys.csv', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    with open('new_surveys.csv', mode='w') as outfile:
            writer = csv.writer(outfile)
            mydict = {rows[0]:rows for rows in reader}
            print(mydict)

But this prints something like:

{'\"': ['\"'], 'Participant/Question","1.': ['Participant/Question","1.', 'Gender'], ',2.': [',2.', 'Level', 'of', 'study'],} which does not make any sense to me at the moment...

Thank you!

Edit:

This is one complete row of data:

[Image: one complete row of data. There are 59 more rows, but they all look the same; the only difference is Yes/No or time of day.]

You can try this:

import csv

with open('surveys.csv', 'r', newline='') as f:
    # The sample file is tab-separated, so let the reader split on tabs
    reader = csv.reader(f, delimiter='\t')
    next(reader)  # skip the header row
    mydict = {}
    for row in reader:
        # Drop empty cells, then join the remaining values with commas
        non_empty = [cell for cell in row[1:] if cell.strip()]
        mydict[row[0]] = ",".join(non_empty)
print(mydict)

My CSV file is tab-separated and looks like this:

Participant Name    Gender
1   Rupin   Male
2   Poonam  Female
3   Jeshan  Male

The code skips the first (header) row.

My output looks like this:

{'1': 'Rupin,Male', '2': 'Poonam,Female', '3': 'Jeshan,Male'}

From the comments, we know the first 100 bytes of the raw file are:

b'\xef\xbb\xbf"\nParticipant/Question","1. Gender\n","2. Level of study\n","3. How often visit SC\n","4. Time of vi' 

This looks like a CSV export from Excel, with embedded newlines in the cells. The initial b'\xef\xbb\xbf' is a UTF-8 byte order mark (BOM), which indicates that the file should be decoded as 'utf-8-sig'.
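To see what the byte order mark does, here is a small sketch that decodes the start of the byte string quoted above with both codecs. The variable names are only for illustration, and the byte string is a shortened copy of the one from the comments.

```python
import codecs

# Shortened copy of the first bytes of the file, as quoted above
raw = b'\xef\xbb\xbf"\nParticipant/Question","1. Gender\n"'

# Plain 'utf-8' keeps the BOM as an invisible U+FEFF character
as_utf8 = raw.decode('utf-8')
# 'utf-8-sig' strips the BOM while decoding
as_utf8_sig = raw.decode('utf-8-sig')

print(raw.startswith(codecs.BOM_UTF8))  # does the data start with a BOM?
print(repr(as_utf8[0]))                 # first character with 'utf-8'
print(repr(as_utf8_sig[0]))             # first character with 'utf-8-sig'
```

With plain 'utf-8', the leftover U+FEFF character would end up glued to the first cell of the header row.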

Based on this information, this code should create the desired dictionary:

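import csv

# newline='' lets the csv module handle line breaks inside quoted cells
with open('surveys.csv', 'r', encoding='utf-8-sig', newline='') as f:
    reader = csv.reader(f, dialect='excel')
    # Advance the iterator to skip the header row
    next(reader)
    # Key each row by its first column (the participant number)
    mydict = {row[0]: row for row in reader}
print(mydict)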

Passing the 'utf-8-sig' encoding ensures that the byte-order-mark doesn't get treated as part of the data. It's probably a good idea to set this encoding when reading and writing csv files if you are working with Excel.
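As a rough illustration of using the same encoding in both directions, the sketch below writes a few made-up rows to a temporary file and reads them back. The file name and the rows are invented for the example, not taken from the survey data.

```python
import csv
import os
import tempfile

# Made-up rows, for illustration only
rows = [['Participant', 'Gender'], ['1', 'Female'], ['2', 'Male']]

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.csv')

# Writing with 'utf-8-sig' puts a BOM at the start of the file
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f).writerows(rows)

# Inspect the raw bytes to confirm the BOM is there
with open(path, 'rb') as f:
    first_bytes = f.read(3)

# Reading with 'utf-8-sig' removes the BOM again
with open(path, 'r', encoding='utf-8-sig', newline='') as f:
    read_back = list(csv.reader(f))

print(first_bytes)
print(read_back == rows)
```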

Passing dialect='excel' to the reader tells it to use the defaults associated with csv files created by Excel, such as using a comma as the delimiter.
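The following sketch shows how the reader copes with line breaks inside quoted cells. It uses an in-memory string instead of the real file. The header imitates the bytes quoted earlier, while the two data rows are invented for the example.

```python
import csv
import io

# Header cells contain embedded newlines, as in the Excel export;
# the data rows below are made up for illustration
sample = (
    '"\nParticipant/Question","1. Gender\n","2. Level of study\n"\r\n'
    '1,Female,Postgrad\r\n'
    '2,Male,Undergrad\r\n'
)

# newline='' stops StringIO from translating the line endings
reader = csv.reader(io.StringIO(sample, newline=''), dialect='excel')

# The quoted header cells span several physical lines but form one row
header = next(reader)
mydict = {row[0]: row for row in reader}

print(header)
print(mydict)
```

Because the header cells are quoted, the reader treats the newlines as part of the cell text rather than as the end of a row.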
