
Parsing a txt file into a dictionary to write to csv file

Eprime outputs a .txt file like this:

*** Header Start ***
VersionPersist: 1
LevelName: Session
Subject: 7
Session: 1
RandomSeed: -1983293234
Group: 1
Display.RefreshRate: 59.654
*** Header End ***
    Level: 2
    *** LogFrame Start ***
    MeansEffectBias: 7
    Procedure: trialProc
    itemID: 7
    bias1Answer: 1
    *** LogFrame End ***
    Level: 2
    *** LogFrame Start ***
    MeansEffectBias: 2
    Procedure: trialProc
    itemID: 2
    bias1Answer: 0

I want to parse this and write it to a .csv file but with a number of lines deleted.

I tried to create a dictionary that took the text appearing before the colon as the key and the text after as the value:

{subject: [7, 7], bias1Answer : [1, 0], itemID: [7, 2]}

import re

def load_data(filename):
    data = {}
    eprime = open(filename, 'r')
    for line in eprime:
        rows = re.sub('\s+', ' ', line).strip().split(':')
        try:
            data[rows[0]] += rows[1]
        except KeyError:
            data[rows[0]] = rows[1]
    eprime.close()
    return data

fullDict = {}
for line in open(fileName, 'r'):
    if ':' in line:
        row = line.strip().split(':')
        fullDict[row[0]] = row[1]
print fullDict

Both of the scripts above produce garbage:

{'\x00\t\x00M\x00e\x00a\x00n\x00s\x00E\x00f\x00f\x00e\x00c\x00t\x00B\x00i\x00a\x00s\x00': '\x00 \x005\x00\r\x00', '\x00\t\x00B\x00i\x00a\x00s\x002\x00Q\x00.\x00D\x00u\x00r\x00a\x00t\x00i\x00o\x00n\x00E\x00r\x00r\x00o\x00r\x00': '\x00 \x00-\x009\x009\x009\x009\x009\x009\x00\r\x00'

If I could set up the dictionary, I could write it to a csv file that would look like this:

Subject  itemID ... bias1Answer 
  7       7             1
  7       2             0

You don't need to create a dictionary.

import codecs
import csv

with codecs.open('eprime.txt', encoding='utf-16') as f, open('output.csv', 'w') as fout:
    writer = csv.writer(fout, delimiter='\t')
    writer.writerow(['Subject', 'itemID', 'bias1Answer'])
    for line in f:
        if ':' in line:
            value = line.split()[-1]

        if 'Subject:' in line:
            subject = value
        elif 'itemID:' in line:
            itemID = value
        elif 'bias1Answer:' in line:
            bias1Answer = value
            writer.writerow([subject, itemID, bias1Answer])

Your second approach would work, but the value for each dictionary key should be a list. Currently you store only one value per key, so each new value overwrites the previous one and only the last value survives. You can modify your code so that each key maps to a list; the code below does that:

fullDict = {}
for line in open(fileName, 'r'):
    if ':' in line:
        row = line.strip().split(':')
        # Use row[0] as the key. If the key is new, setdefault
        # creates an empty list for it; either way, row[1] is
        # appended to that key's list.
        fullDict.setdefault(row[0], []).append(row[1])
print fullDict
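
For reference, here is a minimal Python 3 sketch of the same idea that builds one row per trial and repeats header values such as Subject on every row, as in the desired output. It assumes the file has already been decoded to text (the garbage in the question suggests UTF-16) and that each trial sits between '*** LogFrame Start ***' and '*** LogFrame End ***' markers. The names parse_eprime, rows_to_csv and wanted are made up for illustration.

```python
import csv
import io


def parse_eprime(lines, wanted=("Subject", "itemID", "bias1Answer")):
    """Return one list of values per LogFrame, in the order given by `wanted`."""
    outside = {}   # key/value pairs seen outside any LogFrame (header, Level, ...)
    frame = None   # key/value pairs of the LogFrame being read, or None
    rows = []
    for raw in lines:
        line = raw.strip()
        if line == "*** LogFrame Start ***":
            frame = {}
        elif line == "*** LogFrame End ***":
            if frame is not None:
                rows.append({**outside, **frame})
            frame = None
        elif ":" in line:
            key, _, value = line.partition(":")
            target = outside if frame is None else frame
            target[key.strip()] = value.strip()
    if frame:  # the last frame had no End marker, as in the sample in the question
        rows.append({**outside, **frame})
    return [[row.get(name, "") for name in wanted] for row in rows]


def rows_to_csv(rows, header):
    """Render the header and rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

Passing an open text file (for example open('eprime.txt', encoding='utf-16')) as lines should give one row per trial; a wanted key that is missing from a trial becomes an empty string instead of raising KeyError.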

It seems the Eprime output is encoded as UTF-16:

>>> print '\x00\t\x00M\x00e\x00a\x00n\x00s\x00E\x00f\x00f\x00e\x00c\x00t\x00B\x00i\x00a\x00s\x00'.decode('utf-16-be')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_16_be.py", line 16, in decode
    return codecs.utf_16_be_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 32: truncated data
>>> print '\x00\t\x00M\x00e\x00a\x00n\x00s\x00E\x00f\x00f\x00e\x00c\x00t\x00B\x00i\x00a\x00s\x00'.decode('utf-16-be', 'ignore')
    MeansEffectBias
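
The leading \x00 on each key is consistent with little-endian UTF-16 that was split on the raw newline byte: the second byte of the newline (\x00) ends up at the start of the next line. That is also why the string above has an odd length and decode('utf-16-be') reports truncated data. A small Python 3 sketch of decoding the whole file before splitting it into lines follows. decode_eprime_bytes is a made-up helper name, and falling back to UTF-8 when no byte-order mark is found is an assumption.

```python
import codecs


def decode_eprime_bytes(raw):
    """Decode as UTF-16 when a byte-order mark is present, otherwise as UTF-8."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec reads the BOM to pick the byte order and drops it.
        return raw.decode("utf-16")
    return raw.decode("utf-8")  # assumption: files without a BOM are UTF-8


# Reproduce the problem with a little-endian UTF-16 sample that starts with a BOM.
text = "\tMeansEffectBias: 5\r\n\tBias2Q.DurationError: -999999\r\n"
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

broken_lines = raw.split(b"\n")                    # splitting bytes shifts every later line by one byte
good_lines = decode_eprime_bytes(raw).splitlines() # decode first, then split
```

Here broken_lines[1] starts with a stray \x00 byte, as in the question, while good_lines holds clean text.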

I know this is an older question so maybe you have long since solved it but I think you are approaching this in a more complex way than is needed. I figure I'll respond in case someone else has the same problem and finds this.

If you are doing things this way because you do not have a software key, it might help to know that the E-Merge and E-DataAid programs for Eprime don't require a key. You only need the key for editing build files. Whoever provided you with the .txt files should probably have an install disk for these programs. If not, they are available on the PST website (I believe you need a serial code to create an account, but I am not certain).

Eprime generally creates a .edat file that matches the content of the text file you posted. Sometimes, though, if Eprime crashes you don't get the .edat file and only have the .txt. Luckily, you can generate the .edat file from the .txt file.

Here's how I would approach this issue:

  1. If you do not have the edat files available, first use E-DataAid to recover them.

  2. Then, presuming you have multiple participants, you can use E-Merge to merge the edat files for all participants who completed this task.

  3. Open the merged file. It might look a little chaotic depending on how much you have in the file. You can go to tools->Arrange columns. This will show a list of all your variables.

  4. Adjust so that only the desired variables are in the right hand box. Hit ok.

  5. Then you should have something resembling your end goal which can be exported as a csv.

If you have many procedures in the program, you might at this point have lines that contain only startup info, with NULL in the locations where your variables of interest are. You can fix this by going to tools->filter and creating a filter to eliminate those lines.
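
If you would rather drop those lines after exporting, a short Python 3 sketch is below. It assumes the export is comma-separated with a header row and that missing values appear as the literal text NULL; drop_null_rows and its parameters are hypothetical names, so adjust them to whatever your export actually contains.

```python
import csv
import io


def drop_null_rows(csv_text, columns, null_marker="NULL"):
    """Keep only the rows where every listed column holds a real value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        # A column that is absent from the row counts as missing too.
        if all(row.get(name, null_marker) != null_marker for name in columns):
            kept.append(row)
    return kept
```

For example, drop_null_rows(text, ["itemID", "bias1Answer"]) would discard a startup line whose itemID and bias1Answer cells both read NULL.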
