Convert text file to list of tuples

I am having trouble converting a text file into a list of tuples. The file is named data.txt and has the following format:

Evans Lee Comedian 25,000
Smith Will Actor 50,000
Mack Lee Comedian 30,000

I have managed to partially achieve my aim by using the following code

load_file = open("data.txt", "r")
data = load_file.read()
load_file.close()
data = tuple(item for item in data.split(' ') if item.strip())
print(data)

However, this does not do what I wanted. It produces the following output:

('Evans', 'Lee', 'Comedian', '25,000\nSmith', 'Will', 'Actor', '50,000\nMack', 'Lee', 'Comedian', '30,000')

This is just one long tuple with the newline characters included. Is there a way to make each line in data.txt its own tuple, giving me a list of tuples, and to get rid of the newline characters?

If you read() the entire file into one big string, you have to splitlines() first:

data = [tuple(line.split()) for line in data.splitlines()]
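A self-contained sketch of this approach. The sample text is inlined as a string, an assumption standing in for what read() would return from data.txt, so it runs without the file:

```python
# Stand-in for the string that read() would return from data.txt
text = "Evans Lee Comedian 25,000\nSmith Will Actor 50,000\nMack Lee Comedian 30,000\n"

# splitlines() breaks the text at line boundaries and drops the '\n' characters;
# split() with no argument then splits each line on runs of whitespace
data = [tuple(line.split()) for line in text.splitlines()]
print(data)
```

Because splitlines() does not produce an empty string for the trailing newline, this yields exactly three 4-tuples.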

Better: don't read the file as one big string; iterate over the lines of the file directly:

with open("data.txt") as load_file:
    data = [tuple(line.split()) for line in load_file]
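A runnable sketch of iterating over a file object line by line. Here io.StringIO is an assumption standing in for open("data.txt"), and the blank-line filter is an optional addition not present in the answer above:

```python
import io

# io.StringIO stands in for open("data.txt"); iterating it yields one line
# at a time, just like iterating a real file object
sample = io.StringIO("Evans Lee Comedian 25,000\n\nSmith Will Actor 50,000\n")

with sample as load_file:
    # skip blank lines, which would otherwise become empty tuples
    data = [tuple(line.split()) for line in load_file if line.strip()]

print(data)
```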

Still better: use the csv module to read the data. This also lets you, e.g., put names or job titles that contain spaces in quotes:

import csv
with open("data.txt") as load_file:
    reader = csv.reader(load_file, delimiter=" ")
    data = [tuple(row) for row in reader]
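To illustrate the quoting point, a small sketch with a hypothetical quoted surname that contains a space; io.StringIO again stands in for the real file. csv.reader uses '"' as its default quote character, so the quoted text is read as a single field:

```python
import csv
import io

# Hypothetical input: the quoted field contains the delimiter (a space)
sample = io.StringIO('"Van Dyke" Dick Actor 40,000\nSmith Will Actor 50,000\n')

# delimiter=" " splits on single spaces; the default quotechar '"' keeps
# "Van Dyke" together as one field
reader = csv.reader(sample, delimiter=" ")
data = [tuple(row) for row in reader]
print(data)
```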

You're very close. read() gives you the entire file as one string, which includes the newline characters (\n). You could use readlines() instead:

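def load_data(path="data.txt"):
    with open(path) as f:
        lines = f.readlines()  # one string per line
    # split() with no argument also drops the trailing '\n'
    return tuple(tuple(line.split()) for line in lines)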

This should give

(
  ("Evans", "Lee", "Comedian", "25,000"),
  ("Smith", "Will", "Actor", "50,000"),
  ...
)
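Why the question's split(' ') kept '\n' inside the fields: it splits only on the space character, while split() with no argument splits on any run of whitespace, newlines included. A minimal comparison on an inlined two-line string (a stand-in for the file contents):

```python
text = "Evans Lee Comedian 25,000\nSmith Will Actor 50,000"

# split(' ') splits only on the space character, so '\n' stays inside a field
by_space = text.split(' ')
# split() splits on any run of whitespace, including '\n'
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
```

Note that both results are still one flat list; grouping the fields by line is what readlines(), splitlines() or iterating the file adds.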

You want readlines(), which reads each line of the input file into a separate item in a list:

load_file = open("data.txt", "r")
raw = load_file.readlines()  # one string per line, each ending in '\n'
data = [line.strip() for line in raw]  # remove the trailing newline
load_file.close()
print(data)

Note that this strips all whitespace from both ends of each line, not just the newline. That's probably OK, I'm guessing.
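A quick sketch of the difference, using a hypothetical line padded with spaces: strip() removes all surrounding whitespace, whereas rstrip("\n") removes only trailing newline characters if the spaces matter:

```python
line = "  Evans Lee Comedian 25,000 \n"

# strip() removes all leading and trailing whitespace, spaces included
stripped = line.strip()
# rstrip("\n") removes only trailing '\n' characters
newline_only = line.rstrip("\n")

print(repr(stripped))
print(repr(newline_only))
```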

The problem here is how to parse your file. In general, such files are either delimiter-separated, like comma-separated values, where a single character separates the fields (https://pt.wikipedia.org/wiki/Comma-separated_values), or fixed-width, where each column has a fixed length (https://en.wikipedia.org/wiki/Flat-file_database).
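For the fixed-width case, fields are typically extracted by slicing each line at known column offsets. A minimal sketch; the column widths in WIDTHS and the padded sample lines are hypothetical, since the question's data.txt is space-separated rather than fixed-width:

```python
# Hypothetical column widths: surname, first name, job, salary
WIDTHS = [8, 6, 10, 7]

def parse_fixed_width(line, widths=WIDTHS):
    """Slice a line at fixed column offsets and strip the padding."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return tuple(fields)

# Padded sample lines built with str.ljust so the columns line up
rows = [
    "Evans".ljust(8) + "Lee".ljust(6) + "Comedian".ljust(10) + "25,000".ljust(7),
    "Smith".ljust(8) + "Will".ljust(6) + "Actor".ljust(10) + "50,000".ljust(7),
]
data = [parse_fixed_width(row) for row in rows]
print(data)
```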

In your case, I would use a regular expression to parse each line.

Try this:

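import re

with open("data.txt", "r") as infile:
    for line in infile:
        # greedy (.+) splits at the last space that is followed by a digit,
        # so the name part may itself contain spaces
        groups = re.search(r"^(.+) (\d.*)", line)
        if groups is None:  # skip blank or malformed lines
            continue
        name = groups.group(1)
        value = groups.group(2)
        print("Name: %s Value: %s" % (name, value))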
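To get the list of tuples the question asks for from this regex approach, the matches can be collected instead of printed. A sketch using the same pattern; parse_lines and LINE_RE are hypothetical names, and io.StringIO stands in for the real file:

```python
import io
import re

# Greedy (.+) makes the split happen at the last space followed by a digit
LINE_RE = re.compile(r"^(.+) (\d.*)$")

def parse_lines(lines):
    """Return a list of (name, value) tuples, skipping lines that don't match."""
    result = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            result.append((match.group(1), match.group(2)))
    return result

# io.StringIO stands in for open("data.txt")
sample = io.StringIO("Evans Lee Comedian 25,000\nSmith Will Actor 50,000\n")
data = parse_lines(sample)
print(data)
```

This keeps the name, first name and job together as one string; use the split-based answers above if you want each word as a separate tuple item.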

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 