Convert text file to list of tuples

I am having trouble converting a text file into a list of tuples. The file is named data.txt and has the following format:

Evans Lee Comedian 25,000
Smith Will Actor 50,000
Mack Lee Comedian 30,000

I have managed to partially achieve my aim by using the following code:

load_file = open("data.txt", "r")
data = load_file.read()
load_file.close()
data = tuple(item for item in data.split(' ') if item.strip())
print(data)

However, this does not do what I wanted. It produces the following output:

('Evans', 'Lee', 'Comedian', '25,000\nSmith', 'Will', 'Actor', '50,000\nMack', 'Lee', 'Comedian', '30,000')

This is just one big long tuple with the newline characters included. Is there a way to make each line in the data.txt file its own tuple, giving me a list of tuples, and to get rid of the newline characters?

If you read() the entire file into one big string, you have to splitlines() first:

data = [tuple(line.split()) for line in data.splitlines()]
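
To try that one-liner on the sample data without touching the disk, here is a minimal sketch. The variable `text` is only a stand-in for what read() would return; it is not part of the original code.

```python
# Stand-in for the string that read() returns for data.txt.
text = (
    "Evans Lee Comedian 25,000\n"
    "Smith Will Actor 50,000\n"
    "Mack Lee Comedian 30,000\n"
)

# splitlines() breaks on line boundaries and drops the newline characters;
# split() with no argument then splits each line on runs of whitespace.
data = [tuple(line.split()) for line in text.splitlines()]

for row in data:
    print(row)
```

Each line becomes its own four-element tuple, and no item carries a newline.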

Better: Don't read the file as one big string but iterate over the lines in the file directly:

with open("data.txt") as load_file:
    data = [tuple(line.split()) for line in load_file]
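
One thing to watch with this approach: a blank line in the file turns into an empty tuple, because split() on a blank line returns an empty list. A possible way to skip such lines is sketched below. The helper name `load_rows` and the temporary file are assumptions made for the demonstration.

```python
import os
import tempfile


def load_rows(path):
    """Return one tuple per non-blank line of the file at `path`."""
    with open(path) as load_file:
        # line.strip() is empty for blank lines, so they are filtered out.
        return [tuple(line.split()) for line in load_file if line.strip()]


# Demonstration with a throwaway file that contains a blank line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Evans Lee Comedian 25,000\n\nSmith Will Actor 50,000\n")

rows = load_rows(tmp.name)
os.remove(tmp.name)
print(rows)
```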

Still better: Use csv to read the data; this will also allow you to, e.g., put names or job titles that contain spaces into quotes:

import csv
# The csv docs recommend opening the file with newline="".
with open("data.txt", newline="") as load_file:
    reader = csv.reader(load_file, delimiter=" ")
    data = [tuple(row) for row in reader]
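
To illustrate the point about quotes, here is a small sketch that feeds csv.reader from an in-memory io.StringIO instead of a file. The "Voice Actor" row is made up for the example and does not come from the original data.

```python
import csv
import io

# Hypothetical sample: the second row has a two-word job title in quotes.
sample = io.StringIO(
    'Evans Lee Comedian 25,000\n'
    'Smith Will "Voice Actor" 50,000\n'
)

reader = csv.reader(sample, delimiter=" ")
data = [tuple(row) for row in reader]
print(data)
```

The quoted title stays together as a single field. Note that, unlike split(), csv treats every single space as a separator, so two consecutive spaces produce an empty field.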

You're very close. read() gives you the entire file as a single string, which includes the newline characters (\n). You could use .readlines() instead.

with open('data.txt') as f:
    lines = f.readlines()
# Build the result by assignment; a bare return only works inside a function.
data = tuple(tuple(line.split()) for line in lines)

This should give:

(
  ("Evans", "Lee", "Comedian", "25,000"),
  ("Smith", "Will", "Actor", "50,000"),
  ...
)

You want readlines(), which reads each line of the input file into a separate item in a list:

load_file = open("data.txt", "r")
raw = load_file.readlines()
data = [line.strip() for line in raw]
load_file.close()
print(data)

Note that this strips all whitespace from both ends of each line, not just the newline. That's probably OK, I'm guessing.
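
If the difference matters for your data, this small sketch compares strip() with rstrip("\n") on a made-up line that has extra padding:

```python
# Hypothetical line with leading and trailing spaces plus a newline.
line = "  Evans Lee Comedian 25,000 \n"

both_ends = line.strip()          # removes spaces and the newline at both ends
newline_only = line.rstrip("\n")  # removes only the trailing newline

print(repr(both_ends))
print(repr(newline_only))
```

Either way each item is still a single string; calling split() on it is what turns the line into separate fields.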

The problem here is how to parse your file. Data files are usually either delimited, where a single character separates the fields, as in comma-separated values (https://pt.wikipedia.org/wiki/Comma-separated_values), or fixed-width, where each column has a fixed length (https://en.wikipedia.org/wiki/Flat-file_database).

In your case, I would use a regular expression to parse each line.

Try this:

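import re

with open("data.txt", "r") as infile:
    for line in infile:
        # Raw string so that \d reaches the regex engine unchanged.
        groups = re.search(r"^(.+) (\d.*)", line)
        if groups is None:
            continue  # skip blank or non-matching lines
        name = groups.group(1)
        value = groups.group(2)
        print("Name: %s Value: %s" % (name, value))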
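
The loop above only prints the two groups. A possible way to collect them into a list of tuples is sketched below. The helper `parse_line`, the in-memory `lines` list, and the conversion of the amount to an int are assumptions for illustration; the conversion only works if the last field is a whole number with comma thousands separators.

```python
import re

# Greedy (.+) takes everything up to the last space that precedes a digit;
# the second group takes the rest of the line.
LINE_RE = re.compile(r"^(.+) (\d.*)$")


def parse_line(line):
    """Return (description, amount) for one line, or None if it does not match."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    description, amount = match.groups()
    # Assumption: the amount is a whole number such as "25,000".
    return description, int(amount.replace(",", ""))


# In-memory stand-in for the lines of data.txt, including a blank line.
lines = ["Evans Lee Comedian 25,000\n", "Smith Will Actor 50,000\n", "\n"]
data = [parsed for parsed in map(parse_line, lines) if parsed is not None]
print(data)
```

Note that this regex keeps the name and job title together in one field, so each tuple has two items rather than four.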
