Reading a file and parsing it into sections

I have a file that contains an ID number followed by a name, like this:

10 alex de souza

11 robin van persie

9 serhat akin

I need to read this file and break each record up into 2 fields: the ID and the name. I need to store the entries in a dictionary where the ID is the key and the name is the satellite data. Then I need to output all the entries in the dictionary in 2 columns, one entry per line, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally, the input filename needs to be the first command-line argument.

Thanks for your help!

I have this so far, but I can't get any further.

fin = open("ids", "r")               # Open the input file

for line in fin:                     # Process one line at a time
    string = line.split()            # Split the line on whitespace
    if len(string) > 1:              # Separate the ID from the name
        id = int(string[0])
        name = " ".join(string[1:])
        print(id, name)              # Print results

fin.close()

We need sys.argv to get the command-line argument (careful: the name of the script is always element 0 of that list, so the input filename is sys.argv[1]).

Now we open the file (no error handling here; you should add it) and read all the lines into a list. Each element of the list "lines" is now a string of the form 'number firstname secondname'.

Then create an empty dictionary out and loop over the individual strings in lines, splitting each one on whitespace and storing the pieces in the temporary variable tmp (which is now a list of strings: ['number', 'firstname', 'secondname']). After that we just fill the dictionary, using the number as the key and the rest of the name, joined with spaces, as the value.
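As a small illustration of the splitting and filling step described above, here is a sketch in Python 3 (the answer's own code targets Python 2.7). The function name parse_records and the sample lines are made up for this example, and skipping lines that have no name is an assumption about how bad input should be treated, not something the answer specifies.

```python
def parse_records(lines):
    """Build {id_string: name} from lines shaped like '10 alex de souza'."""
    out = {}
    for fullstr in lines:
        tmp = fullstr.split()            # split on any run of whitespace
        if len(tmp) < 2:                 # assumption: skip blank or name-less lines
            continue
        out[tmp[0]] = ' '.join(tmp[1:])  # first token is the key, the rest is the name
    return out


# Hypothetical sample input, including a blank line
sample = ["10 alex de souza\n", "11 robin van persie\n", "\n", "9 serhat akin\n"]
records = parse_records(sample)
print(records["10"])
```

Note that the keys stay strings here, exactly as in the answer; that is why the sorting step needs key=int.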

To print the dictionary in sorted order, just loop over the list of keys returned by sorted(out), using the key=int option so that they are sorted numerically rather than as text. For each id, print the id (the number) and then the corresponding value, looked up in the dictionary by that id (the keys are stored as strings).
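To see why key=int matters, the following Python 3 sketch sorts the same string keys both ways. The dictionary contents are just the sample records from the question.

```python
out = {"10": "alex de souza", "11": "robin van persie", "9": "serhat akin"}

text_order = sorted(out)              # compares the keys as strings
numeric_order = sorted(out, key=int)  # compares the keys by their integer value

print(text_order)     # ['10', '11', '9']
print(numeric_order)  # ['9', '10', '11']
```

Without key=int, "9" sorts after "10" and "11" because string comparison goes character by character.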

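import sys

# Take the file name from the first command-line argument,
# or ask for it if none was given.
try:
    infile = sys.argv[1]
except IndexError:
    infile = raw_input('Enter file name: ')  # Python 2: input() would eval the text

with open(infile, 'r') as f:
    lines = f.readlines()

out = {}
for fullstr in lines:
    tmp = fullstr.split()
    if not tmp:                      # skip blank lines
        continue
    out[tmp[0]] = ' '.join(tmp[1:])  # key: id string, value: name

# key=int sorts the string keys numerically
for id in sorted(out, key=int):
    print id, out[id]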

This works for Python 2.7 with ASCII strings. I'm fairly sure it can handle other encodings as well (German umlauts work, at least), but I haven't tested that any further. You may also want to add more error handling in case the input file is formatted differently.
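The encoding and error-handling concerns mentioned above could be addressed along the following lines. This is only a sketch, written for Python 3 rather than 2.7; the names load_records and main, the UTF-8 default, the exit codes, and the choice to report and skip malformed lines are all assumptions for illustration, not part of the answer.

```python
import sys


def load_records(path, encoding="utf-8"):
    """Return {int_id: name}; malformed lines are reported on stderr and skipped."""
    records = {}
    with open(path, "r", encoding=encoding) as handle:
        for lineno, line in enumerate(handle, start=1):
            parts = line.split(None, 1)   # at most two pieces: id and the rest
            if not parts:
                continue                  # blank line
            try:
                key = int(parts[0])
            except ValueError:
                print("line %d: bad id %r" % (lineno, parts[0]), file=sys.stderr)
                continue
            if len(parts) < 2:
                print("line %d: missing name" % lineno, file=sys.stderr)
                continue
            records[key] = parts[1].strip()
    return records


def main(argv):
    """Return an exit code instead of exiting, so the function is easy to call."""
    if len(argv) < 2:
        print("usage: %s FILE" % argv[0], file=sys.stderr)
        return 2
    try:
        records = load_records(argv[1])
    except (OSError, UnicodeDecodeError) as exc:
        print("cannot read %s: %s" % (argv[1], exc), file=sys.stderr)
        return 1
    for key in sorted(records):           # keys are ints, so no key=int is needed
        print(key, records[key])
    return 0
```

A script would typically finish with sys.exit(main(sys.argv)). Converting the ids to int while loading is a design choice: it catches non-numeric ids early and makes a plain sorted() numeric.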

Just a suggestion: this code is probably simpler than the other code posted:

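import sys

with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()

# Split each non-blank line once, into [id, name], and build the dict from those pairs
data = dict([i.strip().split(None, 1) for i in lines if i.strip()])

for idx in sorted(data, key=int):
    print idx, data[idx]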
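The question asks for output in 2 columns. One possible way to line the columns up, sketched here in Python 3, is to right-align the ids to the width of the longest one. The helper name format_rows, the two-space gap, and the sample lines are assumptions for this example.

```python
def format_rows(data):
    """Return one 'id  name' string per entry, ids right-aligned, sorted numerically."""
    if not data:
        return []
    width = max(len(key) for key in data)   # width of the longest id
    return [key.rjust(width) + "  " + data[key] for key in sorted(data, key=int)]


# Hypothetical sample input standing in for the file's lines
lines = ["10 alex de souza\n", "11 robin van persie\n", "9 serhat akin\n"]
data = dict(line.strip().split(None, 1) for line in lines if line.strip())

for row in format_rows(data):
    print(row)
```

With the sample above, the id 9 is padded with one leading space so that the names start in the same column.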
