
Reading words from a file into a set, maintaining order

Here is an array of Unicode words used in the Python script:

texts =[u"abc", u"pqr", u"mnp"]

The script works as expected with the three-word example above. The issue is that there are thousands of words in a text file. How do I read them from the text file?

Update: I have two issues. The sequence of words from the text file is not maintained in the output. The text file also contains Unicode characters, hence the "u" prefix in my original example.

# cat testfile.txt
Testing this file with Python

# cat test.py
#!/usr/bin/python
# -*- coding: utf-8 -*-

f     = open('testfile.txt', 'r')
texts  = set(f.read().split())
print (texts)

# python test.py
set(['this', 'Python', 'Testing', 'with', 'file'])

This is because of how sets work. They don't maintain the order of the items stored in them.

From the documentation:

A set object is an unordered collection of distinct hashable objects.

I see no problem with your file-reading code. Given that the words in the file are separated by whitespace, and the file is not too big to be consumed with a single read(), it should work just fine. The real problem is the order of the words once you put them into a set.

If you need the words in the same order as they appear in the file, why are you using a set? Just keep them in a list.
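As a minimal sketch of the list approach, something like the following could work. The helper name read_words is hypothetical, and the UTF-8 encoding is an assumption about the text file (the question only says it contains Unicode characters). io.open is used because it decodes to Unicode text on both Python 2 and Python 3.

```python
import io


def read_words(path, encoding="utf-8"):
    """Return the whitespace-separated words of a text file, in file order.

    `encoding` is an assumption; change it if the file is not UTF-8.
    """
    # io.open decodes bytes to Unicode text on both Python 2 and Python 3.
    with io.open(path, "r", encoding=encoding) as f:
        # split() with no arguments splits on any run of whitespace,
        # including newlines, and the resulting list keeps file order.
        return f.read().split()
```

With the testfile.txt from the question, read_words('testfile.txt') should give the five words in the order they appear in the file, since a list (unlike a set) keeps insertion order.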

If you need a set to remove duplicates and/or for other purposes, then you have the following options:
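One common way to remove duplicates while keeping order, sketched here as an illustration rather than as the answerer's exact code, is to use a set only for membership tests and a list for the ordered result. The function name unique_in_order and the sample sentence are hypothetical.

```python
def unique_in_order(words):
    """Drop repeated words, keeping the first occurrence of each in order."""
    seen = set()   # used only for fast "have we seen this?" checks
    result = []    # the list is what preserves the original order
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Sample input with repeated words (made up for illustration).
texts = unique_in_order(u"Testing this file with this Python file".split())
print(texts)
```

Another option along the same lines is collections.OrderedDict.fromkeys(words), whose keys are the distinct words in first-seen order; on Python 3.7 and later, a plain dict.fromkeys(words) behaves the same way.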
