Counting how many words each line has in a text file with Python (using str.split)
I have two files. The input file is "our_input.txt" (in the same directory as the code file), which contains Oscar Wilde's Dorian Gray. There is also an output file. I need to open the original file, have Python count how many words each line has, and write the counts to the output file.
I tried, but I got lost...
You can try something like this.
First, read your input file:
with open('our_input.txt') as f:
    lines = f.readlines()
Then count the number of words on each line and write the result to the output file:
with open('our_output.txt', 'w') as f:
    for index, value in enumerate(lines):
        number_of_words = len(value.split())
        f.write('Line number {} has {} words.\n'.format(index + 1, number_of_words))
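As a variation on the answer above, the two steps can be merged into one function that iterates over the input file lazily instead of calling readlines(). This is a sketch: the function name count_words_per_line is hypothetical, and UTF-8 is an assumed encoding for the input file.

```python
def count_words_per_line(input_path, output_path):
    """Write one 'Line number N has M words.' line per input line."""
    # Iterating over the file object reads one line at a time,
    # so the whole book never has to sit in memory as a list.
    # encoding='utf-8' is an assumption about the input file.
    with open(input_path, encoding='utf-8') as src, \
         open(output_path, 'w', encoding='utf-8') as dst:
        for line_number, line in enumerate(src, start=1):
            # str.split() with no argument splits on runs of whitespace
            # and ignores leading/trailing whitespace, including '\n'.
            number_of_words = len(line.split())
            dst.write('Line number {} has {} words.\n'.format(line_number, number_of_words))
```

For the files in the question this would be called as count_words_per_line('our_input.txt', 'our_output.txt'). Blank lines are reported as having 0 words.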
You will need to iterate over each line of the input text file, which is done with a standard for loop. You can then split each line on whitespace and count the number of elements in the resulting list with len(). Append this count to the output file and you are done.
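Splitting "at each space char" literally, with split(' '), behaves differently from the no-argument split() used in the first answer. The two hypothetical helpers below show the difference on a line with a double space and a trailing newline:

```python
def count_words(line):
    # No-argument split(): splits on any run of whitespace (spaces, tabs,
    # the trailing '\n') and never returns empty strings.
    return len(line.split())


def count_space_separated_fields(line):
    # split(' '): splits on every single space character, so consecutive
    # spaces produce empty strings and '\n' stays attached to the last word.
    return len(line.split(' '))


sample = 'Dorian  Gray\n'
print(count_words(sample))                   # 2
print(count_space_separated_fields(sample))  # 3: ['Dorian', '', 'Gray\n']
```

For word counting, the no-argument form is usually what you want: it also reports 0 for an empty line, whereas '\n'.split(' ') returns a one-element list.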
A simple technique in any language for word counting in files is:
We now have a string with words separated by single spaces.
Now either
or
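A minimal Python sketch of this technique, assuming that the normalization step collapses every run of whitespace (spaces, tabs, newlines) into a single space, and that the two counting options are "count the spaces and add one" and "split on the space and take the length of the resulting list". The helper names are hypothetical.

```python
import re


def normalize(text):
    # Assumed normalization step: collapse every run of whitespace
    # into one space and trim both ends.
    return re.sub(r'\s+', ' ', text).strip()


def count_by_spaces(text):
    normalized = normalize(text)
    # Option 1: with single-space separators, words = spaces + 1.
    # An empty string has no words, so it is handled separately.
    return normalized.count(' ') + 1 if normalized else 0


def count_by_splitting(text):
    normalized = normalize(text)
    # Option 2: split on the single space and take the list length.
    # ''.split(' ') returns [''], hence the same empty-string guard.
    return len(normalized.split(' ')) if normalized else 0


print(count_by_spaces('  The  Picture\nof\tDorian Gray '))     # 5
print(count_by_splitting('  The  Picture\nof\tDorian Gray '))  # 5
```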
For a large file this can be a heavy job for the processor, and performance will depend on how the language handles strings and arrays.
If you are working client-server, or the text is stored in a database, consider the high network cost of moving the string. It is better to run the count as close to where the data lives as possible. So if you are using an RDBMS, use a stored procedure: counting the words in a 2 GB string on the server and shipping a single integer back to the client is faster than shipping the 2 GB string and counting it in a web browser.
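The "count next to the data" idea can be sketched with Python's built-in sqlite3 module. SQLite has no stored procedures, so this uses a plain query instead; in a server RDBMS the same expression could live in a stored procedure. The table book_lines and its columns are hypothetical, and the formula assumes words are separated by single spaces.

```python
import sqlite3

# Hypothetical schema: one row per line of the book.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE book_lines (line_number INTEGER, body TEXT)')
conn.executemany(
    'INSERT INTO book_lines VALUES (?, ?)',
    [(1, 'The Picture of Dorian Gray'), (2, ''), (3, 'by Oscar Wilde')],
)

# The database engine computes the counts; only integers cross the
# connection. With single-space separators, the number of words is
# (length - length without spaces) + 1 for non-empty text.
query = '''
    SELECT line_number,
           CASE WHEN length(trim(body)) = 0 THEN 0
                ELSE length(trim(body)) - length(replace(trim(body), ' ', '')) + 1
           END AS word_count
    FROM book_lines
    ORDER BY line_number
'''
for line_number, word_count in conn.execute(query):
    print('Line number {} has {} words.'.format(line_number, word_count))
```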
If you cannot read the entire file in one pass, you can read it line by line and apply the above techniques to each line. However, because of string-handling and loop overhead, processing the entire file as one string is usually faster when memory allows it.
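Both approaches from this paragraph can be sketched in Python, assuming a UTF-8 text file and whitespace-separated words; the function names are hypothetical. They return the same total because a newline is itself whitespace to str.split(), so no word can span two lines.

```python
def total_words_whole_file(path):
    # One pass: load the entire file into a single string, then split once.
    with open(path, encoding='utf-8') as f:
        return len(f.read().split())


def total_words_line_by_line(path):
    # Memory-friendly alternative: only one line is held at a time,
    # at the cost of one split() call per line.
    total = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            total += len(line.split())
    return total
```

Timing both on your own file, for example with the standard timeit module, is a reasonable way to check which is faster in your environment.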
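Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.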