Remove specific items (0 values and N*0 values) from a large text file and write the result to a new text file using Python
I am a basic Python user. I have searched multiple platforms for how to delete specific values from a large text file, but I haven't found anything similar to what I want to do. I have a large file (out.txt) and I want to remove all the 0 values and all values multiplied by 0 (e.g. 75*0) in the data file.
After removing all those values I want to write the result to a new text file (out2.txt).
Suggestions please.
Thanks!
I have tried this code:
content = open('out.txt', 'r').readlines()
content_set = set(content)
cleandata = open('clean.txt', 'w')
for line in content_set:
    cleandata.remove(0)
I keep getting this error:
cleandata.remove(0)
AttributeError: '_io.TextIOWrapper' object has no attribute 'remove'
DATA FILE out.txt:
75*0 78.8502 45.9301 13358*0 10.7678 0 23.9901 43.8503 77*0 1.3757 36.9888 15.0398 76*0 8.19519 0 4.11938 21.4933 23.832 76*0 34.7566
15.5595 21.0239 0 47.1607 76*0 14.9065 52.916 51.7825 13358*0 62.4689 22.8217 15.68 77*0 12.8943 0 32.1276 14.1273 76*0 39.6095
70.8503 72.8765 45.7607 76*0 12.5657 72.7567 58.0161 30.9 76*0 19.5879 648.696 111.501 13358*0 17.36 18.0555 85.0358 77*0 4.62265
55.7498 61.2049 76*0 762.354 8.34207 23.2367 16.0517 76*0 405.637 20.1265 8.17844 16.4698 76*0 107.228 35.1968 38.4117 13358*0
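The `AttributeError` comes from calling `remove()` on the object returned by `open()`: a file object can read and write text but has no `remove()` method, so the filtering has to happen on the strings read from the file. Those strings are text, so a zero value is `'0'`, not the integer `0`. A small illustration, with `io.StringIO` standing in for out.txt and holding a short excerpt of the sample data:

```python
import io

# open() returns an io.TextIOWrapper; it has no remove() method
print(hasattr(io.TextIOWrapper, "remove"))  # False

# io.StringIO stands in for out.txt here (a short excerpt of the sample data)
handle = io.StringIO("75*0 78.8502 0 10.7678\n")
tokens = handle.read().split()  # ['75*0', '78.8502', '0', '10.7678']

# The tokens are strings: the integer 0 is not among them, but '0' is
print(0 in tokens, "0" in tokens)  # False True

# Filter the list of strings instead of calling remove() on the file object
kept = [t for t in tokens if t != "0" and not t.endswith("*0")]
print(kept)  # ['78.8502', '10.7678']
```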
Try this:
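with open('out.txt') as f:
    lines = f.read().splitlines()
# Filter each line separately so line breaks are kept, dropping bare zeros ('0')
# and zero runs such as '75*0'
lines = [' '.join(i for i in line.split() if i != '0' and not i.endswith('*0'))
         for line in lines]
with open('out2.txt', 'w') as f:
    f.write('\n'.join(lines) + '\n')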
Output:
78.8502 45.9301 10.7678 23.9901 43.8503 1.3757 36.9888 15.0398 8.19519 4.11938 21.4933 23.832 34.7566
15.5595 21.0239 47.1607 14.9065 52.916 51.7825 62.4689 22.8217 15.68 12.8943 32.1276 14.1273 39.6095
70.8503 72.8765 45.7607 12.5657 72.7567 58.0161 30.9 19.5879 648.696 111.501 17.36 18.0555 85.0358 4.62265
55.7498 61.2049 762.354 8.34207 23.2367 16.0517 405.637 20.1265 8.17844 16.4698 107.228 35.1968 38.4117
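A minimal per-line helper makes the filtering rule easy to check against the first line of the sample data (the name `clean_line` is just for illustration). It tests with `endswith('*0')` rather than a substring test such as `'*0' in token`: both give the same result on this sample, but a substring test would also drop a token like `5*0.5` (a hypothetical token; nothing like it appears in the sample).

```python
def clean_line(line: str) -> str:
    """Drop bare zeros ('0') and zero runs like '75*0'; keep every other token."""
    return " ".join(
        tok for tok in line.split() if tok != "0" and not tok.endswith("*0")
    )


first_line = ("75*0 78.8502 45.9301 13358*0 10.7678 0 23.9901 43.8503 77*0 1.3757 "
              "36.9888 15.0398 76*0 8.19519 0 4.11938 21.4933 23.832 76*0 34.7566")
print(clean_line(first_line))  # matches the first line of the output above

# Substring test vs. suffix test on a hypothetical token
print("*0" in "5*0.5", "5*0.5".endswith("*0"))  # True False
```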
This should work:
with open('out.txt', 'r') as f:
    content = f.readlines()
cleandata = []
for line in content:
    # Use a list (not a dict) so repeated values on the same line are not merged
    values = [i for i in line.split() if not (i == "0" or i.endswith("*0"))]
    cleandata.append(" ".join(values) + "\n")
with open('clean.txt', 'w') as f:
    f.writelines(cleandata)
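Since out.txt is described as large, it may be worth not loading the whole file at once with `read()` or `readlines()`. A minimal sketch that streams the file one line at a time, so memory use depends on the longest line rather than on the file size; `clean_file` is a hypothetical helper name, and the filtering rule is the same one used above.

```python
def clean_file(src: str, dst: str) -> None:
    """Copy src to dst line by line, dropping '0' and 'N*0' tokens."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:  # iterating over a file object reads one line at a time
            kept = [t for t in line.split() if t != "0" and not t.endswith("*0")]
            fout.write(" ".join(kept) + "\n")
```

Usage with the file names from the question: `clean_file('out.txt', 'out2.txt')`. A line made up only of zeros becomes an empty line, so the output keeps the same number of lines as the input.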
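content = open("out.txt").read()
segments = content.split()
# Walk the indices backwards: deleting an item then never shifts the indices
# still to be visited (a forward loop skips items and raises IndexError)
for segment in reversed(range(len(segments))):
    if segments[segment] == "0" or segments[segment].endswith("*0"):
        del segments[segment]
clean = open("clean.txt", "w")
clean.write(" ".join(segments))
clean.close()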
What this does is take all of the content of out.txt and split() it on all whitespace (no argument means all whitespace). Then it loops over each segment, checking whether the segment is 0 or ends with *0, and if it is either, deletes the segment from segments. At the end, it creates clean.txt, writes all of the segments with spaces separating them, and then closes clean.txt.
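The difference between `split()` with no argument and `split(' ')` is easy to see on a short string: the no-argument form treats any run of spaces, tabs, or newlines as one separator and never returns empty strings, while `split(' ')` splits on each single space only, so newlines stay glued inside the tokens (here the `0` at the start of the second line ends up attached to `78.8502` and would escape a token-by-token zero check).

```python
text = "75*0  78.8502\n0 10.7678\n"

# No argument: any run of whitespace separates tokens, and no empty strings appear
print(text.split())     # ['75*0', '78.8502', '0', '10.7678']

# Single space: the double space yields '', and newlines stay inside the tokens
print(text.split(" "))  # ['75*0', '', '78.8502\n0', '10.7678\n']
```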
The only problem I noticed is that when it writes to clean.txt, the values are separated by single spaces instead of their original whitespace, so the line breaks are lost. One way to fix this is to store the whitespace after each number and, when the number is 0 or ends with *0, discard the segment together with its associated whitespace.
Try it and tell me in the comments if it works!
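The whitespace-preserving fix suggested above can be sketched with `re.split` and a capturing group, which returns the separators along with the tokens, so each token can be kept or dropped together with the whitespace that follows it. Two assumptions go beyond the original suggestion: the text is processed line by line (split on `"\n"`, which is what Python's text mode produces) so a zero at the end of a line never takes the line break with it, and a line that ends in a dropped token has its now-trailing whitespace stripped. The helper names are hypothetical.

```python
import re


def is_zero(token: str) -> bool:
    """A bare zero ('0') or a zero run such as '75*0'."""
    return token == "0" or token.endswith("*0")


def clean_line_keep_ws(line: str) -> str:
    # With a capturing group, re.split also returns the separators:
    # tokens sit at even indices (possibly '' at either end), whitespace at odd ones
    parts = re.split(r"(\s+)", line)
    kept = []
    for i in range(0, len(parts), 2):
        token = parts[i]
        ws = parts[i + 1] if i + 1 < len(parts) else ""
        if not is_zero(token):
            kept.append(token + ws)  # keep the token with the whitespace after it
    result = "".join(kept)
    # If the line ended in a dropped token, the whitespace before that token is
    # now at the end of the line; strip it
    if is_zero(parts[-1]):
        result = result.rstrip()
    return result


def clean_keep_whitespace(text: str) -> str:
    # Work line by line so line breaks are never discarded along with a token
    return "\n".join(clean_line_keep_ws(line) for line in text.split("\n"))
```

For example, `clean_keep_whitespace("75*0 78.8502  45.9301")` keeps the original double space and returns `"78.8502  45.9301"`.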