Python program to extract text from a text file?
I have a text file that I obtained by converting a .srt file. The content is as follows:
1 0:0:1,65 --> 0:0:7,85 Hello, my name is Gareth, and in this video, I'm going to talk about list comprehensions 2 0:0:7,85 --> 0:0:9,749 in Python.
I want only the words from the text file, written to a new text file op.txt, with the output looking like this:
Hello my name is Gareth and
and so on.
This is the program I'm working on:
import os, re
f= open("D:\captionsfile.txt",'r')
k=f.read()
g=str(k)
f.close()
w=re.search('[a-z][A-Z]\s',g)
fil=open('D:\op.txt','w+')
fil.append(w)
fil.close()
But the output I get from this program is:
None None None
If we assume that m is a word (short for am) and that in.txt is your text file, you can use:
import re
with open('in.txt') as intxt:
    data = intxt.read()
# [a-zA-Z]+ matches runs of ASCII letters only
x = re.findall('[a-zA-Z]+', data)
print(x)
which will produce:
['Hello', 'my', 'name', 'is', 'Gareth', 'and', 'in', 'this', 'video', 'I', 'm', 'going', 'to', 'talk', 'about', 'list', 'comprehensions', 'in', 'Python']
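As a side note on why the attempt in the question yields None: the pattern [a-z][A-Z]\s asks for one lowercase letter, then one uppercase letter, then whitespace, and no such sequence occurs in the sample. In addition, re.search returns only the first match (or None), whereas re.findall collects every match. A small check, using a shortened copy of the sample as an assumed input string:

```python
import re

# Shortened copy of the sample captions (assumed input, for illustration only).
sample = "1 0:0:1,65 --> 0:0:7,85 Hello, my name is Gareth, and in this video"

# One lowercase letter, one uppercase letter, then whitespace:
# no such sequence exists in the sample, so search() finds nothing.
first_try = re.search(r'[a-z][A-Z]\s', sample)

# One or more ASCII letters, collected across the whole string.
words = re.findall(r'[a-zA-Z]+', sample)

print(first_try)
print(words)
```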
You can now write x to a new file with:
with open('out.txt', 'w') as outtxt:
    outtxt.write('\n'.join(x))
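Putting the reading, matching, and writing steps together, here is a minimal sketch that produces the space-separated layout asked for in the question ("Hello my name is Gareth and ..."). The helper names extract_words, words_as_line, and convert are made up for this example, and the file names passed to convert are whatever paths you use.

```python
import re
from pathlib import Path


def extract_words(text):
    """Return every run of ASCII letters in text, in order of appearance."""
    return re.findall(r'[a-zA-Z]+', text)


def words_as_line(text):
    """Join the extracted words with single spaces."""
    return ' '.join(extract_words(text))


def convert(src, dst):
    """Read src, keep only the words, and write them space-separated to dst."""
    Path(dst).write_text(words_as_line(Path(src).read_text()))


# In-memory demonstration on a shortened copy of the sample captions.
demo = "1 0:0:1,65 --> 0:0:7,85 Hello, my name is Gareth, and"
print(words_as_line(demo))
```

Calling convert('in.txt', 'op.txt') would then write the same kind of line to op.txt. Joining with '\n' instead of ' ' gives one word per line, as in the snippet above.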
To get I'm instead of I and m, you can use re.findall("[a-zA-Z']+", data).
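A quick comparison of the letters-only pattern and the apostrophe-aware one, on a sample phrase taken from the captions. It also shows why writing the class as [a-zA-Z] is safer than [aA-zZ]: the latter is parsed as the letter a, the range A-z, and the letter Z, and the range A-z also covers a few ASCII punctuation characters such as the underscore.

```python
import re

text = "in this video, I'm going to talk"

# Letters only: the apostrophe splits "I'm" into two tokens.
letters_only = re.findall(r"[a-zA-Z]+", text)

# Letters plus apostrophe: contractions stay in one piece.
with_apostrophe = re.findall(r"[a-zA-Z']+", text)

# The range A-z spans ASCII 65..122, which includes [ \ ] ^ _ and the backtick,
# so the loose class treats an underscore as part of a word.
loose = re.findall(r"[aA-zZ]+", "list_comprehensions")
strict = re.findall(r"[a-zA-Z]+", "list_comprehensions")

print(letters_only)
print(with_apostrophe)
print(loose, strict)
```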
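Another approach, if the captions in b.txt are still laid out one item per line, is to skip every line that starts with a digit and write each remaining word on its own line:
import re
with open("out.txt", "a") as f1:
    with open("b.txt") as f:
        for line in f:
            # index and timestamp lines start with a digit
            if not line[0].isdigit():
                for word in line.split():
                    f1.write(re.sub(r'[,.!]', "", word))  # replace any punctuation you don't want
                    f1.write("\n")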
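The same line-filtering idea can be tried without touching any files. The sketch below assumes the usual .srt layout (an index line, a timestamp line, then the caption text), which is not how the flattened sample in the question looks; words_from_lines is a made-up helper name.

```python
import re


def words_from_lines(lines):
    """Collect words from lines that do not start with a digit."""
    words = []
    for line in lines:
        # Index and timestamp lines of a caption file start with a digit.
        if line[:1].isdigit():
            continue
        for word in line.split():
            cleaned = re.sub(r'[,.!]', '', word)  # strip unwanted punctuation
            if cleaned:
                words.append(cleaned)
    return words


# Assumed line-per-item layout of the sample captions.
captions = [
    "1\n",
    "0:0:1,65 --> 0:0:7,85\n",
    "Hello, my name is Gareth, and in this video, I'm going to talk about list comprehensions\n",
    "\n",
    "2\n",
    "0:0:7,85 --> 0:0:9,749\n",
    "in Python.\n",
]
print(words_from_lines(captions))
```

One caveat of this approach: a caption line whose text happens to begin with a digit is dropped along with the index and timestamp lines.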