Python program to extract text from a text file?

Question

I have a text file that I obtained by converting a .srt file. The content is as follows:

1
0:0:1,65 --> 0:0:7,85
Hello, my name is Gareth, and in this
video, I'm going to talk about list comprehensions


2
0:0:7,85 --> 0:0:9,749
in Python.

I want only the words from the text file, written to a new text file op.txt with one word per line:

Hello
my
name
is
Gareth
and

and so on.

This is the program I'm working on:

import os, re
f= open("D:\captionsfile.txt",'r')
k=f.read()
g=str(k)
f.close()
w=re.search('[a-z][A-Z]\s',g)
fil=open('D:\op.txt','w+')
fil.append(w)
fil.close()

But the output I get from this program is:

None
None
None

Answer 1

If we treat m as a word of its own (short for am) and assume that in.txt is your text file, you can use

import re

with open('in.txt') as intxt:
    data = intxt.read()

# Match runs of ASCII letters; the range A-z would also match [ \ ] ^ _ and `
x = re.findall('[A-Za-z]+', data)
print(x)

which will produce

['Hello', 'my', 'name', 'is', 'Gareth', 'and', 'in', 'this', 'video', 'I', 'm', 'going', 'to', 'talk', 'about', 'list', 'comprehensions', 'in', 'Python']

You can now write x to a new file, one word per line, with:

with open('out.txt', 'w') as outtxt:
    outtxt.write('\n'.join(x))

To get

I'm

instead of

I
m

you can add the apostrophe to the character class: re.findall("[A-Za-z']+", data)
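
Putting the pieces of this answer together, here is a minimal sketch of the apostrophe-aware variant. The function name extract_words and the inline sample string are made up for illustration; in practice the text would come from reading your captions file.

```python
import re

def extract_words(text):
    """Return runs of ASCII letters and apostrophes, in order of appearance."""
    # Digits, commas, colons and the "-->" arrows never match,
    # so cue numbers and timestamps drop out on their own.
    return re.findall(r"[A-Za-z']+", text)

# Hypothetical sample, copied from the first cue in the question
sample = (
    "1\n"
    "0:0:1,65 --> 0:0:7,85\n"
    "Hello, my name is Gareth, and in this\n"
    "video, I'm going to talk about list comprehensions\n"
)

words = extract_words(sample)
print(words[:6])  # ['Hello', 'my', 'name', 'is', 'Gareth', 'and']
```

Joining words with '\n' and writing the result to op.txt gives the one-word-per-line file the question asks for. Note that this pattern only covers ASCII letters, so accented characters would split a word.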

Answer 2

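import re

with open("out.txt", "a") as f1:  # "a" appends to out.txt if it already exists
    with open("b.txt") as f:
        for line in f:
            # skip cue numbers and timestamp lines, which start with a digit
            if not line[0].isdigit():
                for word in line.split():
                    # strip any punctuation you don't want
                    f1.write(re.sub(r'[,.!]', "", word))
                    f1.write("\n")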
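
One caveat with the digit check: a caption line that itself starts with a digit (for example "3 ways to ...") would be skipped too. A possible variation, sketched below under the assumption that cue numbers are all-digit lines and timestamp lines contain "-->", skips only those. The helper name words_from_captions is hypothetical, and string.punctuation is used here in place of the regex above.

```python
import string

def words_from_captions(lines):
    """Collect caption words, skipping cue numbers and timestamp lines."""
    words = []
    for line in lines:
        stripped = line.strip()
        # Assumption: cue numbers are all digits, timestamps contain "-->"
        if not stripped or stripped.isdigit() or "-->" in stripped:
            continue
        for token in stripped.split():
            # Remove punctuation at the ends only, so "I'm" stays intact
            word = token.strip(string.punctuation)
            if word:
                words.append(word)
    return words

# Hypothetical sample lines, copied from the question
caption_lines = [
    "1\n",
    "0:0:1,65 --> 0:0:7,85\n",
    "Hello, my name is Gareth, and in this\n",
    "video, I'm going to talk about list comprehensions\n",
    "\n",
    "2\n",
    "0:0:7,85 --> 0:0:9,749\n",
    "in Python.\n",
]

print("\n".join(words_from_captions(caption_lines)))
```

Since an open file is iterable line by line, the same function could be called with the file object directly instead of a list.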

Answer 1
1 accepted 2014-05-31 09:38:39

Answer 2
1 2014-05-31 09:50:42