
How to search a folder of text files to see if a specific string exists and then extract a string between two words using Python?

Currently, I'm just trying to figure out why this code won't work.

EDIT: Wrote some new code.

EDIT 2: SOLVED. Reading through the comments reminded me to pass an encoding to open() in the with statement, so it now looks like the code below.

EDIT 3: New code for pulling the "string" (a date) I need from between two words:

import os

items = os.listdir("C:/output")

for names in items:
    if names.endswith(".txt"):
        with open('C:/output/' + names, encoding="utf8") as currentFile:
            text = currentFile.read()
            if 'Date Released' in text:
                a = 'Released'
                b = 'Description'
                # Slice out everything between the first 'Released' and the next 'Description'
                startpos = text.find(a) + len(a)
                endpos = text.find(b, startpos)
                print('Date Released ' + text[startpos:endpos] + names + '\n')
                #print ('Found in ' + names[:-4] + '\n')
            else:
                print('Not in ' + names[:-4] + '\n')

I'm now getting the output:

Date Released :  12/14/2016



1393-004IP_ B_ C2  filename

Date Released :  4/11/2017



1476-002 IP, filename

Date Released :  9/25/2015



1987-XXX IP filename

Is there a way to get the "Date Released : #/##/####" text on the same line as the file name? Also, some of the output I get when I run this right now is garbage. I'm assuming it could come from checking for Date Released more than once, or from a possible flaw in the if condition?
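One possible explanation, offered as a sketch rather than a confirmed diagnosis: the slice between the two words still contains the newlines that follow the date in the file, so the file name is printed several lines further down. The garbage may come from files where 'Description' never appears after the start position: find() then returns -1 and the slice runs almost to the end of the file. Searching for 'Released' alone can also match an earlier occurrence than the 'Date Released' label. The helper below (extract_between is a hypothetical name, not from the original code) searches for the full label, returns None when either marker is missing, and strips the surrounding whitespace and the leading colon. The sample text and example.txt are made up for illustration.

```python
def extract_between(text, start_marker, end_marker):
    """Return the cleaned text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        # Without this check, text[start:-1] would return almost the whole rest of the file.
        return None
    # Drop surrounding whitespace/newlines, then the leading colon, then any spaces after it.
    return text[start:end].strip().lstrip(":").strip()


# Made-up sample mimicking the layout implied by the output above.
sample = "Title\nDate Released :  12/14/2016\n\n\n\nDescription\nSome text"
date = extract_between(sample, "Date Released", "Description")
if date is not None:
    print("Date Released: " + date + "  " + "example.txt")
```

In the loop from the question, the same call would replace the two find() lines, with the print only happening when the result is not None.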

If you are on Windows, try using a double "\\" instead of a single "/" in the path name.
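For reference, here is a small sketch of the different ways to spell the same Windows path in Python. It uses pathlib.PureWindowsPath, which only manipulates path strings and so runs on any operating system. As far as I know, Python on Windows also accepts forward slashes, so the three spellings below should refer to the same folder; report.txt is a made-up file name.

```python
from pathlib import PureWindowsPath

# Three spellings of the same directory.
forward = PureWindowsPath("C:/output")    # forward slashes
escaped = PureWindowsPath("C:\\output")   # doubled (escaped) backslashes
raw = PureWindowsPath(r"C:\output")       # raw string, single backslash

same = forward == escaped == raw
print(same)

# Joining with / avoids hand-building 'C:/output/' + name.
joined = raw / "report.txt"
print(joined)
```

A single backslash in a normal (non-raw) string is the risky spelling, because sequences such as \t or \n are interpreted as escape characters.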
