
Extracting information from a text file using Python

I have a text file containing entries that look like this:

# found importantstuffhere
found request could not find identifier. Please check the name and try again.

I also have lines that look like this:

# found importantstuffhere
finding (identifier here) with blah blah blah.

I want to write Python code that will go through the text file and extract the following:

A. The first example is when the search failed, so I want to extract 'importantstuffhere' and the phrase 'found request could not find identifier'.

B. When the search worked, as in the second example, I want to extract 'importantstuffhere' and the phrase 'finding (identifier here)'.

Is this possible with Python, and if so, how?

Bonus point:

Can the extracted values be placed in columns of a CSV or Excel file? For example:

column A    column B

Column A would hold importantstuffhere, and column B would hold either 'found request could not find identifier' or 'finding (identifier here)'.

Thank you for your time!

Note: the # characters are part of the text file; I did not add them here just for clarification.

Essentially, I want to extract the values needed and add them to lists so that I can later make them columns in a dataframe. Perhaps list one has importantstuffhere and list two has the results.
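One way to approach the extraction described above is sketched below. It assumes each record is a header line of the form `# found <value>` followed immediately by one result line; the names `extract_pairs`, `write_csv`, `HEADER`, `FAILED` and `FINDING` are made up for this illustration, and the regular expressions may need adjusting for the real file.

```python
import csv
import re

# Assumed layout: a header line "# found <value>" followed by one result line.
HEADER = re.compile(r"^#\s*found\s+(\S+)")
FAILED = "found request could not find identifier"
FINDING = re.compile(r"^finding \([^)]*\)")


def extract_pairs(lines):
    """Return two parallel lists: header values and result phrases."""
    keys, results = [], []
    # Walk over each line together with the line that follows it.
    for header, result in zip(lines, lines[1:]):
        match = HEADER.match(header.strip())
        if not match:
            continue  # not a header line
        result = result.strip()
        if result.startswith(FAILED):
            phrase = FAILED
        else:
            found = FINDING.match(result)
            if not found:
                continue  # result line has an unrecognized shape
            phrase = found.group(0)
        keys.append(match.group(1))
        results.append(phrase)
    return keys, results


def write_csv(path, keys, results):
    """Write the two lists as column A and column B of a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["column A", "column B"])
        writer.writerows(zip(keys, results))
```

The two returned lists can be passed to `write_csv`, or used as the columns of a dataframe later on.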

script.py:

# Read every line of the input file into a list.
with open('sampletext.txt', 'r') as f:
    lines = f.readlines()

# Collect each line that reports a failed search, with its line index.
important_stuff = []

for line_number, text in enumerate(lines):
    if 'found request could not find identifier' in text:
        important_stuff.append({'line_number': line_number, 'line_text': text})

print(important_stuff)

The following reads a file, joins the lines into one string, and writes them to a CSV file separated by commas:

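# Read every line of the input file into a list.
with open('sampletext.txt', 'r') as f:
    lines = f.readlines()

# Strip the trailing newline from each line before joining; calling
# strip('\n') on the joined string would only trim its two ends.
text_separated_by_comma = ", ".join(line.strip('\n') for line in lines)

with open('fileName.csv', 'w') as csv_file:
    csv_file.write(text_separated_by_comma)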
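Joining with ", " by hand puts everything on one row and does not protect fields that themselves contain commas. As a possible alternative, the standard-library `csv` module quotes such fields automatically. The helper name `rows_to_csv_text` is made up for this sketch, and it writes to an in-memory buffer only so the result is easy to inspect.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Format an iterable of rows (lists of strings) as CSV text."""
    buffer = io.StringIO()
    # lineterminator="\n" is chosen here for readability; the csv default is "\r\n".
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv_text([["importantstuffhere", "finding (a, b)"]]))
```

When writing to a real file instead, opening it with `newline=""` lets the `csv` module control line endings itself.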

To check for a string and then write the next line to a CSV file, I have this:

f = open('sampletext.txt', 'r')
lines = f.readlines()

csv_lines_to_write = []

SEARCH_TEXT = 'importantstuffhere'

for line_number, text in enumerate(lines):
    if text.find(SEARCH_TEXT) != -1:
        next_line_index = line_number + 1
        next_line_text = lines[next_line_index]
        assert type(SEARCH_TEXT) is str
        assert type(next_line_text) is str
        csv_line_to_write = SEARCH_TEXT, + ', ' + lines[next_line_index]
        csv_lines_to_write.append(csv_line_to_write)

with open('fileName.csv', 'w') as csv_file:
    for line in csv_lines_to_write:
        csv_file.write(text_without_line_breaks)

I'm getting this error:

csv_line_to_write = SEARCH_TEXT, + ', ' + lines[next_line_index]
TypeError: bad operand type for unary +: 'str'
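The error appears to come from the stray comma after SEARCH_TEXT: Python reads the expression as a tuple whose second element begins with a unary + applied to the string ', ', and strings do not support unary +. Removing the comma gives plain string concatenation. Separately, the final loop writes text_without_line_breaks, which is not defined in that script, where it presumably meant to write line. A minimal corrected sketch follows; the function names `build_csv_lines` and `write_lines` are made up for illustration, and the bounds check is an added assumption for the case where the match is on the last line.

```python
SEARCH_TEXT = "importantstuffhere"


def build_csv_lines(lines, search_text=SEARCH_TEXT):
    """Pair the search text with the line that follows each match."""
    csv_lines = []
    for line_number, text in enumerate(lines):
        next_line_index = line_number + 1
        # Skip a match on the very last line, which has no following line.
        if search_text in text and next_line_index < len(lines):
            next_line_text = lines[next_line_index].strip()
            # No comma after search_text: this is ordinary concatenation.
            csv_lines.append(search_text + ", " + next_line_text)
    return csv_lines


def write_lines(path, csv_lines):
    """Write each prepared line to the file, one per row."""
    with open(path, "w", encoding="utf-8") as csv_file:
        for line in csv_lines:
            csv_file.write(line + "\n")
```

Typical use would be to read the file with `readlines()`, pass the result to `build_csv_lines`, and hand the returned list to `write_lines`.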
