Extract parts of a .txt file into a list using Python

I am trying to extract numerical codes from a large text file and put them into a list. These codes all start with 'gameId': followed by a space and ten digits (e.g. 'gameId': 4725591545).

Thus far this is all I have managed.

#create a variable for the path
recent_matches_index = "path/to/records/recent_matches_index.txt"

#create an empty list
game_id_list = []

#open the file, read it, recognise the id codes I
#want and append them to the empty list.
#removing the string should be easy enough later on
with open (recent_matches_index, 'rt') as my_data_file:
    for game_id in my_data_file:
        game_id = my_data_file.find("'gameId': **********")
        game_id_list.append(game_id)

#lets see if it worked
print(game_id_list)

This throws the error: AttributeError: '_io.TextIOWrapper' object has no attribute 'find'

I have tried quite a few other things but I feel like they have been very wrong.

Ideally, I would also like to scrub the characters out of the list and keep only the numerical values. These values should remain in the order in which they occur in the file.

Example of what the list might look like:

game_id_list = [8403937582, 8402849381, 9604860905]

Evidently I'm new to Python, so explicit answers will be greatly appreciated.

Edit: Here is an example of what is in my .txt file.

[{'leagueId': '5ae26d6d-c07a-3d29-ae0f-1ac9ca4a4f4a', 'queueType': 'RANKED_SOLO_5x5', 'tier': 'CHALLENGER', 'rank': 'I', 'summonerId': 'Sb5Y-1bZYeIItHXgNDS1U-PI0kgKF6_Wr2ZYBPqv95OKCOA', 'summonerName': 'RGE Inspired2', 'leaguePoints': 1556, 'wins': 254, 'losses': 186, 'veteran': True, 'inactive': False, 'freshBlood': False, 'hotStreak': False}, {'leagueId': '92edd520-fcf0-4e6b-b767-e3938a408d66', 'queueType': 'RANKED_FLEX_SR', 'tier': 'PLATINUM', 'rank': 'IV', 'summonerId': 'Sb5Y-1bZYeIItHXgNDS1U-PI0kgKF6_Wr2ZYBPqv95OKCOA', 'summonerName': 'RGE Inspired2', 'leaguePoints': 100, 'wins': 11, 'losses': 2, 'veteran': False, 'inactive': False, 'freshBlood': False, 'hotStreak': False, 'miniSeries': {'target': 2, 'wins': 0, 'losses': 0, 'progress': 'NNN'}}]
{'matches': [{'platformId': 'EUW1', 'gameId': 4728220480, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1595790394190, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4727511119, 'champion': 104, 'queue': 420, 'season': 13, 'timestamp': 1595776780604, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4727366101, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1595774913709, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4724700944, 'champion': 104, 'queue': 420, 'season': 13, 'timestamp': 1595632344294, 'role': 'DUO_SUPPORT', 'lane': 'NONE'}, {'platformId': 'EUW1', 'gameId': 4724662512, 'champion': 163, 'queue': 420, 'season': 13, 'timestamp': 1595629227345, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4724497303, 'champion': 203, 'queue': 420, 'season': 13, 'timestamp': 1595627065458, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4722652071, 'champion': 104, 'queue': 420, 'season': 13, 'timestamp': 1595530253160, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4709654115, 'champion': 875, 'queue': 420, 'season': 13, 'timestamp': 1594854382037, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4709546735, 'champion': 113, 'queue': 420, 'season': 13, 'timestamp': 1594850792558, 'role': 'DUO_SUPPORT', 'lane': 'TOP'}, {'platformId': 'EUW1', 'gameId': 4706883023, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1594723475884, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4703718275, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1594551535462, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4703411360, 'champion': 245, 'queue': 420, 'season': 13, 'timestamp': 1594509687989, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4703403465, 'champion': 245, 'queue': 420, 'season': 13, 'timestamp': 1594506333259, 'role': 'NONE', 'lane': 
'JUNGLE'},

Okay, as you did not provide a file example at first, I was thinking of a plain txt file, so this was my first solution:

txt = """
Lorem ipsum 'gameId': 4725591545 Lorem ipsum dolor sit amet
,  'gameId': 5725591546 consetetur sadipscing elitr,
'gameId': 6725591547 sed diam
"""

output = []
marker = "'gameId': "

# str.find returns -1 when the marker is absent, so the loop ends on its own
pos = txt.find(marker)
while pos != -1:
    start_pos = pos + len(marker)
    # the id is the ten characters right after the marker
    output.append(txt[start_pos:start_pos + 10])
    # continue searching after the id just collected
    pos = txt.find(marker, start_pos)

print(output)
# ['4725591545', '5725591546', '6725591547']
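
The same search can also be written with a regular expression, which is one way to get integers directly instead of strings. This is only a sketch: the pattern assumes every id is exactly ten digits, as described in the question, and the names GAME_ID_PATTERN and extract_game_ids are made up for illustration.

```python
import re

# Sample text in the same shape as the snippet above (made up for illustration).
sample = """
Lorem ipsum 'gameId': 4725591545 Lorem ipsum dolor sit amet
,  'gameId': 5725591546 consetetur sadipscing elitr,
'gameId': 6725591547 sed diam
"""

# Capture exactly ten digits after "'gameId': "; \b rejects longer digit runs.
GAME_ID_PATTERN = re.compile(r"'gameId': (\d{10})\b")


def extract_game_ids(text):
    """Return all ten-digit game ids as ints, in order of appearance."""
    return [int(digits) for digits in GAME_ID_PATTERN.findall(text)]


game_ids = extract_game_ids(sample)
print(game_ids)
# [4725591545, 5725591546, 6725591547]
```

re.findall scans left to right, so the ids stay in the order in which they occur in the text.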

But then you edited your question with an example (a really bad example, by the way, because brackets and commas are missing) which is not valid JSON because it uses ' instead of " . So I had to modify your example a little bit. This could be a solution:

import json

txt2 = """{'matches': [{'platformId': 'EUW1', 'gameId': 4728220480, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1595790394190, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4727511119, 'champion': 104, 'queue': 420, 'season': 13, 'timestamp': 1595776780604, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4727366101, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1595774913709, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4724700944, 'champion': 104, 'queue': 420, 'season': 13, 'timestamp': 1595632344294, 'role': 'DUO_SUPPORT', 'lane': 'NONE'}, {'platformId': 'EUW1', 'gameId': 4724662512, 'champion': 163, 'queue': 420, 'season': 13, 'timestamp': 1595629227345, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4724497303, 'champion': 203, 'queue': 420, 'season': 13, 'timestamp': 1595627065458, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4722652071, 'champion': 104, 'queue': 420, 'season': 13, 'timestamp': 1595530253160, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4709654115, 'champion': 875, 'queue': 420, 'season': 13, 'timestamp': 1594854382037, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4709546735, 'champion': 113, 'queue': 420, 'season': 13, 'timestamp': 1594850792558, 'role': 'DUO_SUPPORT', 'lane': 'TOP'}, {'platformId': 'EUW1', 'gameId': 4706883023, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1594723475884, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4703718275, 'champion': 121, 'queue': 420, 'season': 13, 'timestamp': 1594551535462, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4703411360, 'champion': 245, 'queue': 420, 'season': 13, 'timestamp': 1594509687989, 'role': 'NONE', 'lane': 'JUNGLE'}, {'platformId': 'EUW1', 'gameId': 4703403465, 'champion': 245, 'queue': 420, 'season': 13, 'timestamp': 1594506333259, 'role': 
'NONE', 'lane': 'JUNGLE'}]}"""

txt = json.loads(txt2.replace("'", '"'))
output = [x["gameId"] for x in txt["matches"]]
print(output)
# [4728220480, 4727511119, 4727366101, 4724700944, 4724662512, 4724497303, 4722652071, 4709654115, 4709546735, 4706883023, 4703718275, 4703411360, 4703403465]

Instead of using a string like I did, you can use it with a file. If your example is really what was saved in the file, then you may have to tidy up your string first.
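
One caveat about the replace("'", '"') step: it breaks as soon as the text contains Python's True/False (as the first line of the question's sample does) or an apostrophe inside a value, because neither is valid JSON. If the file really holds Python-style literals, ast.literal_eval from the standard library may be a safer way to parse it. The sketch below assumes each line of the file is one complete literal, which the truncated sample in the question does not guarantee; sample_lines and game_ids_from_lines are illustrative names, and the data is a shortened stand-in.

```python
import ast

# Shortened, hypothetical stand-in for two lines of the file:
# Python literals (single quotes, True/False), not JSON.
sample_lines = [
    "[{'summonerName': 'RGE Inspired2', 'veteran': True, 'hotStreak': False}]",
    "{'matches': [{'platformId': 'EUW1', 'gameId': 4728220480}, "
    "{'platformId': 'EUW1', 'gameId': 4727511119}]}",
]


def game_ids_from_lines(lines):
    """Collect gameId values from every line that parses to a dict with 'matches'."""
    game_ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # literal_eval only evaluates literals; it raises ValueError or
        # SyntaxError on malformed input instead of running arbitrary code.
        record = ast.literal_eval(line)
        if isinstance(record, dict) and "matches" in record:
            game_ids.extend(match["gameId"] for match in record["matches"])
    return game_ids


print(game_ids_from_lines(sample_lines))
# [4728220480, 4727511119]
```

With a real file, the open file object can be passed in place of sample_lines, since iterating over it yields one line at a time.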

EDIT

As you wrote, you put your JSON content into a file, and using my example with txt2 = Path/To/File will not work. The json module works with JSON strings, so txt2 has to hold the JSON string itself, not the path to it. You first have to open the file and read its content; then you have the JSON string the module can work with. Try this one:

with open("path/to/myTestFile", "r") as f:
    txt2 = f.read()

Because we open the file with the with statement, we don't need to close it later on.
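
Putting the two pieces together, a complete run might look like the sketch below. To keep it self-contained it first writes a tiny stand-in file to a temporary location; the file content, the function name read_game_ids, and the temporary path are all assumptions for illustration, and with real data you would pass your own path instead.

```python
import json
import os
import tempfile

# Write a small stand-in file so the sketch runs on its own (hypothetical data).
content = (
    "{'matches': [{'platformId': 'EUW1', 'gameId': 4728220480}, "
    "{'platformId': 'EUW1', 'gameId': 4727511119}]}"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(content)
    file_path = tmp.name


def read_game_ids(path):
    """Read the file, convert it to valid JSON, and return the gameId values."""
    with open(path, "r", encoding="utf-8") as f:
        raw = f.read()  # raw is a str, so string methods such as replace() work
    # Swapping quotes only works while no value contains an apostrophe
    # and the text has no True/False/None.
    data = json.loads(raw.replace("'", '"'))
    return [match["gameId"] for match in data["matches"]]


game_id_list = read_game_ids(file_path)
os.remove(file_path)  # clean up the stand-in file
print(game_id_list)
# [4728220480, 4727511119]
```

This also shows why the code in the question failed: find() and replace() are string methods, so they have to be called on the text returned by f.read() (or on each line), not on the file object itself.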
