
Python: Reading a file and adding keys and values to dictionaries from different lines

I'm very new to Python and I'm having trouble with an assignment that basically goes like this:

#Read a WARC file line by line to identify string1.

#When string1 is found, add part of the string as a key to a dictionary.

#Then continue reading the file to identify string2, and add part of string2 as the value for the previous key.

#Keep going through the file and doing the same to build the dictionary.

I can't import anything, which is causing me a bit of trouble, especially with adding the key, leaving the value empty, and then continuing through the file to find string2 to use as the value.

I've started thinking about saving the key to an intermediate variable, then going on to identify the value, saving it to another intermediate variable, and finally building the dictionary.

def main():
    ###open the file
    file = open("warc_file.warc", "rb")
    filetxt = file.read().decode('ascii','ignore')
    filedata = filetxt.split("\r\n")
    dictionary = dict()
    while line in filedata:
        for line in filedata:
            if "WARC-Type: response" in line:
                break
        for line in filedata:
            if "WARC-Target-URI: " in line:
                urlkey = line.strip("WARC-Target-URI: ")

It's not entirely clear what you're trying to do, but I'll have a go at answering.

Suppose you have a WARC file like this:

WARC-Type: response
WARC-Target-URI: http://example.example
something
WARC-IP-Address: 88.88.88.88

WARC-Type: response
WARC-Target-URI: http://example2.example2
something else
WARC-IP-Address: 99.99.99.99

Then you could create a dictionary that maps the target URIs to the IP addresses like this:

dictionary = dict()

uri_prefix = b"WARC-Target-URI: "
ip_prefix = b"WARC-IP-Address: "

with open("warc_file.warc", "rb") as file:
  urlkey = None
  value = None

  for line in file:
    if line.startswith(uri_prefix):
      assert urlkey is None
      # Slice off the prefix: strip(prefix) would remove a set of
      # characters from both ends, not the literal prefix.
      urlkey = line[len(uri_prefix):].strip().decode("ascii")

    if line.startswith(ip_prefix):
      assert urlkey is not None
      assert value is None

      value = line[len(ip_prefix):].strip().decode("ascii")

      # Key and value are both known now, so store the pair.
      dictionary[urlkey] = value

      # Reset for the next record.
      urlkey = None
      value = None

print(dictionary)

This prints the following result:

{'http://example.example': '88.88.88.88', 'http://example2.example2': '99.99.99.99'}

Note that this approach only loads one line of the file into memory at a time, which can matter if the file is very large.
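When pulling the value out of a header line, it helps to remember that `bytes.strip(chars)` removes any of the given characters from both ends rather than a literal prefix, so a URI that begins with letters also found in the header name can lose its first characters. A minimal sketch of a prefix-based alternative follows; the helper name `header_value` and the sample line are hypothetical, chosen only for illustration.

```python
def header_value(line, prefix):
    """Return the text after `prefix`, or None if the line does not start with it."""
    if not line.startswith(prefix):
        return None
    # Slice off exactly len(prefix) bytes, then drop the trailing "\r\n" or "\n".
    return line[len(prefix):].strip().decode("ascii")


line = b"WARC-Target-URI: telnet://example.example\r\n"

# strip() treats its argument as a set of characters, so the leading "te"
# of "telnet" is removed along with the header name.
stripped = line.strip(b"WARC-Target-URI: ")

# Slicing by prefix length keeps the URI intact.
sliced = header_value(line, b"WARC-Target-URI: ")
```

Here `sliced` keeps the full URI, while `stripped` starts at `lnet://` and still carries the line ending.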

Your idea of storing the key in an intermediate variable is good.

I also suggest using the following snippet to iterate over the lines.

# filename is the path to your WARC file
with open(filename, "rb") as file:
    lines = file.readlines()  # reads the whole file into a list of lines
    for line in lines:
        print(line)

To create dictionary entries in Python, the dict.update() method can be used. It creates a new key, or updates the value if the key already exists.

d = dict() # create empty dict
d.update({"key" : None}) # create entry without value
d.update({"key" : 123}) # update the value
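Putting these pieces together for a list of already decoded lines (such as the result of `filetxt.split("\r\n")` in the question), one possible sketch is shown here. The function name `build_mapping` and the sample lines are assumptions for illustration. Plain item assignment `d[key] = value` is used, which has the same effect as `d.update({key: value})` for a single entry.

```python
def build_mapping(lines):
    """Map each WARC-Target-URI to the next WARC-IP-Address that follows it."""
    uri_prefix = "WARC-Target-URI: "
    ip_prefix = "WARC-IP-Address: "
    mapping = {}
    current_key = None  # intermediate variable: the URI still waiting for a value

    for line in lines:
        if line.startswith(uri_prefix):
            current_key = line[len(uri_prefix):].strip()
            mapping[current_key] = None  # key exists, value not known yet
        elif line.startswith(ip_prefix) and current_key is not None:
            mapping[current_key] = line[len(ip_prefix):].strip()
            current_key = None  # wait for the next URI

    return mapping


# Hypothetical sample: the second record has no IP address line.
sample = [
    "WARC-Type: response",
    "WARC-Target-URI: http://example.example",
    "something",
    "WARC-IP-Address: 88.88.88.88",
    "",
    "WARC-Type: request",
    "WARC-Target-URI: http://example2.example2",
]
result = build_mapping(sample)
```

A URI that never gets a matching address line keeps the placeholder `None`, which makes incomplete records easy to spot.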
