How do I format input from a text file into a defaultdict in python

I have a text file with more than 50K lines in this format:

M:org.apache.mahout.common.RandomUtilsTest:testHashDouble():['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)']
M:org.apache.mahout.common.RandomUtilsTest:testHashFloat():['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction():['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2():['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']

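How can I read this data and format it into a dictionary so that all the methods inside the [] are separate values, and the string before the [ (the test method) is the key? How would I remove them before storing them as values in the dictionary? #Python

This is the code used to populate the text file. Now I am trying to take that txt file's data and read/parse it back into another dictionary.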

    d = {}
    with open("filtered.txt") as infile:
        for line in infile:
            # Each line holds one key and one value separated by a single space.
            key, val = line.strip().split(" ")
            d.setdefault(key, []).append(val)

    # No explicit close() is needed: the with block has already closed the file.
    keys = sorted(d)

    with open('mahout-coverage.txt', 'w') as outfile:
        for key in keys:
            # Each output line is "key:['v1', 'v2', ...]", i.e. the list's repr.
            outfile.write('{}:{}\n'.format(key, d[key]))

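The json module can be used to store a Python dictionary in a file, and later to load that file and parse it back into the same data type it had before it was written.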

import json

d = {}
with open('filtered.txt') as infile:
    for line in infile:
        # Split once at the "():" that ends the key. This assumes the test
        # method takes no parameters, as in the sample lines.
        key, value = line.strip().split("():", 1)
        key = "{}()".format(key)
        # Note: value is still the text "['...', '...']", not a list.
        d[key] = value

print(d)

# It is easier to write the data to the file with the json module in the first place.
with open('data.txt', 'w') as json_file:
    json.dump(d, json_file)

# Later you can read the file back with the json module itself.
with open('data.txt') as f:
    # data is a dictionary again and can be used directly.
    data = json.load(f)

Reference: json.dump(), json.load()
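As a minimal sketch of the round trip described above: if the values are stored as real lists and serialised with json, they come back as lists without any string slicing. The sample data, the `records` name, and the use of an in-memory `io.StringIO` in place of a real file are assumptions made for illustration.

```python
import io
import json

# Hypothetical sample: one test method mapped to the methods it calls.
records = {
    "M:org.apache.mahout.common.RandomUtilsTest:testHashFloat()": [
        "(M)java.util.Random:nextLong()",
        "(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)",
    ],
}

# io.StringIO stands in for a file opened with open('data.txt', 'w').
buffer = io.StringIO()
json.dump(records, buffer)

# Rewind and load: the values come back as real lists, not strings.
buffer.seek(0)
restored = json.load(buffer)

print(restored == records)
print(type(next(iter(restored.values()))))
```

This only helps if you control the code that writes the file; a file that already contains the `key:[...]` text format still has to be parsed.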

Using ast.literal_eval you can convert the string representation of a list into an actual list.

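from collections import defaultdict
import ast

d = defaultdict(list)
with open('tst.txt') as fp:
    for line in fp:
        if not line.strip():
            continue  # skip blank lines, which have no '):' to find
        # The key runs up to and including the ')' of the first '):'.
        k = line[: line.index('):') + 1]
        # The list literal starts at the '[' right after the first ':['.
        v = ast.literal_eval(line[line.index(':[') + 1:])
        d[k] += v
print(dict(d))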

Output (line breaks added for readability):

{
'M:org.apache.mahout.common.RandomUtilsTest:testHashDouble()': ['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)'],
'M:org.apache.mahout.common.RandomUtilsTest:testHashFloat()': ['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)'],
'M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction()': ['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)'],
'M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2()': ['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
}
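
The same idea can be wrapped in a small function that is a little more tolerant of messy input. This is only a sketch: `parse_line` and `load_calls` are hypothetical names, the sample lines are made up, and it assumes every data line has the form `key:[...]` with no `:[` sequence inside the key.

```python
import ast
from collections import defaultdict


def parse_line(line):
    """Split 'key:[...]' into (key, list); return None for blank lines."""
    line = line.strip()
    if not line:
        return None
    # partition splits at the first ':[' only, so colons inside the key
    # or inside the list items are left untouched.
    key, sep, rest = line.partition(":[")
    if not sep:
        raise ValueError("unexpected line format: {!r}".format(line))
    # Put back the '[' that partition consumed, then parse the list literal.
    return key, ast.literal_eval("[" + rest)


def load_calls(lines):
    """Build a defaultdict(list) from an iterable of lines."""
    calls = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            key, values = parsed
            # Repeated keys accumulate their values instead of overwriting.
            calls[key].extend(values)
    return calls


# Made-up sample lines in the same shape as the file in the question.
sample = [
    "M:pkg.FooTest:testA():['(M)pkg.Foo:a()', '(S)pkg.Foo:b(int)']\n",
    "\n",
    "M:pkg.FooTest:testA():['(M)pkg.Foo:c()']\n",
]
calls = load_calls(sample)
print(calls["M:pkg.FooTest:testA()"])
```

Because `load_calls` accepts any iterable of lines, an open file object can be passed to it directly, so a 50K-line file is processed one line at a time.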

