How do I format input from a text file into a defaultdict in Python
I have a text file with more than 50K lines in this format:
M:org.apache.mahout.common.RandomUtilsTest:testHashDouble():['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)']
M:org.apache.mahout.common.RandomUtilsTest:testHashFloat():['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction():['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2():['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
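How do I read this data and format it into a dictionary so that every method inside the [] is a separate value and the string before the [ (the test method) is the key? And how would I remove them before storing them as values in the dictionary?
This is the code used to populate the text file. Now I am trying to take the data in that txt file and read/parse it back into another dictionary.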
d = {}
with open("filtered.txt") as infile:
    for line in infile:
        # each line holds a key and a value separated by a single space
        key, val = line.strip().split(" ")
        d.setdefault(key, []).append(val)

# the with block already closed the file, so no explicit close() is needed
keys = sorted(d)

with open('mahout-coverage.txt', 'w') as outfile:
    for key in keys:
        outfile.write('{}:{}\n'.format(key, d[key]))
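The json module can be used to store a Python dictionary in a file, and later to load that file and parse it back into the same data type it had before it was written.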
import json

d = {}
with open('filtered.txt') as infile:
    for line in infile:
        # split once at the "():" that ends the test method name
        key, value = line.strip().split("():", 1)
        key = "{}()".format(key)
        # value is still the text of the list, e.g. "['(O)...', ...]"
        d[key] = value
print(d)

# It is better and easier to write the data to the file with the json module
with open('data.txt', 'w') as json_file:
    json.dump(d, json_file)

# Later you can read the file back with the json module itself
with open('data.txt') as f:
    # data is a dictionary, which is easy to work with
    data = json.load(f)
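Note that the loop above stores each value as the text of a list, not as a list. A minimal sketch of one way to get real lists before dumping to JSON, assuming every line has the key:[...] shape shown in the question. The sample line and the name parse_line are illustrative and not part of the original answer:

```python
import ast
import json

def parse_line(line):
    """Split 'key:[...]' into the key and a real list of strings."""
    # Split at the first ':[' so the colons inside the key are left alone.
    key, _, rest = line.strip().partition(":[")
    # Put the '[' back and evaluate the list literal safely.
    return key, ast.literal_eval("[" + rest)

sample = "M:pkg.FooTest:testBar():['(M)java.util.Random:nextLong()', '(S)pkg.FooTest:check(long,long)']"
key, calls = parse_line(sample)

# JSON has a native array type, so the list survives the round trip.
restored = json.loads(json.dumps({key: calls}))
print(restored[key])
```

Because the values are already lists when they are dumped, json.load returns them as lists with no extra parsing step.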
Using ast.literal_eval you can convert the string representation of the list into a list:
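from collections import defaultdict
import ast

with open('tst.txt') as fp:
    d = defaultdict(list)
    for line in fp:
        # key: everything up to and including the ')' that precedes ':['
        # value: the list literal after that colon, parsed into a real list
        k, v = line[: line.index('):') + 1], ast.literal_eval(line[line.index(':[') + 1:])
        d[k] += v
print(dict(d))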
Output:
{
M:org.apache.mahout.common.RandomUtilsTest:testHashDouble() : ['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)']
M:org.apache.mahout.common.RandomUtilsTest:testHashFloat() : ['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction() : ['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2() : ['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
}
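With a 50K-line file it may be worth guarding against blank or malformed lines, since line.index raises ValueError when the separator is missing. The following is a sketch under that assumption. The names build_index and strip_call_type are hypothetical, and dropping the leading "(O)", "(M)", "(S)" or "(I)" marker is only one guess at what the question means by removing "them":

```python
import ast
import re
from collections import defaultdict

# Assumption: each value starts with a one-letter call-type marker like "(M)".
CALL_TYPE = re.compile(r"^\([A-Z]\)")

def strip_call_type(call):
    """Drop a leading marker such as '(M)' if present."""
    return CALL_TYPE.sub("", call)

def build_index(lines):
    """Map each test method to the list of methods it calls."""
    index = defaultdict(list)
    for line in lines:
        key, sep, rest = line.strip().partition(":[")
        if not sep:
            continue  # blank line or no ':[' separator: skip it
        calls = ast.literal_eval("[" + rest)
        index[key].extend(strip_call_type(c) for c in calls)
    return index

lines = [
    "M:pkg.FooTest:testBar():['(M)java.util.Random:nextLong()', '(S)pkg.FooTest:check(long,long)']\n",
    "\n",
    "M:pkg.FooTest:testBar():['(O)java.lang.Double:<init>(double)']\n",
]
index = build_index(lines)
print(dict(index))
```

Passing an open file object instead of the list works the same way, since both yield one line at a time. Repeated keys are merged into one list, which is the reason for using defaultdict(list).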