[英]Merging methods of two different sets of data in Python
这个问题被编辑了。 请先查看底部的编辑。
这个问题会有点长,所以我提前道歉。 请考虑两种不同类型的数据:
数据一:
{
"files": [
{
"name": "abc",
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
}
]
}
数据乙:
{
"files": [
{
"methods": {
"invalid": [
"func2",
"func8"
],
"valid": [
"func4",
"func1",
"func3"
]
},
"classes": [
{
"invalid": [
"class1",
"class2"
],
"valid": [
"class8",
"class5"
],
"name": "class1"
}
],
"name": "abc"
}
]
}
我正在尝试合并每个文件(A 文件与 A 和 B 文件与 B)。 上一个问题帮助我弄清楚该怎么做,但我又被卡住了。 正如我在上一个问题中所说,合并文件有一个规则。 我再解释一下:考虑两个字典A1
和A2
。 我想合并无效的A1
与A2
和有效的A1
与A2
。 合并应该很容易,但问题是无效和有效的数据相互依赖。 该依赖项的规则 - 如果数字x
在A1
有效而在A2
无效,则其在合并报告中有效。 唯一无效的方法是同时在A1
和A2
的无效列表中(或者在其中一个无效而另一个不存在)。 为了合并 A 文件,我编写了以下代码:
def merge_A_files(self, src_report):
for current_file in src_report["files"]:
filename_index = next((index for (index, d) in enumerate(self.A_report["files"]) if d["name"] == current_file["name"]), None)
if filename_index == None:
new_block = {}
new_block['valid'] = current_file['valid']
new_block['invalid'] = current_file['invalid']
new_block['name'] = current_file['name']
self.A_report["files"].append(new_block)
else:
block_to_merge = self.A_report["files"][filename_index]
merged_block = {'valid': [], 'invalid': []}
merged_block['valid'] = list(set(block_to_merge['valid'] + current_file['valid']))
merged_block['invalid'] = list({i for l in [block_to_merge['invalid'], current_file['invalid']]
for i in l if i not in merged_block['valid']})
merged_block['name'] = current_file['name']
self.A_report["files"][filename_index] = merged_block
为了合并B
文件,我写道:
def _merge_functional_files(self, src_report):
for current_file in src_report["files"]:
filename_index = next((index for (index, d) in enumerate(self.B_report["files"]) if d["name"] == current_file["name"]), None)
if filename_index == None:
new_block = {'methods': {}, 'classes': []}
new_block['methods']['valid'] = current_file['methods']['valid']
new_block['methods']['invalid'] = current_file['methods']['invalid']
new_block['classes'] += [{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']]
new_block['name'] = current_file['name']
self.B_report["files"].append(new_block)
else:
block_to_merge = self.B_report["files"][filename_index]
merged_block = {'methods': {}, 'classes': []}
for current_class in block_to_merge["classes"]:
current_classname = current_class.get("name")
class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
if class_index == None:
merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
else:
class_block_to_merge = merged_block["classes"][class_index]
class_merged_block = {'valid': [], 'invalid': []}
class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
for i in l if i not in class_merged_block['valid']})
class_merged_block['name'] = current_classname
merged_block["classes"][filename_index] = class_merged_block
merged_block['methods']['valid'] = list(set(block_to_merge['methods']['valid'] + current_file['methods']['valid']))
merged_block['methods']['invalid'] = list({i for l in [block_to_merge['methods']['invalid'], current_file['methods']['invalid']]
for i in l if i not in merged_block['methods']['valid']})
merged_block['name'] = current_file['name']
self.B_report["files"][filename_index] = merged_block
看起来A
的代码有效并且按预期工作。 但是我对B
有问题,尤其是合并classes
。 我有问题的例子:
第一个文件:
{
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1"
],
"invalid": [
"func3"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class1",
"class2"
],
"invalid": [
"class3",
"class5"
]
}
]
}
]
}
第二个文件:
{
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class8",
"class5"
],
"invalid": [
"class1",
"class2"
]
}
]
}
]
}
我得到:
{
"files": [
{
"methods": {
"invalid": [
"func2",
"func8"
],
"valid": [
"func3",
"func1",
"func4"
]
},
"classes": [
{
"invalid": [
"class5",
"class3"
],
"valid": [
"class2",
"class1"
],
"name": "class1"
}
],
"name": "some_file1"
}
]
}
但这是错误的,因为例如class5
应该是有效的。 所以我的问题是:
编辑:我的第一个解释太复杂了。 我将尝试解释我想要实现的目标。 对于那些阅读该主题的人(欣赏它!),请忘记数据类型 A(为简单起见)。 考虑数据类型文件 B(在开始时显示)。 我正在尝试合并一堆 B 文件。 据我了解,该算法是这样做的:
合并方法:方法只有在两个块中都无效时才无效。 否则,它是有效的。
合并类:它变得越来越复杂,因为它是一个数组。 我想遵循我对方法所做的相同规则,但我首先需要找到数组中每个块的索引。
主要问题是合并类。 您能否就如何合并 B 类型文件提出一个不复杂的建议?
如果您可以为您展示的示例提供预期的输出,那就太好了。 根据我的理解,您要实现的是:
"files"
条目,它是具有以下结构的字典列表:{
"name": "file_name",
"methods": {
"invalid": ["list", "of", "names"],
"valid": ["list", "of", "names"]
},
"classes": [
{
"name": "class_name",
"invalid": ["list", "of", "names"],
"valid": ["list", "of", "names"]
}
]
}
您希望合并来自多个文件的结构,以便根据以下规则将具有相同"name"
文件条目合并在一起:
"methods"
每个名称:如果它在至少一个文件条目的"valid"
数组中,则进入"valid"
; 否则,如果进入"invalid"
。"name"
类也合并在一起, "valid"
和"invalid"
数组内的名称按照上述规则合并。以下对您的代码的分析假设我的理解如上所述。 让我们看一下合并lasses的代码片段:
block_to_merge = self.B_report["files"][filename_index]
merged_block = {'methods': {}, 'classes': []}
for current_class in block_to_merge["classes"]:
current_classname = current_class.get("name")
class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
if class_index == None:
merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
else:
class_block_to_merge = merged_block["classes"][class_index]
class_merged_block = {'valid': [], 'invalid': []}
class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
for i in l if i not in class_merged_block['valid']})
class_merged_block['name'] = current_classname
merged_block["classes"][filename_index] = class_merged_block
该代码在逻辑上不正确,因为:
block_to_merge["classes"]
每个类字典,这是前一个合并的 block 。merged_block
) 被初始化为一个空字典。class_index
为None
的情况下, merged_block
的类字典设置为前一个合并块中的类字典。 如果您考虑一下, class_index
将始终为None
,因为current_class
是从已经合并的block_to_merge["classes"]
枚举的。 因此,写入merged_block
的只是文件的第一个文件条目中的"classes"
条目。 在您的示例中,您可以验证"classes"
条目是否与第一个文件中的条目完全相同。
也就是说,您对如何合并文件的总体想法是正确的,但在实现方面可能会更简单(和高效)。 我将首先指出您代码中的非最佳实现,然后提供一个更简单的解决方案。
next
在列表中查找具有相同"name"
的现有条目,但这可能需要线性时间。 相反,您可以将它们存储在字典中,以"name"
作为键。修改后的代码如下:
class Merger:
def __init__(self):
# A structure optimized for efficiency:
# dict (file_name) -> {
# "methods": {
# "valid": set(names),
# "invalid": set(names),
# }
# "classes": dict (class_name) -> {
# "valid": set(names),
# "invalid": set(names),
# }
# }
self.file_dict = {}
def _create_entry(self, new_entry):
return {
"valid": set(new_entry["valid"]),
"invalid": set(new_entry["invalid"]),
}
def _merge_entry(self, merged_entry, new_entry):
merged_entry["valid"].update(new_entry["valid"])
merged_entry["invalid"].difference_update(new_entry["valid"])
for name in new_entry["invalid"]:
if name not in merged_entry["valid"]:
merged_entry["invalid"].add(name)
def merge_file(self, src_report):
# Method called to merge one file.
for current_file in src_report["files"]:
file_name = current_file["name"]
# Merge methods.
if file_name not in self.file_dict:
self.file_dict[file_name] = {
"methods": self._create_entry(current_file["methods"]),
"classes": {},
}
else:
self._merge_entry(self.file_dict[file_name]["methods"], current_file["methods"])
# Merge classes.
file_class_entry = self.file_dict[file_name]["classes"]
for class_entry in current_file["classes"]:
class_name = class_entry["name"]
if class_name not in file_class_entry:
file_class_entry[class_name] = self._create_entry(class_entry)
else:
self._merge_entry(file_class_entry[class_name], class_entry)
def post_process(self):
# Method called after all files are merged, and returns the data in its output form.
return [
{
"name": file_name,
"methods": {
"valid": list(file_entry["methods"]["valid"]),
"invalid": list(file_entry["methods"]["invalid"]),
},
"classes": [
{
"name": class_name,
"valid": list(class_entry["valid"]),
"invalid": list(class_entry["invalid"]),
}
for class_name, class_entry in file_entry["classes"].items()
],
}
for file_name, file_entry in self.file_dict.items()
]
我们可以通过以下方式测试实现:
def main():
a = {
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1"
],
"invalid": [
"func3"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class1",
"class2"
],
"invalid": [
"class3",
"class5"
]
}
]
}
]
}
b = {
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class8",
"class5"
],
"invalid": [
"class1",
"class2"
]
}
]
}
]
}
import pprint
merge = Merger()
merge.merge_file(a)
merge.merge_file(b)
output = merge.post_process()
pprint.pprint(output)
if __name__ == '__main__':
main()
输出是:
[{'classes': [{'invalid': ['class3'],
'name': 'class1',
'valid': ['class2', 'class5', 'class8', 'class1']}],
'methods': {'invalid': ['func2', 'func8'],
'valid': ['func1', 'func4', 'func3']},
'name': 'some_file1'}]
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.