Merging methods of two different sets of data in Python
This question was edited. Please see the edit at the bottom first.
This question is going to be a bit long, so I'm sorry in advance. Please consider two different types of data:
Data A:
{
  "files": [
    {
      "name": "abc",
      "valid": ["func4", "func1", "func3"],
      "invalid": ["func2", "func8"]
    }
  ]
}
Data B:
{
  "files": [
    {
      "methods": {
        "invalid": ["func2", "func8"],
        "valid": ["func4", "func1", "func3"]
      },
      "classes": [
        {
          "invalid": ["class1", "class2"],
          "valid": ["class8", "class5"],
          "name": "class1"
        }
      ],
      "name": "abc"
    }
  ]
}
I'm trying to merge each file (A files with A and B files with B). A previous question helped me figure out how to do it, but I got stuck again. As I said in the previous question, there is a rule for merging the files. I'll explain again: consider two dictionaries, A1 and A2. I want to merge invalid of A1 with invalid of A2, and valid of A1 with valid of A2. The merge should be easy enough, but the problem is that the invalid and valid data depend on each other. The rule of that dependency: if an item x is valid in A1 and invalid in A2, then it is valid in the merged report. The only way to be invalid is to be in the invalid list of both A1 and A2 (or invalid in one of them while not existing in the other).
In order to merge the A files I wrote the following code:
def merge_A_files(self, src_report):
    for current_file in src_report["files"]:
        filename_index = next((index for (index, d) in enumerate(self.A_report["files"]) if d["name"] == current_file["name"]), None)
        if filename_index == None:
            new_block = {}
            new_block['valid'] = current_file['valid']
            new_block['invalid'] = current_file['invalid']
            new_block['name'] = current_file['name']
            self.A_report["files"].append(new_block)
        else:
            block_to_merge = self.A_report["files"][filename_index]
            merged_block = {'valid': [], 'invalid': []}
            merged_block['valid'] = list(set(block_to_merge['valid'] + current_file['valid']))
            merged_block['invalid'] = list({i for l in [block_to_merge['invalid'], current_file['invalid']]
                                            for i in l if i not in merged_block['valid']})
            merged_block['name'] = current_file['name']
            self.A_report["files"][filename_index] = merged_block
For merging B files I wrote:
def _merge_functional_files(self, src_report):
    for current_file in src_report["files"]:
        filename_index = next((index for (index, d) in enumerate(self.B_report["files"]) if d["name"] == current_file["name"]), None)
        if filename_index == None:
            new_block = {'methods': {}, 'classes': []}
            new_block['methods']['valid'] = current_file['methods']['valid']
            new_block['methods']['invalid'] = current_file['methods']['invalid']
            new_block['classes'] += [{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']]
            new_block['name'] = current_file['name']
            self.B_report["files"].append(new_block)
        else:
            block_to_merge = self.B_report["files"][filename_index]
            merged_block = {'methods': {}, 'classes': []}
            for current_class in block_to_merge["classes"]:
                current_classname = current_class.get("name")
                class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
                if class_index == None:
                    merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
                else:
                    class_block_to_merge = merged_block["classes"][class_index]
                    class_merged_block = {'valid': [], 'invalid': []}
                    class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
                    class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
                                                          for i in l if i not in class_merged_block['valid']})
                    class_merged_block['name'] = current_classname
                    merged_block["classes"][filename_index] = class_merged_block
            merged_block['methods']['valid'] = list(set(block_to_merge['methods']['valid'] + current_file['methods']['valid']))
            merged_block['methods']['invalid'] = list({i for l in [block_to_merge['methods']['invalid'], current_file['methods']['invalid']]
                                                       for i in l if i not in merged_block['methods']['valid']})
            merged_block['name'] = current_file['name']
            self.B_report["files"][filename_index] = merged_block
It looks like the code for A is valid and works as expected. But I have a problem with B, especially with merging classes. The example I have a problem with:
First file:
{
  "files": [
    {
      "name": "some_file1",
      "methods": {
        "valid": ["func4", "func1"],
        "invalid": ["func3"]
      },
      "classes": [
        {
          "name": "class1",
          "valid": ["class1", "class2"],
          "invalid": ["class3", "class5"]
        }
      ]
    }
  ]
}
Second file:
{
  "files": [
    {
      "name": "some_file1",
      "methods": {
        "valid": ["func4", "func1", "func3"],
        "invalid": ["func2", "func8"]
      },
      "classes": [
        {
          "name": "class1",
          "valid": ["class8", "class5"],
          "invalid": ["class1", "class2"]
        }
      ]
    }
  ]
}
I get:
{
  "files": [
    {
      "methods": {
        "invalid": ["func2", "func8"],
        "valid": ["func3", "func1", "func4"]
      },
      "classes": [
        {
          "invalid": ["class5", "class3"],
          "valid": ["class2", "class1"],
          "name": "class1"
        }
      ],
      "name": "some_file1"
    }
  ]
}
But it's wrong because, for example, class5 should be valid. So my questions are:
Edit: My first explanation was too complicated. I'll try to explain what I'm trying to achieve. For those of you who read the topic (appreciate it!), please forget about data type A (for simplicity). Consider data type B (shown at the start). I'm trying to merge a bunch of B files. As I understand it, the algorithm is:
To merge methods: a method is invalid only if it's invalid in both blocks. Otherwise, it's valid.
To merge classes: it's more complicated because it's an array. I want to follow the same rule that I used for methods, but first I need to find the index of each block in the array.
The main problem is with merging classes. Can you please suggest an uncomplicated way to merge B-type files?
It would be great if you could provide an expected output for the example you're showing. Based on my understanding, what you're trying to achieve is:
You have a "files" entry, which is a list of dictionaries with the structure:
{
  "name": "file_name",
  "methods": {
    "invalid": ["list", "of", "names"],
    "valid": ["list", "of", "names"]
  },
  "classes": [
    {
      "name": "class_name",
      "invalid": ["list", "of", "names"],
      "valid": ["list", "of", "names"]
    }
  ]
}
You wish to merge structures from multiple files, so that file entries with the same "name" are merged together, according to the following rule:
For each name inside "methods": it goes into "valid" if it is in the "valid" array in at least one file entry; otherwise it goes into "invalid".
Classes with the same "name" are also merged together, and names inside the "valid" and "invalid" arrays are merged according to the above rule.
The following analysis of your code assumes my understanding as stated above. Let's look at the code snippet for merging classes:
block_to_merge = self.B_report["files"][filename_index]
merged_block = {'methods': {}, 'classes': []}
for current_class in block_to_merge["classes"]:
    current_classname = current_class.get("name")
    class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
    if class_index == None:
        merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
    else:
        class_block_to_merge = merged_block["classes"][class_index]
        class_merged_block = {'valid': [], 'invalid': []}
        class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
        class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
                                              for i in l if i not in class_merged_block['valid']})
        class_merged_block['name'] = current_classname
        merged_block["classes"][filename_index] = class_merged_block
The code is logically incorrect because:
You iterate over each class dictionary in block_to_merge["classes"], which is the previous merged block.
The new merged block (merged_block) is initialized with an empty "classes" list.
In the case where class_index is None, the class dictionaries appended to merged_block are copied unchanged from current_file["classes"]; nothing is merged.
If you think about it, class_index will always be None for a file with a single class, because merged_block["classes"] is still empty when the lookup runs. Thus, what gets written into merged_block is only the "classes" entries from one file entry for a file. In your example, you can verify that the "classes" entry is exactly the same as that in the first file.
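The failure can be reproduced in isolation. The sketch below is a reduction of the class loop from the question, not the original method: buggy_merge_classes and the merge_branch_hits counter are hypothetical names introduced here, and the body of the else branch is replaced by a counter so it is easy to see whether that branch ever runs.

```python
def buggy_merge_classes(previous_classes, incoming_classes):
    """Reduction of the question's class loop (hypothetical helper)."""
    merged = []
    merge_branch_hits = 0  # counts how often the real merge branch would run
    for current_class in previous_classes:
        name = current_class["name"]
        # Same lookup as in the question: search the *new* (initially empty) list.
        index = next((i for i, d in enumerate(merged) if d["name"] == name), None)
        if index is None:
            # Copies the incoming classes unchanged; no valid/invalid merging.
            merged += [dict(c) for c in incoming_classes]
        else:
            merge_branch_hits += 1
    return merged, merge_branch_hits


previous = [{"name": "class1", "valid": ["class1", "class2"], "invalid": ["class3", "class5"]}]
incoming = [{"name": "class1", "valid": ["class8", "class5"], "invalid": ["class1", "class2"]}]

merged, hits = buggy_merge_classes(previous, incoming)
print(hits)                # 0
print(merged == incoming)  # True
```

With one class per file, the merge branch is never reached and the result is an unmerged copy of a single file's class list, which is consistent with the output shown in the question.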
That said, your overall idea of how to merge the files is correct, but implementation-wise it could be a lot simpler (and more efficient). I'll first point out the non-optimal implementations in your code, and then provide a simpler solution.
You use next to find an existing entry in the list with the same "name", but this could take linear time. Instead, you can store these in a dictionary, with "name" as keys.
A revised version of the code is as follows:
class Merger:
    def __init__(self):
        # A structure optimized for efficiency:
        # dict (file_name) -> {
        #     "methods": {
        #         "valid": set(names),
        #         "invalid": set(names),
        #     }
        #     "classes": dict (class_name) -> {
        #         "valid": set(names),
        #         "invalid": set(names),
        #     }
        # }
        self.file_dict = {}

    def _create_entry(self, new_entry):
        return {
            "valid": set(new_entry["valid"]),
            "invalid": set(new_entry["invalid"]),
        }

    def _merge_entry(self, merged_entry, new_entry):
        merged_entry["valid"].update(new_entry["valid"])
        merged_entry["invalid"].difference_update(new_entry["valid"])
        for name in new_entry["invalid"]:
            if name not in merged_entry["valid"]:
                merged_entry["invalid"].add(name)

    def merge_file(self, src_report):
        # Method called to merge one file.
        for current_file in src_report["files"]:
            file_name = current_file["name"]
            # Merge methods.
            if file_name not in self.file_dict:
                self.file_dict[file_name] = {
                    "methods": self._create_entry(current_file["methods"]),
                    "classes": {},
                }
            else:
                self._merge_entry(self.file_dict[file_name]["methods"], current_file["methods"])
            # Merge classes.
            file_class_entry = self.file_dict[file_name]["classes"]
            for class_entry in current_file["classes"]:
                class_name = class_entry["name"]
                if class_name not in file_class_entry:
                    file_class_entry[class_name] = self._create_entry(class_entry)
                else:
                    self._merge_entry(file_class_entry[class_name], class_entry)

    def post_process(self):
        # Method called after all files are merged, and returns the data in its output form.
        return [
            {
                "name": file_name,
                "methods": {
                    "valid": list(file_entry["methods"]["valid"]),
                    "invalid": list(file_entry["methods"]["invalid"]),
                },
                "classes": [
                    {
                        "name": class_name,
                        "valid": list(class_entry["valid"]),
                        "invalid": list(class_entry["invalid"]),
                    }
                    for class_name, class_entry in file_entry["classes"].items()
                ],
            }
            for file_name, file_entry in self.file_dict.items()
        ]
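One property of the merge rule worth checking is that it should not depend on the order in which files arrive. As a minimal sketch, assuming the rule is "valid if valid anywhere, otherwise invalid", it can be written as a single fold over any number of entries. The function merge_names and the third entry are hypothetical and only serve the check; the first two entries are the class1 blocks from your example.

```python
from itertools import permutations


def merge_names(entries):
    """Fold the valid/invalid rule over entries shaped like
    {"valid": [...], "invalid": [...]} (hypothetical helper)."""
    valid = set()
    seen = set()
    for entry in entries:
        valid.update(entry["valid"])
        seen.update(entry["valid"])
        seen.update(entry["invalid"])
    # A name stays invalid only if it was never valid in any entry.
    return {"valid": valid, "invalid": seen - valid}


first = {"valid": ["class1", "class2"], "invalid": ["class3", "class5"]}
second = {"valid": ["class8", "class5"], "invalid": ["class1", "class2"]}
third = {"valid": [], "invalid": ["class3", "class9"]}  # made up for the check

# Merge the three entries in every possible order and compare the results.
results = [merge_names(order) for order in permutations([first, second, third])]
assert all(result == results[0] for result in results)

print(sorted(results[0]["valid"]))    # ['class1', 'class2', 'class5', 'class8']
print(sorted(results[0]["invalid"]))  # ['class3', 'class9']
```

Applying _merge_entry one file at a time should produce the same sets, since a name ends up invalid only if no file ever lists it as valid.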
We can test the implementation with:
import pprint


def main():
    a = {
        "files": [
            {
                "name": "some_file1",
                "methods": {
                    "valid": ["func4", "func1"],
                    "invalid": ["func3"]
                },
                "classes": [
                    {
                        "name": "class1",
                        "valid": ["class1", "class2"],
                        "invalid": ["class3", "class5"]
                    }
                ]
            }
        ]
    }
    b = {
        "files": [
            {
                "name": "some_file1",
                "methods": {
                    "valid": ["func4", "func1", "func3"],
                    "invalid": ["func2", "func8"]
                },
                "classes": [
                    {
                        "name": "class1",
                        "valid": ["class8", "class5"],
                        "invalid": ["class1", "class2"]
                    }
                ]
            }
        ]
    }
    merge = Merger()
    merge.merge_file(a)
    merge.merge_file(b)
    output = merge.post_process()
    pprint.pprint(output)


if __name__ == '__main__':
    main()
The output is:
[{'classes': [{'invalid': ['class3'],
               'name': 'class1',
               'valid': ['class2', 'class5', 'class8', 'class1']}],
  'methods': {'invalid': ['func2', 'func8'],
              'valid': ['func1', 'func4', 'func3']},
  'name': 'some_file1'}]
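Note that the order of names inside each list comes from set iteration, so it is not guaranteed and can differ between runs. If you need stable output (for diffs or tests), one option is to sort everything after merging. The sketch below is only a suggestion: to_sorted_report is a hypothetical helper, and wrapping the result in a "files" key to match the input format is an assumption about what you want.

```python
def to_sorted_report(files):
    """Return a report with deterministic ordering.

    `files` is a list shaped like the result of post_process().
    """
    return {
        "files": [
            {
                "name": file_entry["name"],
                "methods": {
                    "valid": sorted(file_entry["methods"]["valid"]),
                    "invalid": sorted(file_entry["methods"]["invalid"]),
                },
                "classes": [
                    {
                        "name": class_entry["name"],
                        "valid": sorted(class_entry["valid"]),
                        "invalid": sorted(class_entry["invalid"]),
                    }
                    for class_entry in sorted(file_entry["classes"], key=lambda item: item["name"])
                ],
            }
            for file_entry in sorted(files, key=lambda item: item["name"])
        ]
    }


# The merged output shown above, used here as sample input.
output = [
    {
        "name": "some_file1",
        "methods": {"valid": ["func1", "func4", "func3"], "invalid": ["func2", "func8"]},
        "classes": [
            {"name": "class1", "valid": ["class2", "class5", "class8", "class1"], "invalid": ["class3"]}
        ],
    }
]

report = to_sorted_report(output)
print(report["files"][0]["classes"][0]["valid"])  # ['class1', 'class2', 'class5', 'class8']
print(report["files"][0]["methods"]["valid"])     # ['func1', 'func3', 'func4']
```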