[英]Merging methods of two different sets of data in Python
這個問題被編輯了。 請先查看底部的編輯。
這個問題會有點長,所以我提前道歉。 請考慮兩種不同類型的數據:
數據一:
{
"files": [
{
"name": "abc",
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
}
]
}
數據乙:
{
"files": [
{
"methods": {
"invalid": [
"func2",
"func8"
],
"valid": [
"func4",
"func1",
"func3"
]
},
"classes": [
{
"invalid": [
"class1",
"class2"
],
"valid": [
"class8",
"class5"
],
"name": "class1"
}
],
"name": "abc"
}
]
}
我正在嘗試合並每個文件(A 文件與 A 和 B 文件與 B)。 上一個問題幫助我弄清楚該怎么做,但我又被卡住了。 正如我在上一個問題中所說,合並文件有一個規則。 我再解釋一下:考慮兩個字典A1
和A2
。 我想合並無效的A1
與A2
和有效的A1
與A2
。 合並應該很容易,但問題是無效和有效的數據相互依賴。 該依賴項的規則 - 如果數字x
在A1
有效而在A2
無效,則其在合並報告中有效。 唯一無效的方法是同時在A1
和A2
的無效列表中(或者在其中一個無效而另一個不存在)。 為了合並 A 文件,我編寫了以下代碼:
def merge_A_files(self, src_report):
for current_file in src_report["files"]:
filename_index = next((index for (index, d) in enumerate(self.A_report["files"]) if d["name"] == current_file["name"]), None)
if filename_index == None:
new_block = {}
new_block['valid'] = current_file['valid']
new_block['invalid'] = current_file['invalid']
new_block['name'] = current_file['name']
self.A_report["files"].append(new_block)
else:
block_to_merge = self.A_report["files"][filename_index]
merged_block = {'valid': [], 'invalid': []}
merged_block['valid'] = list(set(block_to_merge['valid'] + current_file['valid']))
merged_block['invalid'] = list({i for l in [block_to_merge['invalid'], current_file['invalid']]
for i in l if i not in merged_block['valid']})
merged_block['name'] = current_file['name']
self.A_report["files"][filename_index] = merged_block
為了合並B
文件,我寫道:
def _merge_functional_files(self, src_report):
for current_file in src_report["files"]:
filename_index = next((index for (index, d) in enumerate(self.B_report["files"]) if d["name"] == current_file["name"]), None)
if filename_index == None:
new_block = {'methods': {}, 'classes': []}
new_block['methods']['valid'] = current_file['methods']['valid']
new_block['methods']['invalid'] = current_file['methods']['invalid']
new_block['classes'] += [{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']]
new_block['name'] = current_file['name']
self.B_report["files"].append(new_block)
else:
block_to_merge = self.B_report["files"][filename_index]
merged_block = {'methods': {}, 'classes': []}
for current_class in block_to_merge["classes"]:
current_classname = current_class.get("name")
class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
if class_index == None:
merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
else:
class_block_to_merge = merged_block["classes"][class_index]
class_merged_block = {'valid': [], 'invalid': []}
class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
for i in l if i not in class_merged_block['valid']})
class_merged_block['name'] = current_classname
merged_block["classes"][filename_index] = class_merged_block
merged_block['methods']['valid'] = list(set(block_to_merge['methods']['valid'] + current_file['methods']['valid']))
merged_block['methods']['invalid'] = list({i for l in [block_to_merge['methods']['invalid'], current_file['methods']['invalid']]
for i in l if i not in merged_block['methods']['valid']})
merged_block['name'] = current_file['name']
self.B_report["files"][filename_index] = merged_block
看起來A
的代碼有效並且按預期工作。 但是我對B
有問題,尤其是合並classes
。 我有問題的例子:
第一個文件:
{
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1"
],
"invalid": [
"func3"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class1",
"class2"
],
"invalid": [
"class3",
"class5"
]
}
]
}
]
}
第二個文件:
{
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class8",
"class5"
],
"invalid": [
"class1",
"class2"
]
}
]
}
]
}
我得到:
{
"files": [
{
"methods": {
"invalid": [
"func2",
"func8"
],
"valid": [
"func3",
"func1",
"func4"
]
},
"classes": [
{
"invalid": [
"class5",
"class3"
],
"valid": [
"class2",
"class1"
],
"name": "class1"
}
],
"name": "some_file1"
}
]
}
但這是錯誤的,因為例如class5
應該是有效的。 所以我的問題是:
編輯:我的第一個解釋太復雜了。 我將嘗試解釋我想要實現的目標。 對於那些閱讀該主題的人(欣賞它!),請忘記數據類型 A(為簡單起見)。 考慮數據類型文件 B(在開始時顯示)。 我正在嘗試合並一堆 B 文件。 據我了解,該算法是這樣做的:
合並方法:方法只有在兩個塊中都無效時才無效。 否則,它是有效的。
合並類:它變得越來越復雜,因為它是一個數組。 我想遵循我對方法所做的相同規則,但我首先需要找到數組中每個塊的索引。
主要問題是合並類。 您能否就如何合並 B 類型文件提出一個不復雜的建議?
如果您可以為您展示的示例提供預期的輸出,那就太好了。 根據我的理解,您要實現的是:
"files"
條目,它是具有以下結構的字典列表:{
"name": "file_name",
"methods": {
"invalid": ["list", "of", "names"],
"valid": ["list", "of", "names"]
},
"classes": [
{
"name": "class_name",
"invalid": ["list", "of", "names"],
"valid": ["list", "of", "names"]
}
]
}
您希望合並來自多個文件的結構,以便根據以下規則將具有相同"name"
文件條目合並在一起:
"methods"
每個名稱:如果它在至少一個文件條目的"valid"
數組中,則進入"valid"
; 否則,如果進入"invalid"
。"name"
類也合並在一起, "valid"
和"invalid"
數組內的名稱按照上述規則合並。以下對您的代碼的分析假設我的理解如上所述。 讓我們看一下合並lasses的代碼片段:
block_to_merge = self.B_report["files"][filename_index]
merged_block = {'methods': {}, 'classes': []}
for current_class in block_to_merge["classes"]:
current_classname = current_class.get("name")
class_index = next((index for (index, d) in enumerate(merged_block["classes"]) if d["name"] == current_classname), None)
if class_index == None:
merged_block['classes'] += ([{'valid': c['valid'], 'invalid': c['invalid'], 'name': c['name'] } for c in current_file['classes']])
else:
class_block_to_merge = merged_block["classes"][class_index]
class_merged_block = {'valid': [], 'invalid': []}
class_merged_block['valid'] = list(set(class_block_to_merge['valid'] + current_class['valid']))
class_merged_block['invalid'] = list({i for l in [class_block_to_merge['invalid'], current_class['invalid']]
for i in l if i not in class_merged_block['valid']})
class_merged_block['name'] = current_classname
merged_block["classes"][filename_index] = class_merged_block
該代碼在邏輯上不正確,因為:
block_to_merge["classes"]
每個類字典,這是前一個合並的 block 。merged_block
) 被初始化為一個空字典。class_index
為None
的情況下, merged_block
的類字典設置為前一個合並塊中的類字典。 如果您考慮一下, class_index
將始終為None
,因為current_class
是從已經合並的block_to_merge["classes"]
枚舉的。 因此,寫入merged_block
的只是文件的第一個文件條目中的"classes"
條目。 在您的示例中,您可以驗證"classes"
條目是否與第一個文件中的條目完全相同。
也就是說,您對如何合並文件的總體想法是正確的,但在實現方面可能會更簡單(和高效)。 我將首先指出您代碼中的非最佳實現,然后提供一個更簡單的解決方案。
next
在列表中查找具有相同"name"
的現有條目,但這可能需要線性時間。 相反,您可以將它們存儲在字典中,以"name"
作為鍵。修改后的代碼如下:
class Merger:
def __init__(self):
# A structure optimized for efficiency:
# dict (file_name) -> {
# "methods": {
# "valid": set(names),
# "invalid": set(names),
# }
# "classes": dict (class_name) -> {
# "valid": set(names),
# "invalid": set(names),
# }
# }
self.file_dict = {}
def _create_entry(self, new_entry):
return {
"valid": set(new_entry["valid"]),
"invalid": set(new_entry["invalid"]),
}
def _merge_entry(self, merged_entry, new_entry):
merged_entry["valid"].update(new_entry["valid"])
merged_entry["invalid"].difference_update(new_entry["valid"])
for name in new_entry["invalid"]:
if name not in merged_entry["valid"]:
merged_entry["invalid"].add(name)
def merge_file(self, src_report):
# Method called to merge one file.
for current_file in src_report["files"]:
file_name = current_file["name"]
# Merge methods.
if file_name not in self.file_dict:
self.file_dict[file_name] = {
"methods": self._create_entry(current_file["methods"]),
"classes": {},
}
else:
self._merge_entry(self.file_dict[file_name]["methods"], current_file["methods"])
# Merge classes.
file_class_entry = self.file_dict[file_name]["classes"]
for class_entry in current_file["classes"]:
class_name = class_entry["name"]
if class_name not in file_class_entry:
file_class_entry[class_name] = self._create_entry(class_entry)
else:
self._merge_entry(file_class_entry[class_name], class_entry)
def post_process(self):
# Method called after all files are merged, and returns the data in its output form.
return [
{
"name": file_name,
"methods": {
"valid": list(file_entry["methods"]["valid"]),
"invalid": list(file_entry["methods"]["invalid"]),
},
"classes": [
{
"name": class_name,
"valid": list(class_entry["valid"]),
"invalid": list(class_entry["invalid"]),
}
for class_name, class_entry in file_entry["classes"].items()
],
}
for file_name, file_entry in self.file_dict.items()
]
我們可以通過以下方式測試實現:
def main():
a = {
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1"
],
"invalid": [
"func3"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class1",
"class2"
],
"invalid": [
"class3",
"class5"
]
}
]
}
]
}
b = {
"files": [
{
"name": "some_file1",
"methods": {
"valid": [
"func4",
"func1",
"func3"
],
"invalid": [
"func2",
"func8"
]
},
"classes": [
{
"name": "class1",
"valid": [
"class8",
"class5"
],
"invalid": [
"class1",
"class2"
]
}
]
}
]
}
import pprint
merge = Merger()
merge.merge_file(a)
merge.merge_file(b)
output = merge.post_process()
pprint.pprint(output)
if __name__ == '__main__':
main()
輸出是:
[{'classes': [{'invalid': ['class3'],
'name': 'class1',
'valid': ['class2', 'class5', 'class8', 'class1']}],
'methods': {'invalid': ['func2', 'func8'],
'valid': ['func1', 'func4', 'func3']},
'name': 'some_file1'}]
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.