Remove duplicates from JavaScript array using Python
Suppose I have a JavaScript array of elements that looks very similar to this:
var oui = new Array({
"pfx": "000000",
"mask": 24,
"desc": "00:00:00 Officially Xerox, but 0:0:0:0:0:0 is more common"
},{
"pfx": "000001",
"mask": 24,
"desc": "Xerox Xerox Corporation"
},{
"pfx": "000002",
"mask": 24,
"desc": "Xerox Xerox Corporation"
},{
"pfx": "000003",
"mask": 24,
"desc": "Xerox Xerox Corporation"
},{
"pfx": "000004",
"mask": 24,
"desc": "Xerox Xerox Corporation"
},{
"pfx": "000004",
"mask": 24,
"desc": "Let's pretend this is a repeat"
});
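Now imagine that the file is very large and some "pfx" values repeat throughout the dataset. Obviously, de-duplicating by hand is impossible, so I'm trying to figure out the best way to handle it programmatically. How can I write a Python script that reads in a .JS file containing this dataset and removes any duplicates? In other words, I want to read in the JS file, parse the array, and then generate another JavaScript file with a similar array, but with only unique values for the pfx variable.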
I've gone through several other Stack Overflow questions that are similar in nature, but nothing seems to fit this case. In my Python testing, I can rarely get just the pfx values by themselves to remove the duplicates, or Python struggles to read the data in as a proper JSON object (even without the "var" and "new Array" portion). I should also note that the reason I'm doing the de-duping in Python rather than with another JavaScript function inside the JS file (I tried following examples like this one) is that the latter just inflates the size of the JavaScript that has to be loaded onto the page.
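On why Python won't read the `new Array(...)` form as JSON: even with `var oui =` and `new Array` stripped off, what remains is `({...},{...})`, a parenthesised, comma-separated sequence of objects, which is not a single JSON value. A minimal sketch, assuming the file contains exactly one `new Array( ... );` call whose arguments are all JSON-compatible object literals (double-quoted keys and strings): cut out the arguments and wrap them in square brackets before calling `json.loads`. The function name `parse_new_array` is hypothetical.

```python
import json
import re


def parse_new_array(js_text):
    """Return the objects passed to `new Array(...)` as a list of dicts.

    Assumes a single `new Array( ... );` call whose arguments are
    JSON-compatible object literals.
    """
    match = re.search(r'new\s+Array\((.*)\)\s*;', js_text, re.DOTALL)
    if match is None:
        raise ValueError("no `new Array(...);` call found")
    # `{...},{...}` is not JSON on its own; wrapping it in [ ] makes a JSON array.
    return json.loads('[' + match.group(1) + ']')


sample = '''var oui = new Array({
"pfx": "000000", "mask": 24, "desc": "Xerox"
},{
"pfx": "000000", "mask": 24, "desc": "repeat"
});'''

print([entry["pfx"] for entry in parse_new_array(sample)])  # ['000000', '000000']
```

Once the data is a plain Python list, de-duplicating on `pfx` is an ordinary list operation.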
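In the future, the array may keep growing, so to avoid loading unnecessary JavaScript and keep page response times fast, I think this is a step that can and should be performed offline, with the result added to the page.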
To clarify, this is the site I'm trying to emulate (it's essentially very simple): https://www.wireshark.org/tools/oui-lookup.html
Answers:
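Since the structure isn't nested, you can match the array with a regular expression, parse it as JSON, remove the duplicate objects in Python with a filter, and then substitute the de-duplicated JSON string back in.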
Use array literal syntax ([ and ]) instead of new Array to keep things tidy (it's best never to use new Array):
import re
import json

js_source = '''
var oui = [{
    "pfx": "000000",
    "mask": 24,
    "desc": "00:00:00 Officially Xerox, but 0:0:0:0:0:0 is more common"
},{
    "pfx": "000001",
    "mask": 24,
    "desc": "Xerox Xerox Corporation"
},{
    "pfx": "000002",
    "mask": 24,
    "desc": "Xerox Xerox Corporation"
},{
    "pfx": "000003",
    "mask": 24,
    "desc": "Xerox Xerox Corporation"
},{
    "pfx": "000004",
    "mask": 24,
    "desc": "Xerox Xerox Corporation"
},{
    "pfx": "000004",
    "mask": 24,
    "desc": "Let's pretend this is a repeat"
}];
'''

def dedupe(match):
    # match.group() is the "[...]" array literal; parse it as JSON.
    entries = json.loads(match.group())
    seen_pfxs = set()

    def not_dupe(obj):
        # Keep the first object for each pfx and reject later repeats.
        this_pfx = obj['pfx']
        if this_pfx in seen_pfxs:
            return False
        seen_pfxs.add(this_pfx)
        return True

    return json.dumps([obj for obj in entries if not_dupe(obj)])

# Replace the "[...]" that is directly followed by ";" with its de-duplicated JSON.
deduped_str = re.sub(r'(?s)\[[^\]]+\](?=;)', dedupe, js_source)
print(deduped_str)
Output:
var oui = [{"pfx": "000000", "mask": 24, "desc": "00:00:00 Officially Xerox, but 0:0:0:0:0:0 is more common"}, {"pfx": "000001", "mask": 24, "desc": "Xerox Xerox Corporation"}, {"pfx": "000002", "mask": 24, "desc": "Xerox Xerox Corporation"}, {"pfx": "000003", "mask": 24, "desc": "Xerox Xerox Corporation"}, {"pfx": "000004", "mask": 24, "desc": "Xerox Xerox Corporation"}];
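The pattern `\[[^\]]+\]` stops at the first `]`, so it would cut the array short if any `desc` string contained a `]`. A sketch of a more tolerant variant, assuming the array is assigned as `var oui = [ ... ];` with JSON-compatible contents: find the opening bracket right after the assignment and let `json.JSONDecoder().raw_decode`, which returns the parsed value together with the index where it ended, decide where the array stops. It also wraps the read/write step the question asks for. The names `dedupe_js_source` and `dedupe_js_file` are hypothetical.

```python
import json
import re


def dedupe_js_source(js_text, var_name="oui", key="pfx"):
    """Rewrite `var <var_name> = [...]`, keeping the first object per `key`."""
    assign = re.search(r'var\s+' + re.escape(var_name) + r'\s*=\s*', js_text)
    if assign is None:
        raise ValueError(f"no `var {var_name} =` assignment found")
    start = assign.end()
    # raw_decode parses one JSON value starting at `start` and returns
    # (value, index just past it), so brackets inside strings are handled.
    entries, end = json.JSONDecoder().raw_decode(js_text, start)
    seen = set()
    unique = []
    for obj in entries:
        if obj[key] not in seen:
            seen.add(obj[key])
            unique.append(obj)
    # Everything before and after the array (e.g. the trailing ";") is kept.
    return js_text[:start] + json.dumps(unique) + js_text[end:]


def dedupe_js_file(in_path, out_path):
    """Offline step: read one .js file and write the de-duplicated copy."""
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(dedupe_js_source(text))


sample = ('var oui = [{"pfx": "000004", "mask": 24, "desc": "a [bracketed] note"},'
          ' {"pfx": "000004", "mask": 24, "desc": "repeat"}];\n')
print(dedupe_js_source(sample))
```

In a build step this would be called as, for example, `dedupe_js_file("oui.js", "oui.dedup.js")`, where both file names are placeholders.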
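If possible, you might consider storing the data in a separate tag instead of inline JavaScript; that will be much easier to maintain. For example, in your HTML, instead of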
var oui = [{
"pfx": "000000",
"mask": 24,
"desc": "00:00:00 Officially Xerox, but 0:0:0:0:0:0 is more common"
},{
consider something like
var oui = JSON.parse(document.querySelector('[data-oui]').textContent);
console.log(oui);
<script data-oui type="application/json">[{ "pfx": "000000", "mask": 24, "desc": "00:00:00 Officially Xerox, but 0:0:0:0:0:0 is more common" }]</script>
Then, rather than changing JavaScript dynamically, you only need to change the contents of the <script data-oui type="application/json"> tag.
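With the data in a `type="application/json"` block, the offline Python step only has to produce that one tag. A minimal sketch, assuming the entries are already loaded as a list of dicts (the helper name `oui_script_tag` is hypothetical): de-duplicate by `pfx`, serialize with `json.dumps`, and escape `</` as `<\/` so that a description containing `</script>` cannot end the element early. `\/` is a valid JSON escape for `/`, so `JSON.parse` reads the text back unchanged.

```python
import json


def oui_script_tag(entries, key="pfx"):
    """Build the <script data-oui type="application/json"> tag for the page."""
    seen = set()
    unique = []
    for obj in entries:
        # Keep the first object for each key value, drop later repeats.
        if obj[key] not in seen:
            seen.add(obj[key])
            unique.append(obj)
    # "</" inside a script element could close it early; "<\/" is equivalent JSON.
    payload = json.dumps(unique).replace("</", "<\\/")
    return '<script data-oui type="application/json">' + payload + '</script>'


entries = [
    {"pfx": "000000", "mask": 24, "desc": "Xerox"},
    {"pfx": "000000", "mask": 24, "desc": "repeat"},
]
print(oui_script_tag(entries))
```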