
Remove duplicates from JavaScript array using Python

Say I have a JavaScript array of elements that looks very similar to:

var oui = new Array({
    "pfx": "000000",
    "mask": 24,
    "desc": "00:00:00   Officially Xerox, but 0:0:0:0:0:0 is more common"
},{
    "pfx": "000001",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000002",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000003",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000004",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000004",
    "mask": 24,
    "desc": "Let's pretend this is a repeat"
   });

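Now imagine that the file is very large and some of the "pfx" values are repeated throughout the data set. Obviously, manual de-duplication is out of the question, so I'm trying to figure out the best way to handle it programmatically. How can I write a Python script that reads the .js file containing this data set and removes any duplicates? In other words, I'd like to read in the JS file, parse the array, and then produce another JavaScript file that has a similar array but only unique values for pfx.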

I've gone through several other Stack Overflow questions that are similar in nature, but none of them quite fits this case. In my Python testing, I can rarely get just the pfx values by themselves to remove the duplicates, or Python struggles to read the data in as a proper JSON object (even without the "var" and "new Array" portion). I should also note that the reason I'm doing the de-duping in Python rather than with another JavaScript function within the JS file (which I tried, following examples like this) is that it just inflates the size of the JavaScript that has to be loaded onto the page.

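In the future the array will likely keep growing, so to avoid loading unnecessary JavaScript and to keep page response times fast, I figure this is a step that can and should be performed offline before the data is added to the page.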

For clarification, here is the model of the website I'm trying to emulate: https://www.wireshark.org/tools/oui-lookup.html It is very simple in nature.

Research:

Convert Javascript array to python list?

Remove duplicate values from JS array

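Since the structure isn't nested, you can match the array with a regular expression, parse it as JSON, filter out the duplicate objects in Python, and then replace the match with the deduplicated JSON string.

Use array literal syntax ( [] ) rather than new Array to keep things tidy (it's best never to use new Array ):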

import re
import json

# Named js_source rather than str so the built-in str type is not shadowed.
js_source = '''
var oui = [{
    "pfx": "000000",
    "mask": 24,
    "desc": "00:00:00   Officially Xerox, but 0:0:0:0:0:0 is more common"
},{
    "pfx": "000001",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000002",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000003",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000004",
    "mask": 24,
    "desc": "Xerox  Xerox Corporation"
},{
    "pfx": "000004",
    "mask": 24,
    "desc": "Let's pretend this is a repeat"
   }];
'''

def dedupe(match):
    """Re-serialize the matched array, keeping the first object for each pfx."""
    records = json.loads(match.group())
    seen_pfxs = set()

    def not_dupe(obj):
        # True only the first time a given pfx value is seen.
        this_pfx = obj['pfx']
        if this_pfx in seen_pfxs:
            return False
        seen_pfxs.add(this_pfx)
        return True

    return json.dumps([obj for obj in records if not_dupe(obj)])

# Match the array literal (no "]" may appear inside it) up to the closing "];".
deduped_str = re.sub(r'(?s)\[[^\]]+\](?=;)', dedupe, js_source)
print(deduped_str)

Output:

var oui = [{"pfx": "000000", "mask": 24, "desc": "00:00:00   Officially Xerox, but 0:0:0:0:0:0 is more common"}, {"pfx": "000001", "mask": 24, "desc": "Xerox  Xerox Corporation"}, {"pfx": "000002", "mask": 24, "desc": "Xerox  Xerox Corporation"}, {"pfx": "000003", "mask": 24, "desc": "Xerox  Xerox Corporation"}, {"pfx": "000004", "mask": 24, "desc": "Xerox  Xerox Corporation"}];
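The question also asks about reading the .js file and writing a new one, and about input that still uses new Array(...). The following is only a sketch of how that might look, not a definitive implementation. The names ARRAY_RE, dedupe_js_text and dedupe_js_file are made up for illustration, and it assumes the file contains nothing but a single "var oui = ...;" statement whose elements are valid JSON objects.

```python
import json
import re
from pathlib import Path

# Assumed layout: "var oui = new Array( ... );" or "var oui = [ ... ];".
# The greedy (.*) runs to the LAST ");" or "];", so this only suits files
# that contain nothing after the array statement.
ARRAY_RE = re.compile(
    r'(?s)(var\s+oui\s*=\s*)(?:new\s+Array\((.*)\)|\[(.*)\])\s*;'
)

def dedupe_js_text(js_text):
    """Return js_text with the oui array rewritten to hold unique pfx values."""
    match = ARRAY_RE.search(js_text)
    if match is None:
        raise ValueError("no 'var oui = ...;' array found")
    # Exactly one of the two capture groups holds the array body.
    body = match.group(2) if match.group(2) is not None else match.group(3)
    records = json.loads("[" + body + "]")
    unique = {}
    for record in records:
        # setdefault keeps the first record seen for each pfx.
        unique.setdefault(record["pfx"], record)
    new_array = json.dumps(list(unique.values()), indent=4)
    return (js_text[:match.start()] + match.group(1) + new_array + ";"
            + js_text[match.end():])

def dedupe_js_file(src_path, dst_path):
    """Read src_path, dedupe its array, and write the result to dst_path."""
    text = Path(src_path).read_text(encoding="utf-8")
    Path(dst_path).write_text(dedupe_js_text(text), encoding="utf-8")
```

A call such as dedupe_js_file("oui.js", "oui.dedup.js") (both file names are hypothetical) would leave the original file untouched and write the array in literal [] form. Keeping the first record for each pfx matches the behaviour of the answer above; if the last record should win instead, the loop would need to overwrite rather than use setdefault.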

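If possible, you might consider storing the data in a separate tag rather than in inline Javascript, which will be easier to maintain. For example, in your HTML, instead of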

var oui = [{
    "pfx": "000000",
    "mask": 24,
    "desc": "00:00:00   Officially Xerox, but 0:0:0:0:0:0 is more common"
},{

consider something like

 var oui = JSON.parse(document.querySelector('[data-oui]').textContent);
 console.log(oui);
 <script data-oui type="application/json">[{ "pfx": "000000", "mask": 24, "desc": "00:00:00 Officially Xerox, but 0:0:0:0:0:0 is more common" }]</script>

Then you don't have to change the Javascript dynamically, only the <script data-oui type="application/json"> tag.
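With the data in its own tag, the offline Python step only has to rewrite the tag's contents. Here is a rough sketch under some assumptions: the names TAG_RE and dedupe_oui_tag are invented for illustration, the opening tag is written exactly as shown above, and the page contains only one such tag.

```python
import json
import re

# Assumes the opening tag appears exactly in this form in the HTML.
TAG_RE = re.compile(
    r'(?s)(<script data-oui type="application/json">)(.*?)(</script>)'
)

def dedupe_oui_tag(html):
    """Return html with the data-oui JSON reduced to unique pfx values."""
    def rewrite(match):
        seen = set()
        unique = []
        for record in json.loads(match.group(2)):
            # Keep only the first record for each pfx.
            if record["pfx"] not in seen:
                seen.add(record["pfx"])
                unique.append(record)
        return match.group(1) + json.dumps(unique) + match.group(3)
    # A function replacement avoids backslash-escape issues in the JSON text.
    return TAG_RE.sub(rewrite, html, count=1)
```

Because the tag body is plain JSON, json.loads can parse it directly and there is no need to strip "var" or "new Array" first.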
