Python - Finding repeated pairs/groups of values in a dictionary

Question

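I have the following script, which iterates over a text file of CSS rules and stores each rule and its properties in a dictionary (improvements to the code are welcome, I have only just started using Python):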

findGroups.py

import re
import sys

source = sys.argv[1]
temp = open('pythonTestFile.txt', 'w+')  # output file; nothing is written to it yet
di = {}
with open(source, 'r') as infile:
    for line in infile:
        # a rule name may start with ., # or *, may contain _ or - between
        # words, and is followed by optional whitespace and an opening brace
        if re.search(r'^\s*[.#*]?[\w-]*\s*\{', line):
            key = line.replace("{", "").strip()
            di[key] = []
            line = next(infile)
            while "}" not in line:
                # collapse whitespace and drop the trailing \n
                line = ' '.join(line.split())
                di[key].append(line)
                line = next(infile)
temp.close()

source.txt

* {
    min-height: 1000px;
    overflow: hidden;
}

.leftContainerDiv {
    font-family: Helvetica;
    font-size: 10px;
    background: white;
}

#cs_ht_panel{   
    font-size:10px;
    display:block;
    font-family:Helvetica;
    width:auto;
}
//...etc

Preferably, I would like the output to look something like this (readability suggestions are also welcome):

pythonTestFile.txt

Group 1, count(2) - font-family: Helvetica; + font-size: 10px;
Group 2: //...etc

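What I am trying to do now is figure out which CSS properties recur as groups. For example, if font-size: 10px and font-family: Helvetica occur together in one rule, does that group occur in any other rule, and how many times does it occur?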

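I am not entirely sure where to go from here. I cannot even figure out how to start on some kind of comparison algorithm, or whether a dictionary is the right data structure for storing the text.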

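Edit: In response to the comments, I am not able to use third-party libraries. The script will be used on a Red Hat VM, and only pre-approved software can be pushed onto those machines; I cannot just download libraries or packages.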

Answer 1

Assign a distinct prime number to each CSS property, for example:

{
    'display: block': 2,
    'font-size: 10px': 3,
    'font-family: Helvetica': 5,
    'min-height: 1000px': 7,
    'overflow: hidden': 11,
    'width: auto': 13,
    'background: white': 17,
}

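Then build a dict where the keys are the CSS selectors (what you call "rules") and each value is the product of the primes of all the properties that selector has: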

{
    '#cs_ht_panel': 390, # 2 * 3 * 5 * 13
    '*': 77, # 7 * 11
    '.leftContainerDiv': 255, # 3 * 5 * 17
}
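
The two mappings above could be built along these lines. This is only a minimal sketch: it assumes the rules have already been parsed into a dict of selector -> list of 'property: value' strings, and the names `rules`, `primes`, `assign_primes` and `selector_products` are made up for illustration. Primes are handed out in order of first appearance, so the actual numbers will differ from the hand-picked ones above.

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for a few hundred properties)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def assign_primes(rules):
    """Map every distinct property string to its own prime."""
    gen = primes()
    prime_of = {}
    for props in rules.values():
        for prop in props:
            if prop not in prime_of:
                prime_of[prop] = next(gen)
    return prime_of


def selector_products(rules, prime_of):
    """Map each selector to the product of the primes of its distinct properties."""
    products = {}
    for selector, props in rules.items():
        product = 1
        for prop in set(props):  # set(): a repeated property must only count once
            product *= prime_of[prop]
        products[selector] = product
    return products


# Hypothetical input, shaped like the dictionary built in the question.
rules = {
    '*': ['min-height: 1000px', 'overflow: hidden'],
    '.leftContainerDiv': ['font-family: Helvetica', 'font-size: 10px',
                          'background: white'],
    '#cs_ht_panel': ['font-size: 10px', 'display: block',
                     'font-family: Helvetica', 'width: auto'],
}
prime_of = assign_primes(rules)
products = selector_products(rules, prime_of)
```

Python integers have arbitrary precision, so the products cannot overflow, although they do get large for rules with many properties.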

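Now you can easily determine two things:

Which selectors ("rules") have property x (represented by a prime) or a set of properties {x,y,z,..} (represented by the product of their primes), by checking whether the selector's number is divisible by that number.
For example, which selectors have both 'font-family: Helvetica' (5) and 'font-size: 10px' (3)? All those, and only those, that are divisible by 15.
All the properties that two selectors have in common, by computing the GCD (greatest common divisor) of their numbers.
For example, GCD(390, 77) = 1 -> they have no properties in common
GCD(390, 255) = 15 -> factorize -> 3 * 5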
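
Both checks are a couple of lines each. The sketch below reuses the example numbers from this answer; the helper names `selectors_with` and `common_properties` are invented for illustration, and `math.gcd` assumes Python 3.5 or later (on older versions `fractions.gcd` plays the same role).

```python
from math import gcd

# Prime assignments and selector products copied from the example in this answer.
prime_of = {
    'display: block': 2,
    'font-size: 10px': 3,
    'font-family: Helvetica': 5,
    'min-height: 1000px': 7,
    'overflow: hidden': 11,
    'width: auto': 13,
    'background: white': 17,
}
products = {'#cs_ht_panel': 390, '*': 77, '.leftContainerDiv': 255}


def selectors_with(props, products, prime_of):
    """Return the selectors whose product is divisible by the group's product."""
    group = 1
    for prop in props:
        group *= prime_of[prop]
    return sorted(sel for sel, value in products.items() if value % group == 0)


def common_properties(a, b, products, prime_of):
    """Return the properties shared by selectors a and b, read off their GCD."""
    shared = gcd(products[a], products[b])
    return sorted(prop for prop, p in prime_of.items() if shared % p == 0)


both = selectors_with(['font-family: Helvetica', 'font-size: 10px'],
                      products, prime_of)
shared = common_properties('#cs_ht_panel', '.leftContainerDiv',
                           products, prime_of)
nothing = common_properties('#cs_ht_panel', '*', products, prime_of)
```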

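You can also find the most common groups by iterating over all the selectors' values, finding all the proper divisors that are not prime, and keeping a dict that counts how many times each divisor has been found. Each divisor is a group, and you can find its elements by factorizing it.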

390-> 6 10 15 26 30 39 65 78 130 195
255-> 15 51 85
77->

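So you have 15 twice and every other divisor once. That means group 15, the group made of properties 3 and 5 (font-size: 10px and font-family: Helvetica), occurs 2 times.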
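
The divisor counting can be sketched as follows. Because every property has its own prime, each composite proper divisor is simply the product of a subset of the selector's primes, so the sketch enumerates subsets instead of testing every number for divisibility. The helper names `prime_factors` and `count_groups` are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Prime assignments and selector products copied from the example in this answer.
prime_of = {
    'display: block': 2,
    'font-size: 10px': 3,
    'font-family: Helvetica': 5,
    'min-height: 1000px': 7,
    'overflow: hidden': 11,
    'width: auto': 13,
    'background: white': 17,
}
products = {'#cs_ht_panel': 390, '*': 77, '.leftContainerDiv': 255}


def prime_factors(n):
    """Return the prime factors of n in ascending order."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def count_groups(products):
    """Count, for every composite proper divisor, how many selectors it divides."""
    counts = Counter()
    for value in products.values():
        factors = prime_factors(value)
        # Sizes 2 .. len-1: at least two properties, but not the whole rule,
        # which matches the divisor lists shown above.
        for size in range(2, len(factors)):
            for combo in combinations(factors, size):
                divisor = 1
                for p in combo:
                    divisor *= p
                counts[divisor] += 1
    return counts


counts = count_groups(products)
name_of = {p: prop for prop, p in prime_of.items()}
# Keep only the groups seen in more than one selector, with their property names.
repeated = {
    divisor: (n, [name_of[p] for p in prime_factors(divisor)])
    for divisor, n in counts.items() if n > 1
}
```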

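This last step costs 2^n, where n is the number of properties in that CSS selector. That should not be a problem, since most selectors have fewer than 10 properties, but beyond 20 properties you start running into trouble. I suggest compressing the properties by stripping prefixes (moz-, webkit-) and suffixes (-left, -right, -top, -bottom).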
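
One possible reading of that compression step is sketched below. The function name `normalize` and the two patterns are assumptions, not something the answer spells out, and the result is lossy on purpose: after normalizing, margin-left: 10px and margin-top: 10px count as the same property.

```python
import re

# Assumed patterns: vendor prefixes with or without the leading dash,
# and a side suffix only when it ends the property name.
PREFIX = re.compile(r'^-?(?:moz|webkit)-')
SUFFIX = re.compile(r'-(?:left|right|top|bottom)$')


def normalize(declaration):
    """Strip vendor prefixes and side suffixes from a 'property: value' string."""
    name, _, value = declaration.partition(':')
    name = PREFIX.sub('', name.strip())
    name = SUFFIX.sub('', name)
    return '{}: {}'.format(name, value.strip().rstrip(';').strip())


examples = [normalize(d) for d in (
    '-webkit-border-radius: 4px;',
    'margin-left: 10px',
    'font-size: 10px',
)]
```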

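You can (and for a real CSS file with hundreds of lines, you should) do all of this using just sets and their operations (intersection etc.) instead of numbers, products and primes; but isn't this way cooler? ;)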
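
A rough sketch of what the set-based variant might look like, using only the standard library. Note that it is a slightly different technique from the divisor counting: instead of enumerating every subset of a rule, it takes the pairwise intersections of the rules as candidate groups, so it only reports the largest group each pair of rules shares. The name `shared_groups` and the sample `rules` dict are hypothetical.

```python
from itertools import combinations

# Hypothetical input, shaped like the dictionary built in the question.
rules = {
    '*': ['min-height: 1000px', 'overflow: hidden'],
    '.leftContainerDiv': ['font-family: Helvetica', 'font-size: 10px',
                          'background: white'],
    '#cs_ht_panel': ['font-size: 10px', 'display: block',
                     'font-family: Helvetica', 'width: auto'],
}


def shared_groups(rules):
    """Map each group shared by some pair of rules to the number of rules containing it."""
    prop_sets = [frozenset(props) for props in rules.values()]
    # Candidate groups: intersections of two rules with at least two properties.
    candidates = set()
    for a, b in combinations(prop_sets, 2):
        shared = a & b
        if len(shared) >= 2:
            candidates.add(shared)
    # A rule contains a group when the group is a subset of its properties.
    return {group: sum(1 for s in prop_sets if group <= s)
            for group in candidates}


groups = shared_groups(rules)
for group, count in groups.items():
    print(count, sorted(group))
```

The intersection `a & b` plays the role of the GCD, and the subset test `group <= s` plays the role of the divisibility check.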

Answer 2

Here is a solution based on the idea above, except that instead of primes I use sets and sorted lists. Maybe this is what you want?

import re
import itertools

# Read the whole stylesheet and join it into a single line
with open('css_test.txt', 'r') as f:
    lines_str = ' '.join(l.strip() for l in f)

r = re.compile(r'\{(.*?)\}')  # Get all strings between {}
groups = r.findall(lines_str)

# split each block on ';', drop the empty pieces and create groups of
# attributes like style: value
grps = []
for grp in groups:
    grps.append([x for x in grp.strip().split(';') if x.strip()])

# clean up those style: value attributes so that we get 'style:value'
# without any spaces and also collect all such attributes (we later create
# a set of these attributes)
grps2 = []
all_keys = []
for grp in grps:
    grp2 = []
    for g in grp:
        x = ':'.join(part.strip() for part in g.split(':'))
        grp2.append(x)
        all_keys.append(x)
    grps2.append(grp2)
set_keys = sorted(set(all_keys))  # unique attributes, in a stable order

print(set_keys)
print('***********')
set_dict = {}
# For each combination of 2 keys, check whether both keys occur in each of
# the cleaned up groups above; if they do, initialize the pair's count or
# add 1 to it
for x in itertools.combinations(set_keys, 2):
    set_x = set(x)
    for g in grps2:
        if set_x <= set(g):
            print(set_x)
            set_dict[x] = set_dict.get(x, 0) + 1

# print everything
print(set_dict)

Even if this solution is not an exact match for what you want, maybe you can follow the line of thought above to get there?
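
To get from those counts to the report format asked for in the question, something like the following might do. It is only a sketch: `format_groups` and its `min_count` parameter are made-up names, and the sample `set_dict` is hypothetical data shaped like the dictionary built above.

```python
def format_groups(set_dict, min_count=2):
    """Return report lines for groups seen at least min_count times."""
    repeated = [(count, sorted(group)) for group, count in set_dict.items()
                if count >= min_count]
    # Highest count first; ties are broken alphabetically by property names.
    repeated.sort(key=lambda item: (-item[0], item[1]))
    return ['Group {}, count({}) - {}'.format(i, count, ' + '.join(props))
            for i, (count, props) in enumerate(repeated, start=1)]


# Hypothetical counts, shaped like the set_dict built above.
set_dict = {
    ('font-family:Helvetica', 'font-size:10px'): 2,
    ('display:block', 'width:auto'): 1,
}
report = format_groups(set_dict)
print('\n'.join(report))
```

Writing `report` to pythonTestFile.txt instead of printing it is then a matter of opening the file and writing the joined lines.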

Answer 1
1 Accepted 2015-05-21 13:39:34

Answer 2
1 2015-05-21 16:33:48
