
Using SequenceMatcher in Python to find the longest common string

I am trying to use difflib.SequenceMatcher in Python to return the longest common string.

string1="""ERROR agave_util.py:64 Timed out waiting for HA alert generated CRITICAL ha_test_util.py:44 HA alert generated, Stack:File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 909, in <module>    main(FLAGS, sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 878, in main    worker.run(sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 326, in run    if not self.__test_phase_wrapper(test_method):  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 87, in test_stargate_master_power_off    self._host_power_off_test_cycle(host_of_stargate_master)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 469, in power_off_and_check_ha    self.wait_for_ha_alert(cutoff_usecs=latest_alert_start, **kwargs)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 418, in wait_for_ha_alert    interval=interval,  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message) ERROR nutanix_test_runner_worker.py:595 Test failed: 1exc_type: <type 'exceptions.SystemExit'>exc_value: 1stack:   File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 87, in test_stargate_master_power_off    self._host_power_off_test_cycle(host_of_stargate_master)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 469, in power_off_and_check_ha    
self.wait_for_ha_alert(cutoff_usecs=latest_alert_start, **kwargs)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 418, in wait_for_ha_alert    interval=interval,  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message)  File "/main/.python/util/base/log.py", line 204, in CHECK    FATAL(log_msg, **kwargs)  File "/main/.python/util/base/log.py", line 185, in FATAL    sys.exit(1) ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.157. ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.157. ERROR nutanix_test.py:1699 Failed to save cluster configuration"""

string2="""ERROR agave_util.py:64 Timed out waiting for VMs [u'vm_353ca5', u'vm_e02d7f'] power on CRITICAL ha_test_util.py:44 VMs [u'vm_353ca5', u'vm_e02d7f'] power on, Stack:File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 909, in <module>    main(FLAGS, sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 878, in main    worker.run(sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 326, in run    if not self.__test_phase_wrapper(test_method):  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 67, in test_zoo_keeper_leader_power_off    self._host_power_off_test_cycle(leader_host)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 468, in power_off_and_check_ha    self.verify_vms_not_on_host(host_vms, host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 617, in verify_vms_not_on_host    self.wait_for_vms_power_on(vm_names, per_vm_timeout)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 599, in wait_for_vms_power_on    interval=15)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message) ERROR nutanix_test_runner_worker.py:595 Test failed: 1exc_type: <type 'exceptions.SystemExit'>exc_value: 1stack:   File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 67, in test_zoo_keeper_leader_power_off    self._host_power_off_test_cycle(leader_host)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    
self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 468, in power_off_and_check_ha    self.verify_vms_not_on_host(host_vms, host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 617, in verify_vms_not_on_host    self.wait_for_vms_power_on(vm_names, per_vm_timeout)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 599, in wait_for_vms_power_on    interval=15)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message)  File "/main/.python/util/base/log.py", line 204, in CHECK    FATAL(log_msg, **kwargs)  File "/main/.python/util/base/log.py", line 185, in FATAL    sys.exit(1) ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.156. ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.156. ERROR nutanix_test.py:1699 Failed to save cluster configuration"""

from difflib import SequenceMatcher

match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
print(match)
print(string1[match.a:match.a + match.size])

string1="""ERROR agave_util.py:64 Timed out waiting for HA alert generated CRITICAL ha_test_util.py:44,"""
string2="""ERROR agave_util.py:64 Timed out waiting for VMs [u'vm_353ca5', u'vm_e02d7f'] power on CRITICAL ha_test_util.py:44"""
match = SequenceMatcher(None, string1, string2).find_longest_match(0, len(string1), 0, len(string2))
print(string1[match.a:match.a + match.size])

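So basically, comparing string1 and string2 in full [the first two definitions] returns "CRITICAL ha_test_util.py:44", whereas when I cut string1 and string2 down to a few lines [the second pair of definitions], it returns "ERROR agave_util.py:64 Timed out waiting for".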

My question is: why does SequenceMatcher not return the correct match in the first case?

You are running into the effect (a negative one, in your case) of SequenceMatcher's automatic junk heuristic. From the documentation:

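Automatic junk heuristic: SequenceMatcher supports a heuristic that automatically treats certain sequence items as junk. The heuristic counts how many times each individual item appears in the sequence. If an item's duplicates (after the first one) account for more than 1% of the sequence and the sequence is at least 200 items long, this item is marked as "popular" and is treated as junk for the purpose of sequence matching. This heuristic can be turned off by setting the autojunk argument to False when creating the SequenceMatcher.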
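To see which items the heuristic flags, one option is to inspect the bpopular attribute of a SequenceMatcher instance. The snippet below is a minimal sketch: the sample text is a hypothetical stand-in built by repeating one traceback-style line from the question, not the original input.

```python
from difflib import SequenceMatcher

# Hypothetical sample: one traceback-style line repeated so that the
# sequence is longer than 200 characters and the heuristic can apply.
line = 'File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 909, in <module> '
text = line * 4

# bpopular holds the items of the second sequence flagged as "popular".
popular = SequenceMatcher(None, "", text).bpopular
print(sorted(popular))

# With the heuristic switched off, nothing is flagged.
not_flagged = SequenceMatcher(None, "", text, autojunk=False).bpopular
print(not_flagged)
```

In text like this, frequent characters such as "/" and the space end up flagged, which is why long log strings are especially prone to surprising matches.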

In the SequenceMatcher constructor, autojunk defaults to True. If you try it with autojunk=False, you get the expected longest match:

from difflib import SequenceMatcher

string1 = """ERROR agave_util.py:64 Timed out waiting for HA alert generated CRITICAL ha_test_util.py:44 HA alert generated, Stack:File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 909, in <module>    main(FLAGS, sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 878, in main    worker.run(sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 326, in run    if not self.__test_phase_wrapper(test_method):  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 87, in test_stargate_master_power_off    self._host_power_off_test_cycle(host_of_stargate_master)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 469, in power_off_and_check_ha    self.wait_for_ha_alert(cutoff_usecs=latest_alert_start, **kwargs)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 418, in wait_for_ha_alert    interval=interval,  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message) ERROR nutanix_test_runner_worker.py:595 Test failed: 1exc_type: <type 'exceptions.SystemExit'>exc_value: 1stack:   File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 87, in test_stargate_master_power_off    self._host_power_off_test_cycle(host_of_stargate_master)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 469, in power_off_and_check_ha    
self.wait_for_ha_alert(cutoff_usecs=latest_alert_start, **kwargs)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 418, in wait_for_ha_alert    interval=interval,  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message)  File "/main/.python/util/base/log.py", line 204, in CHECK    FATAL(log_msg, **kwargs)  File "/main/.python/util/base/log.py", line 185, in FATAL    sys.exit(1) ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.157. ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.157. ERROR nutanix_test.py:1699 Failed to save cluster configuration"""
string2 = """ERROR agave_util.py:64 Timed out waiting for VMs [u'vm_353ca5', u'vm_e02d7f'] power on CRITICAL ha_test_util.py:44 VMs [u'vm_353ca5', u'vm_e02d7f'] power on, Stack:File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 909, in <module>    main(FLAGS, sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 878, in main    worker.run(sync_state)  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 326, in run    if not self.__test_phase_wrapper(test_method):  File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 67, in test_zoo_keeper_leader_power_off    self._host_power_off_test_cycle(leader_host)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 468, in power_off_and_check_ha    self.verify_vms_not_on_host(host_vms, host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 617, in verify_vms_not_on_host    self.wait_for_vms_power_on(vm_names, per_vm_timeout)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 599, in wait_for_vms_power_on    interval=15)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message) ERROR nutanix_test_runner_worker.py:595 Test failed: 1exc_type: <type 'exceptions.SystemExit'>exc_value: 1stack:   File "/main/qa/py/qa/agave/nutanix_test_runner_worker.py", line 502, in __test_phase_wrapper    func()  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 67, in test_zoo_keeper_leader_power_off    self._host_power_off_test_cycle(leader_host)  File "/main/qa/test/agave/acropolis_tests/ha/best_effort_power_off_test.py", line 27, in _host_power_off_test_cycle    
self.ha_util.power_off_and_check_ha(host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 468, in power_off_and_check_ha    self.verify_vms_not_on_host(host_vms, host)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 617, in verify_vms_not_on_host    self.wait_for_vms_power_on(vm_names, per_vm_timeout)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 599, in wait_for_vms_power_on    interval=15)  File "/main/.python/qa/util/agave_tools/ha_test_util.py", line 44, in wait_for_true    CHECK(result, message)  File "/main/.python/util/base/log.py", line 204, in CHECK    FATAL(log_msg, **kwargs)  File "/main/.python/util/base/log.py", line 185, in FATAL    sys.exit(1) ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.156. ERROR nutanix_test.py:696 Failed to get gflags from 10.5.132.156. ERROR nutanix_test.py:1699 Failed to save cluster configuration"""

match = SequenceMatcher(None, string1, string2, autojunk=False).find_longest_match(0, len(string1), 0, len(string2))
print(match)

Output:

Match(a=110, b=156, size=534)

To be sure, we can check all the matching blocks and pick the longest one:

>>> max(SequenceMatcher(None, string1, string2, autojunk=False).get_matching_blocks(),
...     key=lambda m: m.size)
Match(a=110, b=156, size=534)
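If this lookup is needed in more than one place, it could be wrapped in a small helper. The function name longest_common_substring below is hypothetical (it is not part of difflib), and the short input strings are made up for illustration.

```python
from difflib import SequenceMatcher

def longest_common_substring(s1, s2):
    """Return the longest contiguous substring shared by s1 and s2."""
    # autojunk=False keeps frequent characters from being treated as junk.
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    return s1[match.a:match.a + match.size]

result = longest_common_substring("xxabcdefyy", "zzabcdefww")
print(result)  # abcdef
```

Turning the heuristic off may make matching slower on very long inputs, so it is probably worth measuring if the strings are large.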

To illustrate the effect of autojunk on a simple example, let's see what happens:

>>> a = "aa:bb:cc" + ":"*200
>>> b = "aa:bb" + ":"*200
>>> SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
Match(a=0, b=0, size=6)     # : is classified as junk
>>> SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
Match(a=8, b=5, size=200)   # : is NOT classified as junk

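In the first case (the default autojunk=True), ":" is treated as a junk character (it accounts for more than 1% of a sequence that is at least 200 items long), and as a result the longest match found is only 6 characters long (the first 6 characters).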

In the second case (explicit autojunk=False), the junk heuristic is turned off, so the longest match is the last 200 characters.

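If you repeat the same test with shorter sequences (fewer than 200 characters), you can see that autojunk makes no difference, because the junk heuristic is not applied to sequences that short.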
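The short-sequence case can be checked with a variation of the example above. This is a sketch that reuses the same strings with 100 trailing colons instead of 200, so both sequences stay under the 200-item threshold.

```python
from difflib import SequenceMatcher

# Same shape as the earlier example, but both strings are shorter than 200.
a = "aa:bb:cc" + ":" * 100
b = "aa:bb" + ":" * 100

with_heuristic = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
without_heuristic = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))

# Both calls should report the same match: the run of 100 colons.
print(with_heuristic)
print(without_heuristic)
```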

