
Python use variables inside regex

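For my own project, I have a .txt file containing 200,000 English words. I have a class called WordCross (a game) that searches for words using certain letters passed as parameters. Say I have the letters A X D E L P: I want to return a list of English words that contain these letters. Now I've run into a problem. I want to use a regular expression and add the matching words to a "hits" list, but I can't think of a way to build this regular expression.

This is my current code: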

import re
class WordCross:
    def __init__(self, a,b,c,d,e,f):
        file = open("english3.txt", "r")
        hits = []
        for words in file:
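            # re.search returns a Match object or None, never a string,
            # so this comparison is always False and nothing is added
            if words.lower() == re.search("a", words):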
                hits.append(words)
        hits.sort()
        print(hits)

test = WordCross("A", "B", "C", "D", "E", "F")

Any help would be greatly appreciated. Kind regards, Douwe

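If you only want to return words made up of the letters passed to the constructor, you need to use re.match and add an end-of-line anchor ($) to the regex; the negative lookahead at the start additionally rejects words in which any of those letters occurs more than once. You can use the star operator (*) to allow any number of letters to be passed to the constructor (see the manual). In this demo, I simulate reading the file with a list of words taken from a string: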

wordlist = '''
Founded in two thousand and eight Stack Overflow is the largest most trusted 
online community for anyone that codes to learn share their knowledge and 
build their careers More than fifty million unique visitors come to Stack Overflow
each month to help solve coding problems develop new skills and find job opportunities
'''.split()
wordlist = list(set(wordlist))

import re
class WordCross:
    def __init__(self, *letters):
        # file = open("english3.txt", "r")
        hits = []
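        # Character class built from the letters, e.g. "[ACEHKTS]"
        charset = f"[{''.join(letters)}]"
        # The negative lookahead fails if any letter from the set appears twice;
        # the word must then consist only of letters from the set up to the end ($).
        # re.I makes the match case-insensitive.
        regex = re.compile(rf"(?!.*({charset}).*\1){charset}+$", re.I)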
        for word in wordlist:
            if regex.match(word) is not None:
                hits.append(word)
        hits.sort()
        print(hits)

test = WordCross("A", "C", "E", "H", "K", "T", "S")

Output:

['Stack', 'each', 'the']
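As a variant of the approach above, re.fullmatch (available since Python 3.4) requires the pattern to match the whole string, so the trailing $ can be dropped. The sketch below is only an illustration under a few assumptions: the helper name only_letters is hypothetical, it takes the word list as an argument instead of reading english3.txt, and each letter goes through re.escape so that characters such as ] or - cannot break the character class.

```python
import re

def only_letters(words, *letters):
    """Return the sorted words built only from `letters`, each used at most once."""
    # Character class such as "[ACEHKTS]"; re.escape guards against
    # characters that are special inside a class, like "]" or "-".
    charset = "[" + "".join(re.escape(ch) for ch in letters) + "]"
    # Negative lookahead: fail if any letter from the set occurs twice.
    # fullmatch requires the whole word to match, replacing the "$" anchor.
    regex = re.compile(rf"(?!.*({charset}).*\1){charset}+", re.IGNORECASE)
    return sorted(w for w in words if regex.fullmatch(w))

print(only_letters(["Stack", "each", "the", "that", "skills"],
                   "A", "C", "E", "H", "K", "T", "S"))
# prints ['Stack', 'each', 'the']
```

Without re.escape, passing "-" between two letters would turn the class into a range such as [a-b], which no longer matches a literal hyphen.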

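I'm not sure what regex you want to use, but building the expression with simple string formatting is trivial. You can also change the function to accept any number of patterns to search for. Hope this helps.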

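import re

class WordCross:
    def __init__(self, *patterns):
        # Join the patterns into one alternation, e.g. "(A|B|C|D|E|F)"
        list_of_patterns = "|".join(patterns)
        reg_exp = r"({0})".format(list_of_patterns)
        print(reg_exp)
        hits = []
        # The with block closes the file automatically
        with open("english3.txt", "r") as fh:
            for words in fh:
                if re.search(reg_exp, words):
                    # Strip the trailing newline before storing the word
                    hits.append(words.strip())
        hits.sort()
        print(hits)

test = WordCross("A", "B", "C", "D", "E", "F")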
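If the strings passed in are meant to be matched literally rather than as regex syntax, each one can be escaped with re.escape before they are joined into an alternation. A minimal sketch, assuming an in-memory list in place of english3.txt; the function name find_any is hypothetical:

```python
import re

def find_any(words, *patterns):
    """Return the sorted words containing at least one of `patterns` literally."""
    # re.escape makes ".", "+", "?" etc. match themselves; "|" builds the alternation.
    reg_exp = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return sorted(w for w in words if reg_exp.search(w))

print(find_any(["c++", "cat", "dog", "a.b"], "+", "."))
# prints ['a.b', 'c++']
```

Unescaped, the same call would build "+|.", which raises re.error ("nothing to repeat"). Note also that calling it with no patterns yields an empty regex, which matches every word.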

I'm assuming the words in your file are one per line.

Code:

import re
from io import StringIO

source = '''
RegExr was created by gskinner.com, and is proudly hosted by Media Temple.
Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community, and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
'''.split()  # assuming words are line-separated here.

file_simulation = StringIO('\n'.join(source))  # simulating file open


class WordCross:
    def __init__(self, *args):
        self.file = file_simulation
        self.hits = []

        for words in self.file:
            if re.search(f"[{''.join(args)}]", words.upper()):
                self.hits.append(words.strip())

        self.hits.sort()
        print(self.hits)


test = WordCross("A", "B", "C", "D", "E", "F")

Result:

['Cheatsheet,', 'Community,', ... 'view', 'was']

Process finished with exit code 0
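The character class in this answer accepts a line if it contains any one of the letters. If the goal is instead words that contain every given letter, one option (a different technique from the answers here) is to chain one lookahead per letter. A minimal sketch, with the hypothetical helper name contains_all and a StringIO standing in for the file:

```python
import re
from io import StringIO

def contains_all(lines, *letters):
    """Return the sorted, stripped lines that contain every letter in `letters`."""
    # One lookahead per letter, e.g. "(?=.*A)(?=.*B)"; each checks that the
    # letter appears somewhere in the line without consuming characters.
    pattern = "".join(f"(?=.*{re.escape(ch)})" for ch in letters)
    regex = re.compile(pattern, re.IGNORECASE)
    return sorted(line.strip() for line in lines if regex.match(line))

words = StringIO("Cheatsheet\nbead\nfaded\nbread\n")
print(contains_all(words, "A", "B", "D", "E"))
# prints ['bead', 'bread']
```

Because the lookaheads consume nothing, the order of the letters in the word does not matter, and a letter that appears several times still satisfies its lookahead once.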

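A few suggestions:

  • I don't see anything here that warrants a class. A simple function is enough.

  • Don't use file as a variable name; it is the name of a Python 2 built-in.

  • In general, when working with an open file handle, it is best to do so in a with block.

Untested: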

import re
def WordCross(*patterns):
    pattern = "|".join(patterns)
    c_pattern = re.compile(pattern, re.IGNORECASE)
    with open("english3.txt") as fp:
        hits = [line for line in fp if c_pattern.search(line)]
    print(sorted(hits))
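Following the suggestions above, a variant that takes any iterable of lines (an open file or, for testing, a StringIO) and returns the hits instead of printing them is easier to check. This is a sketch only; the name word_cross and the lines parameter are assumptions, not part of the answer above.

```python
import re
from io import StringIO

def word_cross(lines, *patterns):
    """Return the sorted lines (newline stripped) matching any of `patterns`."""
    c_pattern = re.compile("|".join(patterns), re.IGNORECASE)
    # Strip the trailing newline so the results read as plain words.
    return sorted(line.strip() for line in lines if c_pattern.search(line))

print(word_cross(StringIO("apple\nkiwi\nBanana\n"), "a", "b"))
# prints ['Banana', 'apple']

# With the real word list, the with block closes the handle automatically:
# with open("english3.txt") as fp:
#     print(word_cross(fp, "A", "B"))
```

Sorting is case-sensitive here, so capitalized words come first; passing key=str.lower to sorted would change that.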

