How to extract multilingual content using regular expressions

Question

I need to extract multilingual content from text structured like this:

Some text [it]Italian text[/it] [en]English text[/en] bla bla bla

Other text [it]other Italian[/it] [en]other English[/en] bla bla bla

I want to extract all the text that is not between multilingual square-bracket tags, plus the text between the tags of the current language.

For example, if the current language is "en", I want to extract the following text:

Some text English text bla bla bla

Other text other English bla bla bla

How can I extract this text correctly with a regular expression?

Answer 1

Something like this:

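 $lang = 'en'; // current language
 $result = preg_replace_callback('~\[ (\w+) \] (.*?) \[ /\1 \]~sx',
       // keep the body of the current-language block, drop all other blocks
       function($m) use ($lang) { return $m[1] === $lang ? $m[2] : ''; },
       $text);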
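The callback approach above can be wrapped into a reusable function and run on the question's two sample lines. This is a sketch under assumptions: the name `extract_language` is hypothetical, tags are assumed balanced, non-nested and named with `\w` characters, and the final whitespace clean-up is an addition (removing a block leaves the spaces around it behind).

```php
<?php
// Hypothetical helper built on Answer 1's callback: keeps untagged text plus
// the body of the [lang]...[/lang] block for $lang and drops every other
// language block. Assumes balanced, non-nested tags named with \w characters.
function extract_language(string $text, string $lang): string
{
    $result = preg_replace_callback(
        '~\[(\w+)\](.*?)\[/\1\]~s',
        function (array $m) use ($lang): string {
            return $m[1] === $lang ? $m[2] : '';
        },
        $text
    );
    // Dropping a block leaves the spaces around it behind, so collapse runs
    // of spaces and tabs (not newlines) into a single space.
    return trim(preg_replace('/[ \t]+/', ' ', $result));
}

$text = "Some text [it]Italian text[/it] [en]English text[/en] bla bla bla\n"
      . "Other text [it]other Italian[/it] [en]other English[/en] bla bla bla";

echo extract_language($text, 'en'), "\n";
```

Because the tag name is captured and compared in PHP rather than in the pattern, the same function works for any language code without rebuilding the regex.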

Answer 2

Assuming the tags are always properly balanced and never nested (which seems a reasonable assumption), you can do this:

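$result = preg_replace('%\[it\].*?\[/it\]\s*|\[en\]\s*|\s*\[/en\]%s', '', $subject);

This finds and removes any text enclosed in [it] tags, as well as the [en] and [/en] tags themselves. Whitespace is stripped after [en] but before [/en], so the space between the English text and whatever follows it is preserved.

Explanation:

\[it\]     # Match [it]
.*?        # and everything that follows until 
\[/it\]    # the nearest [/it]
\s*        # plus any trailing whitespace
|          # or
\[en\]     # Match [en]
\s*        # plus any trailing whitespace
|          # or
\s*        # Match any leading whitespace
\[/en\]    # followed by [/en]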
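A quick standalone check of this kind of pattern on the question's two sample lines. Note that the sketch removes whitespace before `[/en]` rather than after it; stripping the whitespace after `[/en]` would also delete the space before the following "bla" and glue the words together. The variable names are illustrative only.

```php
<?php
// The question's two sample lines (tags balanced and never nested).
$subject = "Some text [it]Italian text[/it] [en]English text[/en] bla bla bla\n"
         . "Other text [it]other Italian[/it] [en]other English[/en] bla bla bla";

// Remove [it]...[/it] blocks with their trailing whitespace, the [en] marker
// with its trailing whitespace, and the [/en] marker with its leading
// whitespace (so the space after [/en] survives).
$result = preg_replace('%\[it\].*?\[/it\]\s*|\[en\]\s*|\s*\[/en\]%s', '', $subject);

echo $result, "\n";
```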

If you want to remove the text between any tags except the [en] tags, it gets a bit more complicated (still assuming there are no nested tags):

$result = preg_replace('%\[(?!/?en\b)([^\]]+)\].*?\[/\1\]\s*|\[en\]\s*|\s*\[/en\]%s', '', $subject);

Explanation:

\[         # Match [
(?!/?en\b) # Assert that this is not an [en] or [/en] tag
([^\]]+)   # Match and capture the tag name (anything until the next ])
\]         # Match ]
.*?        # and everything that follows until 
\[/\1\]    # the nearest corresponding closing tag
\s*        # plus any trailing whitespace
|          # or
\[en\]     # Match [en]
\s*        # plus any trailing whitespace
|          # or
\s*        # Match any leading whitespace
\[/en\]    # followed by [/en]
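The general pattern can be parameterized so the current language is not hard-coded. The sketch below assumes a hypothetical helper `keep_language`; it escapes the language code with `preg_quote()` (passing `%` as the delimiter) before splicing it into the lookahead and the two tag alternatives. Like the pattern it is based on, it assumes balanced, non-nested tags, and it strips whitespace before the current language's closing tag so the following word keeps its leading space.

```php
<?php
// Hypothetical helper: the general pattern with the hard-coded "en" replaced
// by $lang. preg_quote() escapes $lang, with '%' given as the delimiter.
// Assumes balanced, non-nested tags.
function keep_language(string $subject, string $lang): string
{
    $l = preg_quote($lang, '%');
    $pattern = '%\[(?!/?' . $l . '\b)([^\]]+)\].*?\[/\1\]\s*' // any other tag block
             . '|\[' . $l . '\]\s*'                         // opening tag of $lang
             . '|\s*\[/' . $l . '\]%s';                     // closing tag of $lang
    return preg_replace($pattern, '', $subject);
}

$subject = "Some text [it]Italian text[/it] [en]English text[/en] bla bla bla";

echo keep_language($subject, 'en'), "\n";
echo keep_language($subject, 'it'), "\n";
```

The `\b` after the language code keeps a longer tag such as a hypothetical `[eng]` from being mistaken for `[en]`.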

Answer 3

I think it is better not to use preg_replace for this:

// Translations keyed by language code
$languages = array(
    'en' => array(
        'label' => 'english label'
    ),
    'it' => array(
        'label' => 'italian label'
    )
);

// Current language; its label is inserted into the %s placeholder
$language = "it";
$someTextForItalian = "bla bla bla bla %s bla bla bla.";
$someTextForItalian = sprintf(
    $someTextForItalian,
    $languages[$language]['label']
);
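The same idea can be sketched as a small function that picks the current language's entry and fills a `sprintf()` template. The function name `render_label`, the sample strings, and the fallback to English when a language has no entry are all assumptions added for illustration; the answer itself does not specify a fallback.

```php
<?php
// Hypothetical sketch of Answer 3's idea: keep translatable fragments in a
// per-language array instead of tagging them inline, then fill a sprintf()
// template. The fallback language is an assumption, not part of the answer.
$labels = [
    'en' => ['label' => 'English text'],
    'it' => ['label' => 'Italian text'],
];

function render_label(string $template, array $labels, string $language, string $fallback = 'en'): string
{
    // Use the fallback entry when $language has no translation.
    $entry = $labels[$language] ?? $labels[$fallback];
    return sprintf($template, $entry['label']);
}

$template = 'Some text %s bla bla bla';
echo render_label($template, $labels, 'it'), "\n";
echo render_label($template, $labels, 'fr'), "\n";
```

The trade-off is that the text no longer carries inline [it]/[en] markup at all, so nothing needs to be parsed at runtime, but every translatable fragment must be extracted into the array up front.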


Answer 1: score 2, accepted, 2011-07-26 13:51:46

Answer 2: score 1, 2011-07-26 13:50:34

Answer 3: score 0, 2011-07-26 14:35:32
