
In PHP, is there a way to detect if a string contains any words?

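The question is about detecting whether a string contains any word at all (from any language). I'm not looking for one specific word; I just want to test whether a string contains a word that exists in the real world.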

For example, $str = 'allo' would return true and $str = 'zyzassk ' would return false.

I tried preg_match_all('/\w/', $input_lines, $output_array); but with preg_match_all, \w returns each individual letter. How do I get full words? Is there a library that can test against a dictionary?

Is there a way, or a PHP function, to do this?

There are a few ways to do it:

Using str_contains: https://stackoverflow.com/a/65473395/4717133

Using strpos: https://www.php.net/manual/es/function.strpos.php
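One detail worth keeping in mind with strpos: it returns the integer offset of the match, which is 0 when the match sits at the very start of the string, so the result has to be compared with !== false. A minimal sketch follows; the helper name has_substring is made up for illustration, and like str_contains it only tests for a substring you already know, not for "any real word".

```php
<?php
// Hypothetical helper: true if $needle occurs anywhere in $haystack.
// Also works on PHP versions that predate str_contains().
function has_substring(string $haystack, string $needle): bool
{
    // strpos() returns an int offset or false; offset 0 is falsy,
    // so a strict comparison against false is required.
    return strpos($haystack, $needle) !== false;
}

var_dump(has_substring('allo world', 'allo'));    // bool(true), match at offset 0
var_dump(has_substring('allo world', 'zyzassk')); // bool(false)
```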

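The main problem is that you would need pattern matching for each language to know whether a word exists in any of them; that could be quite painful, and it probably shouldn't be done this way...

But you could use a service such as Google Translate...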

https://cloud.google.com/translate/docs/basic/detecting-language

HTTP method and URL:

POST https://translation.googleapis.com/language/translate/v2/detect

Request JSON body:

{
  "q": "Mi comida favorita es una enchilada."
}

To send your request, you can use curl:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://translation.googleapis.com/language/translate/v2/detect

You should receive a response like this:

{
  "data": {
    "detections": [
      [
        {
          "confidence": 1,
          "isReliable": false,
          "language": "es"
        }
      ]
    ]
  }
}
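On the PHP side, the response body could be unpacked roughly as follows. This is only a sketch: the function name parse_detection is hypothetical, it assumes the response has exactly the shape shown above, and it leaves out the HTTP call and authentication. Note that a detected language does not by itself prove the text contains a real word, so how the service scores gibberish such as 'zyzassk' is something to check against your own inputs.

```php
<?php
// Hypothetical helper: pull the first detection out of a /v2/detect response body.
// Returns ['language' => string, 'confidence' => float], or null if the
// body is not valid JSON or does not have the expected shape.
function parse_detection(string $json): ?array
{
    $body = json_decode($json, true);
    $first = $body['data']['detections'][0][0] ?? null;
    if (!is_array($first) || !isset($first['language'])) {
        return null;
    }
    return [
        'language'   => $first['language'],
        'confidence' => (float) ($first['confidence'] ?? 0),
    ];
}

// Sample body copied from the response shown above.
$sample = '{"data":{"detections":[[{"confidence":1,"isReliable":false,"language":"es"}]]}}';
$result = parse_detection($sample);
echo $result['language'], "\n"; // es
```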

I did some digging and came up with this:

function is_str_have_human_words ( $txtToDetect ) {
    
    $highestLangCode = ''; 
    
    if ( preg_match('/[\x{4E00}-\x{9FBF}]/u', $txtToDetect) )   { $highestLangCode = 'zh'; }
    if ( preg_match('/[\x{3040}-\x{309F}]/u', $txtToDetect) )   { $highestLangCode = 'ja'; } // Hiragana
    if ( preg_match('/[\x{30A0}-\x{30FF}]/u', $txtToDetect) )   { $highestLangCode = 'ja'; } // Katakana
    if ( preg_match('/[\x{3130}-\x{318F}\x{AC00}-\x{D7AF}]/u', $txtToDetect) ) { $highestLangCode = 'ko'; }
    if ( preg_match('/\p{Thai}/u', $txtToDetect) )              { $highestLangCode = 'th'; }
    if ( preg_match('/\p{Arabic}/u', $txtToDetect) )            { $highestLangCode = 'ar'; }
    if ( preg_match('/\p{Armenian}/u', $txtToDetect) )          { $highestLangCode = 'hy'; }
    if ( preg_match('/\p{Bengali}/u', $txtToDetect) )           { $highestLangCode = 'bn'; }
    if ( preg_match('/\p{Devanagari}/u', $txtToDetect) )        { $highestLangCode = 'hi'; }
    if ( preg_match('/\p{Georgian}/u', $txtToDetect) )          { $highestLangCode = 'ka'; }
    if ( preg_match('/\p{Greek}/u', $txtToDetect) )             { $highestLangCode = 'el'; }
    if ( preg_match('/\p{Gujarati}/u', $txtToDetect) )          { $highestLangCode = 'gu'; }
    if ( preg_match('/\p{Hebrew}/u', $txtToDetect) )            { $highestLangCode = 'he'; }
    if ( preg_match('/\p{Kannada}/u', $txtToDetect) )           { $highestLangCode = 'kn'; }
    if ( preg_match('/\p{Khmer}/u', $txtToDetect) )             { $highestLangCode = 'km'; }
    if ( preg_match('/\p{Lao}/u', $txtToDetect) )               { $highestLangCode = 'lo'; }
    if ( preg_match('/\p{Limbu}/u', $txtToDetect) )             { $highestLangCode = 'li'; }
    if ( preg_match('/\p{Malayalam}/u', $txtToDetect) )         { $highestLangCode = 'ml'; }
    if ( preg_match('/\p{Mongolian}/u', $txtToDetect) )         { $highestLangCode = 'mn'; }
    if ( preg_match('/\p{Myanmar}/u', $txtToDetect) )           { $highestLangCode = 'my'; }
    if ( preg_match('/\p{Oriya}/u', $txtToDetect) )             { $highestLangCode = 'or'; }
    if ( preg_match('/\p{Sinhala}/u', $txtToDetect) )           { $highestLangCode = 'si'; }
    if ( preg_match('/\p{Tagalog}/u', $txtToDetect) )           { $highestLangCode = 'tl'; }
    if ( preg_match('/\p{Tamil}/u', $txtToDetect) )             { $highestLangCode = 'ta'; }
    if ( preg_match('/\p{Telugu}/u', $txtToDetect) )            { $highestLangCode = 'te'; }
    if ( preg_match('/\p{Thaana}/u', $txtToDetect) )            { $highestLangCode = 'dv'; }
    if ( preg_match('/\p{Tibetan}/u', $txtToDetect) )           { $highestLangCode = 'bo'; }
    if ( preg_match('/[А-Яа-яЁё]/u', $txtToDetect) )            { $highestLangCode = 'ru'; }
        
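    if ( $highestLangCode == '' ) {
        
        // Assumes an open mysqli connection is available in the global $mysqli
        global $mysqli;
        
        // Strip punctuation, lowercase, then split the text on whitespace
        $cleaned = preg_replace('/[[:punct:]]+/', '', strtolower($txtToDetect));
        $wordsToTests = preg_split('/\s+/', $cleaned, -1, PREG_SPLIT_NO_EMPTY);
        
        // DATABASE WITH WORDS FROM LOTS OF LANGUAGES
        // A prepared statement replaces the string-built query and the undefined clean() helper
        $stmt = $mysqli->prepare('SELECT 1 FROM `langtable` WHERE `word` = ? LIMIT 1');
        
        foreach ( $wordsToTests as $wordsToTest ) {     
            
            $stmt->bind_param('s', $wordsToTest);
            $stmt->execute();
            $stmt->store_result();
            if ( $stmt->num_rows > 0 ) { $highestLangCode = '-'; break; }
            
            // OR A SPELL CHECK LIKE pspell_check...
            
        } 
    
    }
    
    // True as soon as a script or a dictionary word was recognised
    return $highestLangCode != '';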
    
}
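The "spell check" alternative mentioned in the comment could look something like the sketch below. It is not part of the function above: the name contains_known_word and the callable-based design are assumptions made for illustration, so that the word test (database lookup, pspell, a word list) can be swapped without touching the tokenising code. It also shows one way to get full words rather than single letters, by matching runs of letters with \p{L}+.

```php
<?php
// Hypothetical helper: true if any run of letters in $text passes $isWord.
// $isWord is any callable(string): bool, e.g. a wrapper around pspell_check().
function contains_known_word(string $text, callable $isWord): bool
{
    // \p{L}+ matches runs of letters in any script (requires the /u flag).
    preg_match_all('/\p{L}+/u', $text, $m);
    foreach ($m[0] as $token) {
        if ($isWord($token)) {
            return true;
        }
    }
    return false;
}

// Stand-in checker for demonstration only: a tiny hard-coded word list.
$demoChecker = function (string $w): bool {
    return in_array(strtolower($w), ['allo', 'hello', 'world'], true);
};

var_dump(contains_known_word('allo', $demoChecker));     // bool(true)
var_dump(contains_known_word('zyzassk ', $demoChecker)); // bool(false)
```

If the pspell extension happens to be installed (its availability varies by PHP version and build), the checker could instead wrap pspell_check() around a dictionary opened with pspell_new('en'). That covers one language per dictionary, so "any language" would still mean trying several dictionaries.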

If you have the words in a database or a file, you can use this function:

function CheckStringForWords($text){

    $words = array("words","that","you","wanna","check");
    // The $words array can come from a file or even a database.

    // Escape each word so regex metacharacters are matched literally
    $escaped = array_map(function ($w) { return preg_quote($w, '/'); }, $words);

    // Returns true if $text contains any of the words in $words as a whole
    // word (case-insensitive), false otherwise.
    // Note: implode() takes the separator first; the reversed order is gone in PHP 8.
    return preg_match("/\b(" . implode("|", $escaped) . ")\b/i", $text) === 1;
}
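For a word list that really does come from a file, one long alternation regex may become unwieldy, so a lookup table is a possible alternative. The sketch below assumes a plain-text file with one word per line; the function names load_word_set and text_has_listed_word are made up for illustration, and lowercasing is ASCII-only here.

```php
<?php
// Hypothetical helper: load one word per line into a lookup table (word => true).
function load_word_set(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // unreadable file: treat as an empty word list
    }
    $set = [];
    foreach ($lines as $line) {
        $set[strtolower(trim($line))] = true;
    }
    return $set;
}

// True if any run of letters in $text is a key of $wordSet.
function text_has_listed_word(string $text, array $wordSet): bool
{
    preg_match_all('/\p{L}+/u', strtolower($text), $m);
    foreach ($m[0] as $token) {
        if (isset($wordSet[$token])) {
            return true;
        }
    }
    return false;
}

// Usage with an inline table instead of a file, for demonstration only.
$set = ['allo' => true, 'check' => true];
var_dump(text_has_listed_word('allo', $set));     // bool(true)
var_dump(text_has_listed_word('zyzassk ', $set)); // bool(false)
```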

If you are using PHP 8, there is a function, str_contains(), that determines whether a string contains a given substring:

$string = 'The lazy fox jumped over the fence';

if (str_contains($string, 'lazy')) {
    echo "The string 'lazy' was found in the string\n";
}

if (str_contains($string, 'Lazy')) {
    echo 'The string "Lazy" was found in the string';
} else {
    echo '"Lazy" was not found because the case does not match';
}

The above example will output:

The string 'lazy' was found in the string
"Lazy" was not found because the case does not match

