Match words in a file with regex in PHP

I'm new to regex and PHP. I know this is quite simple, but I just can't get it. I have a file words.txt that contains:

happy
sad
laugh

I want to match this sentence against the words in words.txt:

I am happy

So far I've tried this, but it doesn't work because it treats the input as a whole sentence rather than as individual words (I haven't used regex yet because I'm confused):

$input0 = "I am happy";
$handle = fopen('words.txt', 'r');
$valid = false;
while (($buffer = fgets($handle)) !== false) {
    if (strpos($buffer, $input0) !== false) { // here's the problem
        $valid = TRUE;
        break;
    }
}
if ($valid == TRUE) {
    // print the matched word
}
fclose($handle);

Can you help me?

Depending on your final goal, you may not even need a regexp here, since you want to match entire words with no variable part.

If you want to loop over your keywords, a simple str_replace() would do the job, for instance to replace each word with an emphasized version. A simple if (strpos($input0, $word) !== false) is enough to just check whether the word is found in the sentence and at which position.
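The loop approach could look roughly like the sketch below. The helper name find_words_in_sentence is hypothetical, and the word list is inlined (with the trailing newlines that fgets() would leave) so that the snippet runs without words.txt.

```php
<?php
// Hypothetical helper: returns the words from $words that occur in $sentence.
function find_words_in_sentence(string $sentence, array $words): array
{
    $found = [];
    foreach ($words as $word) {
        $word = trim($word);          // drop the newline left by fgets()/file()
        if ($word === '') {
            continue;                 // skip blank lines
        }
        // Argument order matters: search for the word inside the sentence.
        // Use stripos() instead if the check should ignore case.
        if (strpos($sentence, $word) !== false) {
            $found[] = $word;
        }
    }
    return $found;
}

// With a real file, something like file('words.txt') would supply $words.
$words = ["happy\n", "sad\n", "laugh\n"];
$found = find_words_in_sentence("I am happy", $words);
print_r($found);
```

Compared with the code in the question, the main change is that the word is searched for inside the sentence, not the sentence inside the word.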

But if you want to avoid a loop, for faster results and especially if you have many words, preg_match_all() will do what you need, as Zanderwar said. Here is an example:

$input0 = "I am happy but sometimes quite pretty sad. It depends but I prefer to be happy in general.\nMy paragraph also continue on multilines\nend it makes me laugh and rejoy. I am so happy. HAPPY?";
// $contents = file_get_contents('words.txt');
$contents = "happy\nsad\nlaugh";

// trim() prevents an empty alternative when the file ends with a newline
$words_list = str_replace("\n", '|', trim($contents));

if (preg_match_all("~($words_list)~si", $input0, $matches))
{
    print_r($matches);
    // Do what you want
}

The i flag makes the match case-insensitive, if you need that.

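The s flag makes the dot (.) match newline characters as well. The pattern here contains no dot, so the flag is optional: the words are found across the whole multi-line content either way.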
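To see what the flags change in practice, here is a small sketch on a made-up two-line string. The variable names are only for illustration.

```php
<?php
$text = "I am so happy. HAPPY?\nStill happy on line two.";

// Without i: only the lowercase occurrences match.
$case_sensitive = preg_match_all('~happy~', $text, $m1);

// With i: "HAPPY" matches as well.
$case_insensitive = preg_match_all('~happy~i', $text, $m2);

// s only changes what "." matches, so it makes no difference for this pattern;
// the word on the second line is found in all three calls that allow its case.
$with_s = preg_match_all('~happy~si', $text, $m3);

echo "$case_sensitive $case_insensitive $with_s\n"; // prints "2 3 3"
```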

[EDIT] to add more details on regexp

In the pattern you need a delimiter. It can be ~, which is very seldom used in sentences and strings to match, so you won't need to escape / as you would with the / delimiter.
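If the words might contain the delimiter or other regex metacharacters, one option is to run each word through preg_quote() with the delimiter as its second argument before joining. The sample words below are made up for the illustration and are not from words.txt.

```php
<?php
// Made-up sample words: one contains "~" (the delimiter), one contains "+".
$words = ['happy', 'so~so', 'c++'];

// Escape every word; the second argument also escapes the delimiter.
$escaped = array_map(function ($w) {
    return preg_quote($w, '~');
}, $words);

$pattern = '~(?:' . implode('|', $escaped) . ')~i';

preg_match_all($pattern, "Feeling so~so about C++ but happy overall", $matches);
print_r($matches[0]);
```

Without the escaping, a word such as c++ would produce an invalid pattern, and a word containing ~ would end the pattern early.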

I am also joining your words like ~(sad|joy|happy)~ if you want to capture the words. If you don't, you need a non-capturing group like (?:sad|joy|happy).

The | means or.

You can try replacing the regex ~($words_list)~si with ~(?:$words_list)~si if you don't need capturing, and you don't. You will then have only one level of captures in the $matches array: position [0] is always the full match. Here you don't have more complex patterns to match, so there is no need to capture.
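The difference in the shape of $matches can be checked with a short sketch. The input sentence is shortened and the variable names are chosen for the illustration.

```php
<?php
$input = "I am happy but sometimes sad";
$words_list = "happy|sad|laugh";

preg_match_all("~($words_list)~i", $input, $capturing);
preg_match_all("~(?:$words_list)~i", $input, $non_capturing);

// Capturing: [0] holds the full matches and [1] repeats them for group 1.
// Non-capturing: only [0] exists.
echo count($capturing), " ", count($non_capturing), "\n"; // prints "2 1"
print_r($non_capturing[0]);
```

In both cases the matched words are available in index [0], so the non-capturing form simply avoids the duplicate copy.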
