Count the number of words in (doc, txt, docx) files

I'm trying to count the number of words in a file. The following code works fine with .txt files, but when I try to read .doc, .docx, or .xls files it gives the wrong output. Can anyone suggest a third-party library for this? Thanks.

$str = file_get_contents($path);

function count_words($string)
{
    // Drop markup and decode entities before counting.
    $string = htmlspecialchars_decode(strip_tags($string));
    if (strlen($string) == 0)
        return 0;
    $t = array(' '=>1, '_'=>1, "\x20"=>1, "\xA0"=>1, "\x0A"=>1, "\x0D"=>1, "\x09"=>1, "\x0B"=>1, "\x2E"=>1, "\t"=>1, '='=>1, '+'=>1, '-'=>1, '*'=>1, '/'=>1, '\\'=>1, ','=>1, '.'=>1, ';'=>1, ':'=>1, '"'=>1, '\''=>1, '['=>1, ']'=>1, '{'=>1, '}'=>1, '('=>1, ')'=>1, '<'=>1, '>'=>1, '&'=>1, '%'=>1, '$'=>1, '@'=>1, '#'=>1, '^'=>1, '!'=>1, '?'=>1); // separators
    // The first character starts a word unless it is a separator.
    $count = isset($t[$string[0]]) ? 0 : 1;
    if (strlen($string) == 1)
        return $count;
    for ($i = 1; $i < strlen($string); $i++)
        if (isset($t[$string[$i-1]]) && !isset($t[$string[$i]])) // if new word starts
            $count++;
    return $count;
}

echo count_words($str);


// Unix only; the count is meaningful for plain-text files.
system("wc -w " . escapeshellarg($filename));

I'm working on the same problem. All you need to do is parse the .doc, .docx, or .xls file correctly, then pass the extracted text to count_words. For .docx, for example:

private function read_docx(){

    $striped_content = '';
    $content = '';

    // Note: the procedural zip_* functions are deprecated as of PHP 8.0.
    $zip = zip_open($this->filename);

    if (!$zip || is_numeric($zip)) return false;

    while ($zip_entry = zip_read($zip)) {

        if (zip_entry_open($zip, $zip_entry) == FALSE) continue;

        // The main document body lives in word/document.xml.
        if (zip_entry_name($zip_entry) != "word/document.xml") continue;

        $content .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));

        zip_entry_close($zip_entry);
    }// end while

    zip_close($zip);

    // Turn cell and paragraph boundaries into whitespace so words do not run together.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    $striped_content = strip_tags($content);

    return $striped_content;
}


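For newer PHP versions, the same idea can probably be written with the ZipArchive class instead of the zip_* functions. The following is only a rough sketch under a few assumptions: the function names docx_xml_to_text, read_docx_text, and count_words_simple are made up for illustration, the zip extension is assumed to be installed, and the word pattern is a simplification rather than a drop-in replacement for count_words above. It covers .docx only; .doc and .xls are binary formats, not zip archives, so they would need a different parser.

```php
<?php
// Hypothetical helper names; none of these come from the answer above.

// Convert the XML of word/document.xml into plain text.
function docx_xml_to_text(string $xml): string
{
    // Add a space after every paragraph end so neighbouring paragraphs
    // and table cells do not fuse into one word.
    $xml = str_replace('</w:p>', '</w:p> ', $xml);
    // Tabs and line breaks inside a run are word boundaries as well.
    $xml = preg_replace('/<w:(tab|br)\b[^>]*\/>/', ' ', $xml);
    // Remove the remaining tags, then decode XML entities such as &amp;.
    return html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Read the document body from a .docx file; returns false on failure.
function read_docx_text(string $path)
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return false;
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? false : docx_xml_to_text($xml);
}

// Count runs of letters/digits; an apostrophe or hyphen may join two runs.
function count_words_simple(string $text): int
{
    return (int) preg_match_all("/[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*/u", $text);
}

// Usage (the file name is a placeholder):
// $text = read_docx_text('report.docx');
// if ($text !== false) echo count_words_simple($text);
```

Keeping the XML-to-text step separate from the zip handling makes it easy to check the word count against a small hand-written XML fragment before trying real documents.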