
How can I optimize my PHP script to get phonetic readings of Japanese sentences from the Yahoo! Japan API?

I wrote a PHP script that reads Japanese sentences from a file, gets the phonetic reading of each sentence using the Yahoo! Japan API, and writes them to an output file. But the script is incredibly slow: it has processed only 50,000 sentences in the last 12 hours, running under Apache on my Mac OS X machine. Is the call to the API the main bottleneck? How can I optimize it? Should I use a language other than PHP? Thanks!

Here's what the first 4 lines of the input file (examples-utf.utf) look like:

A: ムーリエルは20歳になりました。 Muiriel is 20 now.#ID=1282_4707
B: は 二十歳(はたち){20歳} になる[01]{になりました}
A: すぐに戻ります。 I will be back soon.#ID=1284_4709
B: 直ぐに{すぐに} 戻る{戻ります}

Here's the XML returned by the API for the sentence "私は学生です": http://jlp.yahooapis.jp/FuriganaService/V1/furigana?appid=YuLAPtSxg64LZ2dsAQnC334w1wGLxuq9cqp0MIGSO3QjZ1tbZCYaRRWkeRKdUCft7qej73DqEg--&grade=1&sentence=%E7%A7%81%E3%81%AF%E5%AD%A6%E7%94%9F%E3%81%A7%E3%81%99

My script follows:

<?php
    function getReading($wabun)
    {
        $res = "";
        $applicationID = "YuLAPtSxg64LZ2dsAQnC334w1wGLxuq9cqp0MIGSO3QjZ1tbZCYaRRWkeRKdUCft7qej73DqEg--";
        $grade = 1;
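        // URL-encode the sentence so multi-byte characters and '&' survive in the query string
        $url = "http://jlp.yahooapis.jp/FuriganaService/V1/furigana?appid=".$applicationID."&grade=".$grade."&sentence=".rawurlencode($wabun);
        $doc = new DOMDocument();
        $doc->load($url);
        foreach ($doc->getElementsByTagName('Word') as $node) {
            // If a Word has no Furigana child, item(0) is null, so check the nodes before reading nodeValue
            $surfaceNode = $node->getElementsByTagName('Surface')->item(0);
            $furiganaNode = $node->getElementsByTagName('Furigana')->item(0);
            $surface = $surfaceNode ? $surfaceNode->nodeValue : "";
            $reading = $furiganaNode ? $furiganaNode->nodeValue : $surface;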
            $res .= $reading;
        }
        return $res;
    }
?>
<?php
    header('Content-Type: text/html;charset=utf-8');    
    $myFile = "examples-utf.utf";
    $outFile = "examples-output.utf";
    $file = fopen($myFile, 'r') or die("can't open read file");
    $out = fopen($outFile, 'w') or die("can't open write file");
    $i = 1; // line number
    $start = 3; // beginning of japanese sentence, after "A: "
    while($line = fgets($file))
    {
        // line starts at "A: "
        if($i&1)
        {
            $pos = strpos($line, "\t");
            $japanese = substr($line, $start, $pos - $start);

            $end = strpos($line, "#ID=", $pos + 1);
            $english = substr($line, $pos + 1, $end - $pos - 1);
            $reading = getReading($japanese);

            fwrite($out, $japanese."\n");
            fwrite($out, $english."\n");
            fwrite($out, $reading."\n");

        }
        ++$i;
    }
    fclose($file);
    fclose($out);
?>

From where I am (Berlin, Germany), jlp.yahooapis.jp has a ping latency of about 500 ms, so the 50,000 round trips alone take nearly 7 hours (50,000 × 0.5 s = 25,000 s), not counting the data processing on the Yahoo server. So yes, I think the main bottleneck is calling a web service on another server.
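To check this estimate against a real run, one option is to time the API call separately from the rest of the loop. The following is only a minimal sketch: `timed` and `estimateHours` are hypothetical helper names, not part of the original script, and `hrtime` assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: run a callable and return [result, elapsed seconds].
function timed(callable $fn, ...$args): array
{
    $start = hrtime(true);                 // monotonic clock, in nanoseconds
    $result = $fn(...$args);
    return [$result, (hrtime(true) - $start) / 1e9];
}

// Hypothetical helper: hours needed for a strictly sequential batch
// when every request costs $secondsPerRequest.
function estimateHours(int $requests, float $secondsPerRequest): float
{
    return $requests * $secondsPerRequest / 3600;
}

// 50,000 requests at 0.5 s each: 25,000 s, a little under 7 hours.
printf("%.1f hours\n", estimateHours(50000, 0.5));
```

In the question's loop this could be used as `[$reading, $seconds] = timed('getReading', $japanese);`, summing `$seconds` over a few hundred sentences. If that sum accounts for nearly all of the wall-clock time, the network round trip is the bottleneck rather than PHP itself.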

I'm not sure what caused this issue, but the latest version of the Yahoo! APIs is pretty smooth (endpoint: https://jlp.yahooapis.jp/FuriganaService/V1/furigana).

I have posted a similar question here:

How to use the Yahoo! JAPAN Japanese Language Processing API

If this is a batch process, you could try running several copies of your script concurrently, each on a separate list of sentences.
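A related way to get the same overlap without launching several scripts is to issue the requests concurrently from a single process with PHP's `curl_multi_*` functions. This is a different technique from splitting the input into separate lists, shown here only as a sketch. The function names (`buildFuriganaUrl`, `fetchAll`, `parseReading`, `readingsFor`) are hypothetical, the batch size of 10 is an arbitrary assumption (the thread does not state the service's rate limits), and the parsing simply mirrors the Word/Surface/Furigana handling in the question's script.

```php
<?php
// Build the request URL for one sentence (endpoint and parameters as in the question).
function buildFuriganaUrl(string $appId, string $sentence, int $grade = 1): string
{
    return 'http://jlp.yahooapis.jp/FuriganaService/V1/furigana?' . http_build_query(
        ['appid' => $appId, 'grade' => $grade, 'sentence' => $sentence],
        '', '&', PHP_QUERY_RFC3986
    );
}

// Fetch a batch of URLs concurrently; returns response bodies keyed like $urls.
function fetchAll(array $urls, int $timeoutSeconds = 30): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $bodies;
}

// Concatenate the reading of every Word, falling back to the surface form.
function parseReading(string $xml): string
{
    $doc = new DOMDocument();
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return ''; // failed request or malformed response
    }
    $reading = '';
    foreach ($doc->getElementsByTagName('Word') as $word) {
        $furigana = $word->getElementsByTagName('Furigana')->item(0);
        $surface = $word->getElementsByTagName('Surface')->item(0);
        $reading .= $furigana ? $furigana->nodeValue : ($surface ? $surface->nodeValue : '');
    }
    return $reading;
}

// Process sentences in batches of $batchSize concurrent requests (assumed value).
function readingsFor(array $sentences, string $appId, int $batchSize = 10): array
{
    $readings = [];
    foreach (array_chunk($sentences, $batchSize, true) as $chunk) {
        $urls = array_map(fn($s) => buildFuriganaUrl($appId, $s), $chunk);
        foreach (fetchAll($urls) as $key => $xml) {
            $readings[$key] = parseReading($xml);
        }
    }
    return $readings;
}
```

With a round trip of about 500 ms, ten requests in flight at once would cut the waiting time to roughly a tenth, provided the service accepts that many parallel connections. An empty string in the result marks a sentence whose request failed and may need a retry.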
