MYSQL to Elasticsearch via Logstash Problem: incompatible encodings: CP850 and UTF-8

I am running the ELK stack with docker-compose, using ES version 8.4.0.

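My goal is to copy an entire table from my MYSQL DB to ES using Logstash. The connection works, and Logstash copies about 30 entries without any problems. But then I get a very long error message.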

[2022-09-10T18:41:26,318][ERROR][logstash.outputs.elasticsearch][main][757e3825fce0788f949869472d03e028630de9d063200717b56bc9ceefe29d81] An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError, :backtrace=>["org/jruby/ext/stringio/StringIO.java:1162:in `write'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'", "org/jruby/RubyArray.java:1865:in `each'", "org/jruby/RubyEnumerable.java:1143:in `each_with_index'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `bulk'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in '", "D:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "D:/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}

I suspect this part of the error is the cause: {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError

My config file looks like this:

input {
  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar" 
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts" 
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *" 
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added 
    FROM telegram.channel_results where id >:sql_last_value;"
    use_column_value => true
    tracking_column => "id"
    
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "posts"
    user => "username"
    password => "password"
  }

  stdout {
    codec => rubydebug
  }
}

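I noticed that if I remove the text column from the query, the process runs without any problems. In my database the text column is of SQL type TEXT. I suspect an encoding problem, because the column also contains Russian text and emojis. I need a solution that lets me copy the text into ES. Maybe this is an encoding problem with the emojis and other characters in the text?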
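To make the suspected cause concrete, here is a minimal sketch in plain Ruby (no Logstash involved) of how this class of error arises. It assumes the `text` values reach the output labelled as CP850, which is the usual console code page on Western European Windows; the sample strings and the `try_concat` helper are made up for illustration. Joining two strings with different encoding labels only fails once both contain non-ASCII bytes, which would be consistent with the first 30 or so rows going through.

```ruby
# A UTF-8 string with Cyrillic text and an emoji, similar to the `text` column.
utf8_text = "Привет 😀"

# The same bytes, but labelled as CP850 (hypothetical mislabelled value).
mislabelled = utf8_text.dup.force_encoding("CP850")

# An ASCII-only value that also carries the CP850 label.
ascii_only = "plain ascii".dup.force_encoding("CP850")

# Hypothetical helper: concatenate and report :ok or the exception class.
def try_concat(left, right)
  left + right
  :ok
rescue Encoding::CompatibilityError => e
  e.class
end

# Both sides contain non-ASCII bytes under different labels: Ruby refuses.
result_mixed = try_concat(utf8_text, mislabelled)

# ASCII-only content is compatible with UTF-8 regardless of its label.
result_ascii = try_concat(utf8_text, ascii_only)

puts result_mixed
puts result_ascii
```

If this is what happens inside the bulk request, rows whose text is pure ASCII would index fine and the first row with Cyrillic or emoji would trigger the error.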

Try the column charset setting in the jdbc input, as shown below.

  jdbc {
    clean_run => true
    jdbc_driver_library => "D:\logstash\mysql-connector-java-8.0.30.jar" 
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/posts" 
    jdbc_user => "sqluser"
    jdbc_password => "sqlpassword"
    schedule => "* * * * *" 
    statement => "SELECT id, id_post, url, id_subforum, author, text, spread, date, added 
    FROM telegram.channel_results where id >:sql_last_value;"
    use_column_value => true
    tracking_column => "id"
    columns_charset => {
            "text" => "ISO-8859-5"
    }
  }
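As a rough illustration of what declaring a column charset amounts to, the sketch below relabels raw bytes with a charset and transcodes them to UTF-8 in plain Ruby. That the plugin does something equivalent internally is an assumption here, and the sample strings and variable names are invented for the example. It also shows that the declared charset has to match what the column really stores.

```ruby
original = "Привет"

# Raw bytes as they would look if the column were stored in ISO-8859-5.
iso_bytes = original.encode("ISO-8859-5").b

# Correct declaration: label the bytes as ISO-8859-5, then transcode to UTF-8.
as_utf8 = iso_bytes.dup.force_encoding("ISO-8859-5").encode("UTF-8")

# Wrong declaration: the bytes are really UTF-8 but are labelled ISO-8859-5.
# Transcoding succeeds, yet the result is mojibake rather than the original.
mislabelled_result = original.b
                             .force_encoding("ISO-8859-5")
                             .encode("UTF-8", invalid: :replace, undef: :replace)

# ISO-8859-5 has no code point for emoji, so this conversion cannot succeed.
emoji_fits = begin
  "😀".encode("ISO-8859-5")
  true
rescue Encoding::UndefinedConversionError
  false
end

puts as_utf8 == original
puts mislabelled_result == original
puts emoji_fits
```

Since the question mentions emojis in the column, the data is probably not stored as ISO-8859-5; if the column is actually UTF-8 (utf8mb4 in MySQL), a value such as "UTF-8" for `columns_charset` may be the one to try instead.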
