Writing UTF-8 to BLOB in MariaDB and in MySQL using Hibernate 4

I have a database with a table like this:

CREATE DATABASE `test_db` /*!40100 DEFAULT CHARACTER SET utf8 */;

CREATE TABLE `atable` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `shortText` varchar(255) DEFAULT NULL,
  `longText` blob,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It exists on both a MySQL (5.7.18-0ubuntu0.16.04.1) server and a MariaDB (10.1.23-MariaDB) server. I'm writing UTF-8 data to it from my Java app with Hibernate. The entity class looks like this:

@Entity(name = "atable")
public class AClass{
  @Id
  @Column(name = "id", unique = true)
  @GeneratedValue
  Long id;
  @Column
  private String shortText; //also exists setter and getter, of course
  private byte[] longText;    
  public void setLongText(String s){this.longText = (s!=null)?s.getBytes():null;}
  public String getLongText(){return this.longText!=null?new String(longText):null;}
}

For both databases I'm using the same JDBC connection URL:

jdbc:mysql://localhost:3306/app_db?useUnicode=true&characterEncoding=utf8

And when I'm writing UTF-8 data to MySQL, it works fine.

But when I write it to MariaDB, only the varchar column stores the UTF-8 text correctly; the blob column contains ???? instead of my data. Even the query select hex(longText) from atable where id=0; shows that MariaDB has stored bytes with code 3F instead of my letters.

What is wrong, and what can I do about it?

s.getBytes() is not guaranteed to encode text as UTF-8. new String(longText) is not guaranteed to decode bytes as UTF-8.

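Both of those methods use the JVM's default charset, which is typically not UTF-8 on Windows systems. Characters that the default charset cannot represent are replaced with ? (byte 3F) during encoding, which matches what hex(longText) shows.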
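As a rough illustration of where the 3F bytes can come from, the sketch below encodes the same string twice. The sample text and the class name DefaultCharsetDemo are made up for this example, and US-ASCII is used only as a stand-in for a non-UTF-8 default charset; the actual default on the MariaDB machine is not known from the question.

```java
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Hex-encode bytes, similar to what MySQL's HEX() displays.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as Unicode escapes so the source
        // file's own encoding does not matter. Sample text is an assumption.
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Stand-in for a non-UTF-8 default charset: unmappable characters
        // are replaced with '?' (0x3F).
        System.out.println(toHex(text.getBytes(StandardCharsets.US_ASCII)));
        // prints 3F3F3F3F3F3F

        // Explicit UTF-8: two bytes per Cyrillic letter.
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8)));
        // prints D09FD180D0B8D0B2D0B5D182
    }
}
```

Running the application with System.out.println(java.nio.charset.Charset.defaultCharset()) on each machine should show whether the two environments actually differ.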

To guarantee correct operation, specify the charset:

s.getBytes(StandardCharsets.UTF_8)
new String(longText, StandardCharsets.UTF_8)
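Applied to the accessors from the question, that could look roughly like the following. This is a minimal sketch: the class name LongTextHolder and the rawBytes() helper are invented for illustration, and the Hibernate annotations are left out so that the snippet compiles on its own.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java stand-in for the entity (hypothetical name); the @Entity and
// @Column annotations from the question are omitted here.
class LongTextHolder {
    private byte[] longText;

    // Always encode as UTF-8, regardless of the JVM's default charset.
    public void setLongText(String s) {
        this.longText = (s != null) ? s.getBytes(StandardCharsets.UTF_8) : null;
    }

    // Always decode as UTF-8, matching the setter.
    public String getLongText() {
        return (longText != null) ? new String(longText, StandardCharsets.UTF_8) : null;
    }

    // Hypothetical helper exposing the stored bytes for inspection.
    byte[] rawBytes() {
        return longText;
    }

    public static void main(String[] args) {
        LongTextHolder holder = new LongTextHolder();
        holder.setLongText("\u041F\u0440\u0438\u0432\u0435\u0442");
        // The round trip preserves the text, and no byte is 0x3F.
        System.out.println(holder.getLongText().equals("\u041F\u0440\u0438\u0432\u0435\u0442"));
        System.out.println(holder.rawBytes().length); // 12 bytes for 6 letters
    }
}
```

Because the column is a blob, the server stores these bytes as-is, so the encoding chosen in the Java code is the only one that matters for longText.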

Multiple "question marks", and their causes, are discussed in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored

If you have further issues that the linked answer does not cover, provide the connection parameters and the other details it asks for.

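For Emoji and for the Chinese characters that lie outside the Basic Multilingual Plane, you need the text columns/tables to be utf8mb4, not just utf8, because MySQL's utf8 stores at most 3 bytes per character.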
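To check whether a given string is affected, one option is to look for supplementary code points, which are the ones that take 4 bytes in UTF-8. The sketch below is only an illustration; the class and method names are made up, and it applies to text columns such as shortText rather than to the blob.

```java
class Utf8mb4Check {
    // True if the string contains any code point above U+FFFF, i.e. one
    // that needs 4 bytes in UTF-8 and therefore a utf8mb4 column.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // U+4E2D U+6587: common Chinese characters inside the BMP.
        System.out.println(needsUtf8mb4("\u4E2D\u6587"));  // false
        // U+20BB7: a Chinese character outside the BMP (surrogate pair).
        System.out.println(needsUtf8mb4("\uD842\uDFB7"));  // true
        // U+1F600: an emoji (surrogate pair).
        System.out.println(needsUtf8mb4("\uD83D\uDE00"));  // true
    }
}
```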
