Saving special characters to DB then display using PHP

I have a script which caches a number of RSS feeds, however I have noticed that I've started getting strange characters appearing in the page where I output the cached contents (Stored in DB).

For instance the RSS feed contains the characters: Introducing&#8230;: ...

Which should read: Introducing...: ...

However my page displays it as: Introducingâ€¦: ...

It seems that these strange characters are actually being stored in the database like this.

Can anyone suggest where I might be going wrong?

Do I need to encode on the way into the database, then decode on the way out?

You need to make sure that the encoding of the RSS feed is the same as in your DB. Otherwise you first need to convert the content.

The encoding of the feed should be in the XML header:

<?xml version="1.0" encoding="UTF-8"?>

You can use this function to convert it to the encoding you use in the DB (preferably UTF-8):

http://php.net/manual/function.mb-convert-encoding.php
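
As a rough sketch of that conversion, assuming the mbstring extension is loaded: the helper names declared_encoding() and to_utf8() and the sample feed string are made up for illustration, and only preg_match() and mb_convert_encoding() are real PHP functions. It reads the encoding from the XML header and converts the content to UTF-8 if it is anything else.

```php
<?php
// Hypothetical helper: read the declared encoding from the XML header,
// falling back to UTF-8 (the XML default) when none is declared.
function declared_encoding(string $xml, string $default = 'UTF-8'): string
{
    if (preg_match('/^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']/', $xml, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: convert text to UTF-8 unless it already is UTF-8.
function to_utf8(string $text, string $fromEncoding): string
{
    if ($fromEncoding === 'UTF-8') {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fromEncoding);
}

// Sample feed fragment in ISO-8859-1: "\xE9" is e-acute in that encoding.
$feed = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>Caf\xE9</title>";
$utf8 = to_utf8($feed, declared_encoding($feed));
```

Note that the XML header inside the converted string still names the old encoding, so this is only suitable for text you extract and store, not for re-serving the feed as is.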

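The fact that there are 3 bad characters in the output suggests that the HTML character reference in the RSS feed is being decoded into UTF-8, where the ellipsis takes three bytes, and that the page is then displaying each of those bytes as a separate character in a single-byte encoding.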
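
To illustrate where the three characters come from, here is a small sketch. It assumes the feed used the numeric reference &#8230; and that the page was read as Windows-1252; neither is confirmed by the question. It also needs the mbstring extension.

```php
<?php
// Decode the character reference the way a feed parser might.
$decoded = html_entity_decode('Introducing&#8230;', ENT_QUOTES, 'UTF-8');

// The ellipsis (U+2026) occupies three bytes in UTF-8.
$ellipsisBytes = substr($decoded, strlen('Introducing'));
echo bin2hex($ellipsisBytes), "\n"; // e280a6

// Simulate a page that treats those bytes as Windows-1252 (an assumption):
// each byte becomes its own character.
$misread = mb_convert_encoding($decoded, 'UTF-8', 'Windows-1252');
echo mb_strlen($ellipsisBytes, 'UTF-8'), ' vs ',
     mb_strlen($misread, 'UTF-8') - strlen('Introducing'), "\n"; // 1 vs 3
```

One character in, three characters out, which matches the symptom described in the question.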

Try setting the text encoding of your display page to UTF-8 by adding the following to the output HTML in the <head> section:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Alternatively, since this is PHP you can set the HTTP header directly:

<?php
header("Content-Type: text/html; charset=UTF-8");
?>

However, a better solution might be to avoid converting the entity in the first place. Have you got a call to html_entity_decode() in the code that retrieves the RSS feed? If so, then it might be wise to remove it.

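If you use UTF-8, also make sure the database connection itself is set to UTF-8, for example in MySQL (note that MySQL spells the character set name without a hyphen, and that utf8mb4 is needed for characters outside the Basic Multilingual Plane):

SET NAMES 'utf8';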
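
From PHP, the connection character set can also be requested when connecting rather than with a separate query. The following is only a sketch, assuming PDO with the MySQL driver. The helpers mysql_dsn() and connect_utf8() and the host, database and credentials you pass in are placeholders, not anything from the question.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that asks for a UTF-8 connection.
// utf8mb4 is MySQL's full (4-byte) UTF-8 character set.
function mysql_dsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: open the connection; not called here because it
// needs a real server and real credentials.
function connect_utf8(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(mysql_dsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo mysql_dsn('localhost', 'feeds'), "\n";
// mysql:host=localhost;dbname=feeds;charset=utf8mb4
```

With mysqli, calling set_charset('utf8mb4') on the connection object serves the same purpose.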

Then set the correct output Content-Type as described by Anthony Williams. Ideally, do both: set the META Content-Type and send the Content-Type HTTP header.

Since your application seems to decode the HTML entities of the cached RSS feed before writing them to the DB, you can also re-encode them on output, so they look the way you received them in the first place:

<?php echo htmlentities($string, ENT_QUOTES, 'UTF-8'); ?>
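
For a concrete idea of what that call produces, here is a minimal sketch. The wrapper name escape_feed_text() and the sample string are invented for illustration, and the sample assumes the DB hands back UTF-8 text.

```php
<?php
// Hypothetical wrapper for escaping cached feed text on output.
function escape_feed_text(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// UTF-8 text as it might come back from the DB ("\xE2\x80\xA6" is the ellipsis).
$cached = "Introducing\xE2\x80\xA6: <b>news</b>";
echo escape_feed_text($cached), "\n";
// Introducing&hellip;: &lt;b&gt;news&lt;/b&gt;
```

Because the output is then pure ASCII, it displays correctly even if the page encoding is wrong, though it also escapes any markup in the feed text.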
