
How to handle all special characters in an HTML/PHP form that outputs to XML

I have a small PHP/MySQL app that takes input from a form, stores it in a MySQL database, and outputs the data as XML for consumption by a radio-playing hardware device.

The problem is ampersands and other special characters. The user takes descriptions of various radio stations, along with a streaming URL or playlist URL, and pastes them into the form. Some of the radio stations are in non-English-speaking countries (mostly French). I need to know how to preprocess these fields so that the generated XML is not corrupted, because corrupted XML breaks the external hardware app.

I assume this should go into the PHP script that is called when the form is submitted. I'm pretty sure the htmlspecialchars function should be used, but I'm not sure of the best method, since I've hacked this together from a variety of sources:

UPDATE: Here is my current output code, with a regex that cleans up the ampersands.

<?
include("HLN/manager/connect.php");

$query = "SELECT * FROM hln_stations ORDER BY orderid ASC";
$result = mysql_query($query);

$num = mysql_num_rows ($result);
mysql_close();

$xml = new XMLWriter();

$xml->openURI("php://output");
$xml->startDocument();
header('Content-type: text/xml');
$xml->setIndent(true);

$xml->startElement('channels');

while ($row = mysql_fetch_assoc($result)) {

  $xml->startElement("channel");
     $xml->startElement("title");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['station_title']));
     $xml->endElement();
     $xml->startElement("descriptionline1");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['station_display_name']));
     $xml->endElement();

     $xml->startElement("descriptionline2");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['station_subtitle']));
     $xml->endElement();

     $xml->startElement("description");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['station_detailed_description']));
     $xml->endElement();

     $xml->startElement("sdimage");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['sdtv_thumbnail_graphic_url']));
     $xml->endElement();

     $xml->startElement("hdimage");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['hdtv_thumbnail_graphic_url']));
     $xml->endElement();

     $xml->startElement("uri");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['stream_url_or_playlist_url']));
     $xml->endElement();

     $xml->startElement("linktype");
          $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;',$row['link_type']));
     $xml->endElement();

 $xml->endElement();
}

$xml->endElement();


$xml->flush();

?>

But I still need to solve the French character set issues that are cropping up. How can I replace the é character, for example, with something that doesn't cause problems?

Firefox reports a "not well-formed" error because the character set it detects doesn't match the character set you output. I tried various combinations of character sets and could reproduce the issue.

You have to specify your character sets explicitly, such as:

header('Content-type: text/xml; charset=UTF-8');
$xml = new XMLWriter();
$xml->openURI("php://output");
$xml->startDocument("1.0", "UTF-8");

If specifying UTF-8 as the character set in both the Content-Type header and the XML declaration gives you an error, your input is not valid UTF-8. Try ISO-8859-15 instead, or recode your input.
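Recoding the input could look something like the sketch below. It assumes the mbstring extension is available and that anything which is not valid UTF-8 is ISO-8859-15; the helper name toUtf8 is made up for illustration.

```php
<?php
// Hypothetical helper: returns $text as valid UTF-8.
// Assumption: input that is not valid UTF-8 is ISO-8859-15.
function toUtf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-15');
}

// "\xE9" is é in ISO-8859-15; as a lone byte it is not valid UTF-8.
echo toUtf8("Ch\xE9rie FM"), "\n";
```

Checking first avoids double-encoding strings that are already UTF-8. If the real source encoding is something else (Windows-1252, for instance), the third argument would need to change.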

You have to send the Content-Type charset header on every page of your site, including the data-entry form, or your special characters could get mangled. You also have to connect to MySQL specifying the character set you want to use for the connection, and it should match the character set and collation of your tables.
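A minimal sketch of setting the connection character set follows. It uses mysqli rather than the mysql_* functions from the question, which were removed in PHP 7. The function name, the credentials passed in, and the choice of utf8mb4 are all assumptions; use whatever matches your tables.

```php
<?php
// Sketch only: openUtf8Connection is a hypothetical helper, and
// utf8mb4 is assumed to be the character set of the tables.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    if ($conn->connect_error) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    // Make the connection character set match the tables.
    if (!$conn->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $conn->error);
    }
    return $conn;
}
```

On the form page itself, the matching header would be header('Content-Type: text/html; charset=UTF-8'), sent before any output.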

Supposing that you're using UTF-8, look at your database with phpMyAdmin over a UTF-8 connection. If your special characters don't display correctly there, something is going wrong before the data reaches the XML output.

As for the device, if you say that it can display only ASCII characters, does it do the conversion for you when you give it UTF-8 input, or do you have to supply a numeric entity such as:

Ch&#xE9;rie 

If neither of those two options works, you may want to convert to ASCII, such as "Cherie"... but that would be the last resort.
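If the device turns out to need numeric entities, one way to produce them is sketched below. It assumes the mbstring extension and UTF-8 input; the helper name toNumericEntities is hypothetical. It emits decimal references (&#233;), which an XML parser treats the same as the hexadecimal form (&#xE9;).

```php
<?php
// Hypothetical helper: replaces every non-ASCII character in a UTF-8
// string with a decimal numeric character reference.
function toNumericEntities(string $utf8Text): string
{
    // Map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8Text, $map, 'UTF-8');
}

echo toNumericEntities("Ch\u{E9}rie FM"), "\n"; // prints "Ch&#233;rie FM"
```

The result already contains an ampersand as part of each entity, so it would have to be written with writeRaw(); an escaping method would turn &#233; into &amp;#233;.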


Proof-of-concept code that doesn't use the DB:

<?php

header('Content-type: text/xml; charset=UTF-8');

$radioArr = array(
   array("Chérie FM @Work", "http://www.listenlive.eu/cheriefm_atwork.m3u?p&test"), 
   array("Hélène FM", "http://broadcast.infomaniak.ch/helenefm-high.mp3.m3u")
);
$xml = new XMLWriter();
$xml->openURI("php://output");
$xml->startDocument("1.0", "UTF-8");
$xml->setIndent(true);
$xml->startElement('channels');
foreach ($radioArr AS $radio) {
     $xml->startElement("channel");

     $xml->startElement("title");
     $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;', $radio[0]));
     $xml->endElement();

     $xml->startElement("uri");
     $xml->writeRaw(preg_replace('/&(?![A-Za-z0-9#]{1,7};)/','&amp;', $radio[1]));
     $xml->endElement();

     $xml->endElement(); //end channel
}

$xml->endElement();
$xml->flush();

?>
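As a possible alternative to writeRaw() plus a regex, XMLWriter can do the escaping itself. The sketch below is not the code from the answer above: it writes to memory instead of php://output so the result can be inspected, and it assumes the stored values are plain text that has not been escaped already.

```php
<?php
// Sketch: same structure as the proof of concept, written to memory.
// The station data is illustrative.
$stations = [
    ["Ch\u{E9}rie FM @Work", "http://www.listenlive.eu/cheriefm_atwork.m3u?p&test"],
];

$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument('1.0', 'UTF-8');
$xml->setIndent(true);
$xml->startElement('channels');
foreach ($stations as [$title, $uri]) {
    $xml->startElement('channel');
    // writeElement() escapes &, < and > in the text content.
    $xml->writeElement('title', $title);
    $xml->writeElement('uri', $uri);
    $xml->endElement(); // channel
}
$xml->endElement(); // channels
$xml->endDocument();
$output = $xml->outputMemory();
echo $output;
```

The trade-off is that a value which already contains an entity such as &amp; would be escaped a second time, which is the case the regex in the original code tries to avoid.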

If you really want to "clean up" the French characters (that is, remove the accents):

What about using iconv?

$text = iconv('UTF-8', 'ASCII//TRANSLIT', $text);
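The result of //TRANSLIT depends on the iconv implementation and the current locale, so it is worth guarding against failure. The sketch below is one way to do that; the helper name asciiFallback and the choice of "?" as a replacement are assumptions.

```php
<?php
// Hypothetical helper: best-effort ASCII version of a UTF-8 string.
function asciiFallback(string $utf8Text): string
{
    // @ suppresses the notice iconv raises for characters it cannot convert.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8Text);
    if ($ascii === false) {
        // Assumption: replacing each non-ASCII byte with "?" is acceptable
        // when iconv gives up.
        $ascii = preg_replace('/[^\x00-\x7F]/', '?', $utf8Text);
    }
    return $ascii;
}

echo asciiFallback("H\u{E9}l\u{E8}ne FM"), "\n";
```

Depending on the system, the accented letters may come out as plain letters, as "?", or as some other substitute, so test on the machine that will actually run the script.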

Wrap the data in CDATA: use writeCData() instead of writeRaw(). See the sample below.

// CData output
$xml->startElement('title');
$xml->writeCData($row['station_subtitle']);
$xml->endElement();
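One caveat with CDATA is that the text must not contain the sequence "]]>", which would end the section early. A common workaround is to split that sequence across two CDATA sections, as in the sketch below (the helper name writeSafeCData is hypothetical, and the output goes to memory only so it can be inspected).

```php
<?php
// Hypothetical helper: writes $text as CDATA, splitting any "]]>" so that
// it cannot terminate the section early.
function writeSafeCData(XMLWriter $xml, string $text): void
{
    $xml->writeCData(str_replace(']]>', ']]]]><![CDATA[>', $text));
}

$xml = new XMLWriter();
$xml->openMemory();
$xml->startElement('title');
writeSafeCData($xml, 'Rock & Pop ]]> live');
$xml->endElement();
$cdataOutput = $xml->outputMemory();
echo $cdataOutput, "\n";
```

CDATA only removes the need to escape markup characters such as & and <. The bytes inside still have to be valid in the document's declared encoding, so it does not by itself address the accented-character problem.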
