I am parsing a text file and am occassionally running into data such as:
CASTA¥EDA, JASON
Using a Mongo DB backend when I try saving information, I am getting errors like:
[MongoDB\Driver\Exception\UnexpectedValueException]
Got invalid UTF-8 value serializing 'Jason Casta�eda'
After Googling a few places, I located two functions that the author says would work:
function is_utf8( $str )
{
return preg_match( "/^(
[\x09\x0A\x0D\x20-\x7E] # ASCII
| [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
| \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
| \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
| \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
| [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
| \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
)*$/x",
$str
);
}
public function force_utf8($str, $inputEnc='WINDOWS-1252')
{
if ( $this->is_utf8( $str ) ) // Nothing to do.
return $str;
if ( strtoupper( $inputEnc ) === 'ISO-8859-1' )
return utf8_encode( $str );
if ( function_exists( 'mb_convert_encoding' ) )
return mb_convert_encoding( $str, 'UTF-8', $inputEnc );
if ( function_exists( 'iconv' ) )
return iconv( $inputEnc, 'UTF-8', $str );
// You could also just return the original string.
trigger_error(
'Cannot convert string to UTF-8 in file '
. __FILE__ . ', line ' . __LINE__ . '!',
E_USER_ERROR
);
}
Using the two functions above I am trying to determine if a line of text has UTF-8 by calling is_utf8($text) and if it is not then I call the force_utf8($text) function. However I am getting the same error. Any pointers?
This question is pretty old, but for those who face same issue and get on this page like me:
mb_convert_encoding($value, 'UTF-8', 'UTF-8');
This code should replace all non UTF-8 characters by ?
symbol and it will be safe for MongoDB insert/update operations.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.