DOMDocument and UTF-8 Problem

A few weeks back I shared how I used PHP's DOMDocument to reliably update all image URLs from standard HTTP to HTTPS. DOMDocument made a difficult problem seem incredibly easy, but with one side effect that took me a while to spot: UTF-8 characters were being mangled into a different set of characters. I was seeing a bunch of odd characters like “ãç³” and “»ã®é” all over each blog post.

I knew the problem was happening during the DOMDocument parsing and that I needed to find a fix quickly. The solution was just a tiny bit of code:

// Create a DOMDocument instance 
$doc = new DOMDocument();

// The fix: mb_convert_encoding conversion
$doc->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));

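mb_convert_encoding does not so much set a character set as rewrite every non-ASCII character as an HTML entity, so the parser only ever sees ASCII and no longer has to guess how the input is encoded. After that conversion, the odd characters vanished and the desired characters were back in place. Phew!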
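One caveat worth knowing: as of PHP 8.2, passing 'HTML-ENTITIES' to mbstring functions is deprecated. Below is a minimal sketch of the same idea using mb_encode_numericentity instead, which is a swapped-in alternative and not the code from this post. The helper name loadUtf8Html and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): turn every non-ASCII
// character into a numeric entity so loadHTML() only ever sees ASCII.
function loadUtf8Html(string $html): DOMDocument
{
    // Conversion map: code points 0x80..0x10FFFF, offset 0, full mask.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $encoded = mb_encode_numericentity($html, $map, 'UTF-8');

    $doc = new DOMDocument();

    // Real-world markup is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Sample input, assumed to be UTF-8 like the blog post content.
$content = '<p>日本語のテキスト</p>';

$doc  = loadUtf8Html($content);
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

// DOM strings are always UTF-8, so the text survives the round trip intact.
var_dump($text === '日本語のテキスト');
```

Because the entities are plain ASCII, this approach does not depend on the markup declaring a charset, which is the same reason the one-liner above works.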

The post DOMDocument and UTF-8 Problem appeared first on David Walsh Blog.

Source: http://davidwalsh.name/feed

