
&nbsp; Converted To Â

Posted by laurin1 
March 13, 2012 12:40PM

I've read numerous posts on different sites about this problem, but I am at a loss as to what to do about it. I found one method that fixes it: running all of my code through html_entity_decode first. That works but, as you can guess, breaks other things.
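As a rough illustration of the "breaks other things" part, here is a minimal sketch, assuming the decoding step is PHP's html_entity_decode. The input string is made up for the example.

```php
<?php
// Hypothetical input: the author wants "<b>" shown as literal text.
$html = '<p>Use &lt;b&gt; for bold&nbsp;text</p>';

// Decoding every entity turns &nbsp; into a raw character, but it also
// turns the escaped &lt;b&gt; into a real, unclosed <b> tag.
$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

var_dump(strpos($decoded, '<b>') !== false); // bool(true)
```

Any entity that was escaped on purpose is decoded along with &nbsp;, which is why a blanket decode is risky.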

Re: &nbsp; Converted To Â
March 13, 2012 01:00PM

Re: &nbsp; Converted To Â
March 13, 2012 01:05PM

Yeah, I get that, but you are talking about a massive effort to convert and test a project that is upwards of 300,000 lines of code.

Ain't gonna happen anytime soon.

So, I did this and it works:

    /**
     * Replaces the named entity &nbsp; with the character that the numeric
     * entity &#160; decodes to.
     *
     * @static
     * @param string $sHTML
     * @return string
     */
    private static function getHTMLReplacingNBSPWithEntityNumber($sHTML){
        return (string) str_ireplace("&nbsp;", html_entity_decode("&#160;"), $sHTML);
    }
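
For anyone trying this outside a class, here is a standalone sketch of the same idea. It assumes, going by the method name, that the method swaps &nbsp; for the character behind the numeric entity &#160;. The function name replaceNbspEntity and the $charset parameter are hypothetical, added only to show that the resulting bytes depend on the charset passed to html_entity_decode.

```php
<?php
/**
 * Hypothetical standalone variant with the charset made explicit.
 */
function replaceNbspEntity(string $html, string $charset): string
{
    // Decode the numeric entity into the raw character for the given charset.
    $nbsp = html_entity_decode('&#160;', ENT_QUOTES, $charset);

    // Case-insensitive, so &NBSP; is replaced as well.
    return str_ireplace('&nbsp;', $nbsp, $html);
}

echo bin2hex(replaceNbspEntity('a&nbsp;b', 'ISO-8859-1')), PHP_EOL; // 61a062
echo bin2hex(replaceNbspEntity('a&nbsp;b', 'UTF-8')), PHP_EOL;      // 61c2a062
```

The single byte a0 only displays correctly on a page served as ISO-8859-1 or Windows-1252. On a UTF-8 page the two-byte form c2a0 is the one that is needed.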

Re: &nbsp; Converted To Â
March 13, 2012 03:00PM

No, you did not read the document. It gives a solution for your case.

Re: &nbsp; Converted To Â
September 14, 2017 04:28PM

I don't see how the getHTMLReplacingNBSPWithEntityNumber function solves the problem, but then I don't know where in the process laurin1 is using it. I was about to do something similar to solve my own problem but found it unnecessary. The short answers I saw to this problem were to use %Core.Encoding or to switch to UTF-8. Also, the link http://htmlpurifier.org/docs/enduser-utf8.html provided by Ambush Commander describes how an HTML entity like &theta; gets lost in the process. This led me to believe that the same would hold true for &nbsp;, and I believe that is true for plain ASCII encoding.
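
For reference, the %Core.Encoding route looks roughly like the sketch below. It assumes HTML Purifier is installed and that the include path matches your setup. The encoding value and the input string are only examples.

```php
<?php
// Assumption: HTML Purifier is installed and this path resolves on your system.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Tell HTML Purifier which encoding the input and output use.
// The library default is UTF-8; 'ISO-8859-1' here is just an example value.
$config->set('Core.Encoding', 'ISO-8859-1');

$purifier = new HTMLPurifier($config);

// Hypothetical input for illustration.
$dirtyHtml = '<p>a&nbsp;b</p>';
echo $purifier->purify($dirtyHtml);
```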

In my case, however, the original encoding was Windows-1252, which I found out does have a character for a non-breaking space (hex A0) that acts just like an &nbsp; in a browser that knows the encoding is Windows-1252. So you don't need to use %Core.Encoding or switch to UTF-8. If the encoding the browser assumes does not match the encoding of the bytes, you get garbage like Â: a UTF-8 non-breaking space (bytes C2 A0) read as Windows-1252, for instance, shows up as Â followed by a space. http://htmlpurifier.org/docs/enduser-utf8.html does a pretty thorough job of explaining what controls this. Just take note that PHP's default_charset setting defaults to UTF-8 (since PHP 5.6), and the charset PHP sends in the Content-Type header overrides the META tags.
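
To make the byte-level picture concrete, here is a small sketch that simulates the mismatch with iconv. It assumes the iconv extension is available and that it accepts 'CP1252' as a name for Windows-1252. The variable names are made up for the example.

```php
<?php
// The same character, a non-breaking space, as raw bytes in each encoding.
$nbspCp1252 = "\xA0";     // Windows-1252: one byte
$nbspUtf8   = "\xC2\xA0"; // UTF-8: two bytes

// Simulate a reader that treats UTF-8 bytes as Windows-1252, and express
// what it "sees" as UTF-8 so the result can be inspected.
$misread = iconv('CP1252', 'UTF-8', $nbspUtf8);
echo bin2hex($misread), PHP_EOL; // c382c2a0: "Â" followed by a no-break space

// A correct conversion from Windows-1252 yields the UTF-8 no-break space.
echo bin2hex(iconv('CP1252', 'UTF-8', $nbspCp1252)), PHP_EOL; // c2a0
```

On a web server, sending something like header('Content-Type: text/html; charset=windows-1252') before any output is one way to declare the real encoding, so the browser does not fall back to PHP's default.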
