&nbsp; Converted To Â

Posted by laurin1
March 13, 2012 12:40PM

I've read numerous posts on different sites about this problem, but I am at a loss as to what to do about it. I found one way to fix it: run all of my code through html_decode first. That works but, as you can guess, it breaks other things.

Re: &nbsp; Converted To Â
March 13, 2012 01:00PM

Re: &nbsp; Converted To Â
March 13, 2012 01:05PM

Yeah, I get that, but you are talking about a massive effort to convert and test a project that is upwards of 300,000 lines of code.

Ain't gonna happen anytime soon.

So, I did this and it works:

    /**
     * Replaces the named entity &nbsp; with the character that
     * html_entity_decode() produces for the numeric entity &#160;.
     *
     * @static
     * @param string $sHTML
     * @return string
     */
    private static function getHTMLReplacingNBSPWithEntityNumber($sHTML){
        return (string) str_ireplace("&nbsp;", html_entity_decode("&#160;"), $sHTML);
    }

Re: &nbsp; Converted To Â
March 13, 2012 03:00PM

No, you did not read the document. It gives a solution for your case.

Re: &nbsp; Converted To Â
September 14, 2017 04:28PM

I don't see how the getHTMLReplacingNBSPWithEntityNumber function solves the problem, but then I don't know where laurin1 is using it in the process. I was about to do something similar to solve my own problem but found it unnecessary. The short answers I saw to this problem were to use %Core.Encoding or to switch to UTF-8. The page http://htmlpurifier.org/docs/enduser-utf8.html, linked by Ambush Commander, also describes how an HTML entity like &theta; gets lost in the process. This led me to believe that the same would hold true for &nbsp;, and I believe it does for plain ASCII encoding.
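
For anyone landing here later, the %Core.Encoding route mentioned above looks roughly like the sketch below. It follows the pattern in the HTML Purifier UTF-8 document; the include path, the `$dirtyHtml` variable and its sample content are assumptions for illustration, and `ISO-8859-1` should be replaced with whatever encoding your pages actually use.

```php
<?php
// Assumption: HTML Purifier is installed and this path points at its autoloader.
require_once 'HTMLPurifier.auto.php';

// Hypothetical input: markup that is NOT UTF-8 and contains a named entity.
$dirtyHtml = '<p>Price:&nbsp;10</p>';

$config = HTMLPurifier_Config::createDefault();

// Tell HTML Purifier which encoding the input is in (and the output should be in).
// Without this it assumes UTF-8.
$config->set('Core.Encoding', 'ISO-8859-1'); // replace with your encoding

$purifier = new HTMLPurifier($config);
$cleanHtml = $purifier->purify($dirtyHtml);

echo $cleanHtml;
```

This only tells the library what the bytes are; the page still has to be served with a matching charset.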

In my case, however, the original encoding was Windows-1252, which I found out does have a character for a non-breaking space (hex A0) that acts just like an &nbsp; in a browser that knows the encoding is Windows-1252. So you don't need to use %Core.Encoding or switch to UTF-8. If the browser's idea of the encoding doesn't match the actual bytes (for example, UTF-8 output read as Windows-1252), you get garbage like Â. http://htmlpurifier.org/docs/enduser-utf8.html does a pretty thorough job of explaining what controls this. Just take note that PHP's default charset is UTF-8 and, because it is sent in the Content-Type header, it overrides the META tags.
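
To make the byte-level picture concrete, here is a minimal sketch of where the Â comes from. It is not part of HTML Purifier; it only assumes the mbstring extension is available, and the variable names are made up for the example.

```php
<?php
// U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes C2 A0.
$nbspUtf8 = "\xC2\xA0";

// The same character in Windows-1252 is the single byte A0.
$nbspCp1252 = "\xA0";

// Simulate a reader that decodes the UTF-8 bytes as Windows-1252:
// every byte becomes its own character (C2 -> "Â", A0 -> non-breaking space).
// The result is re-encoded as UTF-8 so it can be inspected.
$misread = mb_convert_encoding($nbspUtf8, 'UTF-8', 'Windows-1252');

echo bin2hex($misread), "\n"; // c382c2a0, i.e. "Â" followed by a non-breaking space

// The reverse mismatch: a lone A0 byte is not valid UTF-8 at all.
var_dump(mb_check_encoding($nbspCp1252, 'UTF-8')); // bool(false)
```

Either way the fix is to make the declared charset match the bytes, for example by sending `header('Content-Type: text/html; charset=Windows-1252');` before any output if the page really is Windows-1252.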