Static Public Member Functions | |
| static | cleanUTF8 ($str, $force_php=false) |
| Cleans a UTF-8 string for well-formedness and SGML validity. | |
| static | unichr ($code) |
| Translates a Unicode codepoint into its corresponding UTF-8 character. | |
| static | convertToUTF8 ($str, $config, $context) |
| Converts a string to UTF-8 based on configuration. | |
| static | convertFromUTF8 ($str, $config, $context) |
| Converts a string from UTF-8 based on configuration. | |
| static | convertToASCIIDumbLossless ($str) |
| Lossless (character-wise) conversion of HTML to ASCII. | |
| static | testEncodingSupportsASCII ($encoding, $bypass=false) |
| This expensive function tests whether or not a given character encoding supports ASCII. | |
Private Member Functions | |
| __construct () | |
| Constructor throws fatal error if you attempt to instantiate class. | |
Static Private Member Functions | |
| static | muteErrorHandler () |
| Error-handler that mutes errors, alternative to shut-up operator. | |
Definition at line 7 of file Encoder.php.
| HTMLPurifier_Encoder::__construct () | [private] |
Constructor throws fatal error if you attempt to instantiate class.
Definition at line 13 of file Encoder.php.
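The static-only pattern described above can be sketched as follows. This is a minimal illustration, not the library's code: the class name `StaticOnlyEncoder` and the helper `shout()` are hypothetical, and the sketch throws an exception where the documented constructor raises a fatal error.

```php
<?php
// Hypothetical static-only class: every member is static and the
// constructor is private, so "new StaticOnlyEncoder()" fails.
final class StaticOnlyEncoder
{
    private function __construct()
    {
        // Only reachable from inside the class; outside callers are
        // rejected by the engine before this body runs.
        throw new LogicException('Cannot instantiate; call methods statically');
    }

    // Stand-in for a real static helper.
    public static function shout(string $s): string
    {
        return strtoupper($s);
    }
}

echo StaticOnlyEncoder::shout('ok'), "\n";
```

On PHP 7 and later, calling `new StaticOnlyEncoder()` from outside the class raises an `Error`, because the constructor is not visible to the caller.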
| HTMLPurifier_Encoder::__construct () | [private] |
Constructor throws fatal error if you attempt to instantiate class.
Definition at line 2696 of file HTMLPurifier.standalone.php.
| static HTMLPurifier_Encoder::muteErrorHandler () | [static, private] |
Error-handler that mutes errors, alternative to shut-up operator.
Definition at line 20 of file Encoder.php.
Referenced by convertFromUTF8(), convertToUTF8(), and testEncodingSupportsASCII().
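The idea of muting errors with a handler instead of the shut-up operator (`@`) can be sketched with PHP's `set_error_handler()` and `restore_error_handler()`. The class `QuietRunner` and its `quietly()` wrapper are hypothetical names used only for illustration. The handler is public here as an assumption, so that the engine can invoke it as a callback.

```php
<?php
// Hypothetical wrapper showing a no-op error handler installed around
// a risky call and removed afterwards.
final class QuietRunner
{
    // Returning true tells PHP the error was handled, so nothing is printed.
    public static function muteErrorHandler(): bool
    {
        return true;
    }

    // Runs $fn with warnings and notices muted, then restores the
    // previous handler even if $fn throws.
    public static function quietly(callable $fn)
    {
        set_error_handler([self::class, 'muteErrorHandler']);
        try {
            return $fn();
        } finally {
            restore_error_handler();
        }
    }
}

// The warning raised inside the closure is swallowed by the handler.
$value = QuietRunner::quietly(function () {
    trigger_error('noisy warning', E_USER_WARNING);
    return 5;
});
echo $value, "\n";
```

Unlike `@`, this approach scopes the muting to one explicit region and leaves the previously installed handler untouched afterwards.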
| static HTMLPurifier_Encoder::cleanUTF8 ($str, $force_php=false) | [static] |
Cleans a UTF-8 string for well-formedness and SGML validity.
It parses the input as UTF-8 and returns a valid UTF-8 string, with non-SGML codepoints excluded.
Fallback code adapted from utf8ToUnicode by Henri Sivonen (hsivonen@iki.fi), available at <http://iki.fi/hsivonen/php-utf8/> under the LGPL license. Notes on what changed are inside the function, but in general, the original code transformed UTF-8 text into an array of integer Unicode codepoints. Transforming that array back to a string would be somewhat expensive, so the function was modified to operate directly on the string. However, this discourages code reuse, and the logic enumerated here would be useful for any function that needs to understand UTF-8 characters. As of right now, only smart lossless character encoding converters would need that, and I'm probably not going to implement them. Once again, PHP 6 should solve all our problems.
Definition at line 47 of file Encoder.php.
Referenced by HTMLPurifier_Printer::escape(), HTMLPurifier_Lexer::normalize(), and HTMLPurifier_AttrDef_CSS_FontFamily::validate().
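As a rough illustration of what "well-formed and SGML-valid" means at the byte level, the sketch below keeps only byte sequences that are valid UTF-8 and fall inside an allowed set of codepoints. The function name `sketch_clean_utf8()` is hypothetical, the allowed ranges (tab, LF, CR, U+0020 to U+007E, U+00A0 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF) are an assumption, and the regex approach is a stand-in for the byte-by-byte fallback the documentation mentions.

```php
<?php
// Hypothetical sketch: collect every well-formed, allowed UTF-8 sequence
// and drop everything else (stray bytes, overlongs, surrogates, controls).
function sketch_clean_utf8(string $str): string
{
    // Byte-level pattern (no "u" flag), one alternative per encoding shape.
    $pattern = '~
          [\x09\x0A\x0D\x20-\x7E]          # allowed ASCII
        | \xC2[\xA0-\xBF]                  # U+00A0 to U+00BF
        | [\xC3-\xDF][\x80-\xBF]           # U+00C0 to U+07FF
        | \xE0[\xA0-\xBF][\x80-\xBF]       # U+0800 to U+0FFF
        | [\xE1-\xEC\xEE][\x80-\xBF]{2}    # U+1000 to U+CFFF, U+E000 to U+EFFF
        | \xED[\x80-\x9F][\x80-\xBF]       # U+D000 to U+D7FF (no surrogates)
        | \xEF[\x80-\xBE][\x80-\xBF]       # U+F000 to U+FFBF
        | \xEF\xBF[\x80-\xBD]              # U+FFC0 to U+FFFD
        | \xF0[\x90-\xBF][\x80-\xBF]{2}    # U+10000 to U+3FFFF
        | [\xF1-\xF3][\x80-\xBF]{3}        # U+40000 to U+FFFFF
        | \xF4[\x80-\x8F][\x80-\xBF]{2}    # U+100000 to U+10FFFF
    ~x';
    preg_match_all($pattern, $str, $matches);
    return implode('', $matches[0]);
}

// A NUL byte and a lone lead byte are removed; valid text is kept.
echo sketch_clean_utf8("a\x00b x\xC3(y caf\xC3\xA9"), "\n";
```

Because the pattern operates on the string directly, no intermediate array of codepoints is built, which is the same trade-off the paragraph above describes.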
| static HTMLPurifier_Encoder::unichr ($code) | [static] |
Translates a Unicode codepoint into its corresponding UTF-8 character.
Since we have to do code point parsing anyway, a good optimization would be to refuse to translate code points that are non-SGML characters. However, this could lead to duplication.
This is very similar to the unichr function in maintenance/generate-entity-file.php (although this is superior, due to its sanity checks).
Definition at line 226 of file Encoder.php.
Referenced by HTMLPurifier_EntityParser::nonSpecialEntityCallback(), and HTMLPurifier_AttrDef_CSS_FontFamily::validate().
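The codepoint-to-UTF-8 translation can be sketched with plain bit arithmetic. The function name `sketch_unichr()` is hypothetical, and the sanity checks shown (rejecting negatives, values above U+10FFFF, and surrogates by returning an empty string) are assumptions about what such checks would reasonably cover.

```php
<?php
// Hypothetical sketch: encode one Unicode codepoint as UTF-8 bytes.
function sketch_unichr(int $code): string
{
    // Sanity checks: out of range or surrogate codepoints yield ''.
    if ($code < 0 || $code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
        return '';
    }
    if ($code < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($code);
    }
    if ($code < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6))
             . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($code >> 12))
             . chr(0x80 | (($code >> 6) & 0x3F))
             . chr(0x80 | ($code & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($code >> 18))
         . chr(0x80 | (($code >> 12) & 0x3F))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}

echo bin2hex(sketch_unichr(0x20AC)), "\n"; // prints "e282ac" (the euro sign)
```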
| static HTMLPurifier_Encoder::convertToUTF8 ($str, $config, $context) | [static] |
Converts a string to UTF-8 based on configuration.
Definition at line 266 of file Encoder.php.
References muteErrorHandler(), and testEncodingSupportsASCII().
Referenced by HTMLPurifier::purify().
| static HTMLPurifier_Encoder::convertFromUTF8 ($str, $config, $context) | [static] |
Converts a string from UTF-8 based on configuration.
Definition at line 293 of file Encoder.php.
References convertToASCIIDumbLossless(), muteErrorHandler(), and testEncodingSupportsASCII().
Referenced by HTMLPurifier::purify().
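The two conversion directions above can be sketched as a thin wrapper over `iconv()`, with `mb_convert_encoding()` as a fallback. The helper names are hypothetical, and the sketch takes the encoding as a plain string parameter instead of reading it from `$config`, since the configuration lookup is not shown in this reference.

```php
<?php
// Hypothetical sketch: convert between a configured encoding and UTF-8.
function sketch_convert(string $str, string $from, string $to): string
{
    if (strcasecmp($from, $to) === 0) {
        return $str; // nothing to do
    }
    if (function_exists('iconv')) {
        // Mute iconv warnings with a handler instead of the @ operator.
        set_error_handler(function () { return true; });
        try {
            $out = iconv($from, $to, $str);
        } finally {
            restore_error_handler();
        }
        if ($out !== false) {
            return $out;
        }
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($str, $to, $from);
    }
    throw new RuntimeException("No converter available for $from to $to");
}

// Input side: bring the caller's encoding into UTF-8 before processing.
function sketch_to_utf8(string $str, string $encoding): string
{
    return sketch_convert($str, $encoding, 'UTF-8');
}

// Output side: return the processed UTF-8 text in the caller's encoding.
function sketch_from_utf8(string $str, string $encoding): string
{
    return sketch_convert($str, 'UTF-8', $encoding);
}

echo bin2hex(sketch_to_utf8("caf\xE9", 'ISO-8859-1')), "\n";
```

In this arrangement all internal processing can assume UTF-8, and the original encoding is only touched at the two boundaries.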
| static HTMLPurifier_Encoder::convertToASCIIDumbLossless ($str) | [static] |
Lossless (character-wise) conversion of HTML to ASCII.
| $str | UTF-8 string to be converted to ASCII |
This is a DUMB function: it has no concept of keeping character entities that the projected character encoding can allow. We could possibly implement a smart version but that would require it to also know which Unicode codepoints the charset supported (not an easy task).
Similar to cleanUTF8(), but it assumes that $str is well-formed UTF-8.
Definition at line 339 of file Encoder.php.
Referenced by convertFromUTF8().
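The "dumb" lossless conversion can be sketched as a single pass over the bytes: ASCII is copied through, and every multi-byte character becomes a decimal numeric character reference. The function name `sketch_to_ascii_lossless()` is hypothetical, and like the documented function it assumes the input is already well-formed UTF-8.

```php
<?php
// Hypothetical sketch: replace each non-ASCII character with "&#NNNN;".
// Assumes $str is well-formed UTF-8; malformed input is not handled.
function sketch_to_ascii_lossless(string $str): string
{
    $result = '';
    $working = 0;   // codepoint currently being assembled
    $remaining = 0; // continuation bytes still expected
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($str[$i]);
        if ($remaining === 0) {
            if ($byte < 0x80) {
                $result .= $str[$i];         // plain ASCII, copy as-is
            } elseif (($byte & 0xE0) === 0xC0) {
                $working = $byte & 0x1F;     // lead byte of 2-byte sequence
                $remaining = 1;
            } elseif (($byte & 0xF0) === 0xE0) {
                $working = $byte & 0x0F;     // lead byte of 3-byte sequence
                $remaining = 2;
            } else {
                $working = $byte & 0x07;     // lead byte of 4-byte sequence
                $remaining = 3;
            }
        } else {
            // Continuation byte: append its low six bits.
            $working = ($working << 6) | ($byte & 0x3F);
            if (--$remaining === 0) {
                $result .= '&#' . $working . ';';
            }
        }
    }
    return $result;
}

echo sketch_to_ascii_lossless("caf\xC3\xA9"), "\n"; // prints "caf&#233;"
```

Every non-ASCII character is escaped regardless of whether the target encoding could represent it, which is what makes the approach "dumb" in the sense described above.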
| static HTMLPurifier_Encoder::testEncodingSupportsASCII ($encoding, $bypass=false) | [static] |
This expensive function tests whether or not a given character encoding supports ASCII.
7/8-bit encodings like Shift_JIS will fail this test, and require special processing. Variable width encodings shouldn't ever fail.
| string | $encoding | Encoding name to test, as per iconv format |
| bool | $bypass | Whether or not to bypass the precompiled arrays. |
Definition at line 381 of file Encoder.php.
References muteErrorHandler().
Referenced by convertFromUTF8(), and convertToUTF8().
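One way such a test could work is a round trip through `iconv()` for each printable ASCII character, collecting the ones that do not survive unchanged. The sketch below is an assumption about the general technique, not the library's implementation: `sketch_ascii_failures()` is a hypothetical name, and it omits the precompiled arrays that `$bypass` refers to.

```php
<?php
// Hypothetical sketch: list printable ASCII characters that are not
// byte-identical in $encoding, or that do not decode back to themselves.
function sketch_ascii_failures(string $encoding): array
{
    $failures = [];
    // Mute iconv warnings (for example, on unknown encodings).
    set_error_handler(function () { return true; });
    try {
        for ($i = 0x20; $i <= 0x7E; $i++) {
            $c = chr($i);
            $encoded = iconv('UTF-8', $encoding, $c);
            if ($encoded !== $c) {
                $failures[] = $c; // different bytes, or conversion failed
                continue;
            }
            if (iconv($encoding, 'UTF-8', $encoded) !== $c) {
                $failures[] = $c; // same byte decodes to another character
            }
        }
    } finally {
        restore_error_handler();
    }
    return $failures;
}

// An empty list means ASCII text can pass through the encoding untouched.
var_dump(sketch_ascii_failures('ISO-8859-1') === []);
```

Each call performs up to two conversions per character, which hints at why the documented function is described as expensive and why caching results in precompiled arrays is attractive.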
| static HTMLPurifier_Encoder::muteErrorHandler () | [static, private] |
Error-handler that mutes errors, alternative to shut-up operator.
Definition at line 2703 of file HTMLPurifier.standalone.php.
| static HTMLPurifier_Encoder::cleanUTF8 ($str, $force_php=false) | [static] |
Cleans a UTF-8 string for well-formedness and SGML validity.
It parses the input as UTF-8 and returns a valid UTF-8 string, with non-SGML codepoints excluded.
Fallback code adapted from utf8ToUnicode by Henri Sivonen (hsivonen@iki.fi), available at <http://iki.fi/hsivonen/php-utf8/> under the LGPL license. Notes on what changed are inside the function, but in general, the original code transformed UTF-8 text into an array of integer Unicode codepoints. Transforming that array back to a string would be somewhat expensive, so the function was modified to operate directly on the string. However, this discourages code reuse, and the logic enumerated here would be useful for any function that needs to understand UTF-8 characters. As of right now, only smart lossless character encoding converters would need that, and I'm probably not going to implement them. Once again, PHP 6 should solve all our problems.
Definition at line 2730 of file HTMLPurifier.standalone.php.
| static HTMLPurifier_Encoder::unichr ($code) | [static] |
Translates a Unicode codepoint into its corresponding UTF-8 character.
Since we have to do code point parsing anyway, a good optimization would be to refuse to translate code points that are non-SGML characters. However, this could lead to duplication.
This is very similar to the unichr function in maintenance/generate-entity-file.php (although this is superior, due to its sanity checks).
Definition at line 2909 of file HTMLPurifier.standalone.php.
| static HTMLPurifier_Encoder::convertToUTF8 ($str, $config, $context) | [static] |
Converts a string to UTF-8 based on configuration.
Definition at line 2949 of file HTMLPurifier.standalone.php.
References muteErrorHandler(), and testEncodingSupportsASCII().
| static HTMLPurifier_Encoder::convertFromUTF8 ($str, $config, $context) | [static] |
Converts a string from UTF-8 based on configuration.
Definition at line 2976 of file HTMLPurifier.standalone.php.
References convertToASCIIDumbLossless(), muteErrorHandler(), and testEncodingSupportsASCII().
| static HTMLPurifier_Encoder::convertToASCIIDumbLossless ($str) | [static] |
Lossless (character-wise) conversion of HTML to ASCII.
| $str | UTF-8 string to be converted to ASCII |
This is a DUMB function: it has no concept of keeping character entities that the projected character encoding can allow. We could possibly implement a smart version but that would require it to also know which Unicode codepoints the charset supported (not an easy task).
Similar to cleanUTF8(), but it assumes that $str is well-formed UTF-8.
Definition at line 3022 of file HTMLPurifier.standalone.php.
| static HTMLPurifier_Encoder::testEncodingSupportsASCII ($encoding, $bypass=false) | [static] |
This expensive function tests whether or not a given character encoding supports ASCII.
7/8-bit encodings like Shift_JIS will fail this test, and require special processing. Variable width encodings shouldn't ever fail.
| string | $encoding | Encoding name to test, as per iconv format |
| bool | $bypass | Whether or not to bypass the precompiled arrays. |
Definition at line 3064 of file HTMLPurifier.standalone.php.
References muteErrorHandler().