HTMLPurifier 4.4.0
A UTF-8 specific character encoder that handles cleaning and transforming.
Static Public Member Functions
static muteErrorHandler ()
Error handler that mutes errors; an alternative to the shut-up operator (@).
static unsafeIconv ($in, $out, $text)
iconv wrapper that mutes errors but does not work around bugs.
static iconv ($in, $out, $text, $max_chunk_size = 8000)
iconv wrapper that mutes errors and works around bugs.
static cleanUTF8 ($str, $force_php = false)
Cleans a UTF-8 string for well-formedness and SGML validity.
static unichr ($code)
Translates a Unicode codepoint into its corresponding UTF-8 character.
static iconvAvailable ()
static convertToUTF8 ($str, $config, $context)
Converts a string to UTF-8 based on configuration.
static convertFromUTF8 ($str, $config, $context)
Converts a string from UTF-8 based on configuration.
static convertToASCIIDumbLossless ($str)
Lossless (character-wise) conversion of HTML to ASCII.
static testIconvTruncateBug ()
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly.
static testEncodingSupportsASCII ($encoding, $bypass = false)
This expensive function tests whether a given character encoding supports ASCII.
Public Attributes
const ICONV_OK = 0
No bugs detected in iconv.
const ICONV_TRUNCATES = 1
iconv truncates output when converting from UTF-8 to another character set with //IGNORE and a non-encodable character is found.
const ICONV_UNUSABLE = 2
iconv does not support //IGNORE, making it unusable for transcoding purposes.
Private Member Functions
__construct ()
Constructor throws a fatal error if you attempt to instantiate the class.
A UTF-8 specific character encoder that handles cleaning and transforming.
Definition at line 7 of file Encoder.php.
HTMLPurifier_Encoder::__construct () [private]
Constructor throws a fatal error if you attempt to instantiate the class.
Definition at line 13 of file Encoder.php and at line 3029 of file HTMLPurifier.standalone.php.
static HTMLPurifier_Encoder::cleanUTF8 ($str, $force_php = false) [static]
Cleans a UTF-8 string for well-formedness and SGML validity.
It parses the input as UTF-8 and returns a valid UTF-8 string with non-SGML codepoints excluded.
Definition at line 109 of file Encoder.php and at line 3125 of file HTMLPurifier.standalone.php.
Referenced by HTMLPurifier_Printer::escape(), HTMLPurifier_AttrDef::expandCSSEscape(), and HTMLPurifier_Lexer::normalize().
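As a rough illustration of this kind of cleaning, the following standalone PHP sketch keeps only well-formed UTF-8 sequences whose codepoints fall in the SGML-valid set. It is not the library's own code: the function name sketchCleanUtf8 and the regex-based approach are assumptions made for this example, and the valid set is assumed to be tab, LF, CR, U+0020 to U+007E, U+00A0 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier): keep only well-formed
// UTF-8 sequences that encode codepoints in the assumed SGML-valid set.
function sketchCleanUtf8(string $str): string
{
    // Each alternative matches one complete, well-formed byte sequence.
    // The pattern runs in byte mode (no /u flag) so malformed input is safe.
    $valid = implode('|', [
        '[\x09\x0A\x0D\x20-\x7E]',        // tab, LF, CR, printable ASCII
        '\xC2[\xA0-\xBF]',                // U+00A0 to U+00BF
        '[\xC3-\xDF][\x80-\xBF]',         // U+00C0 to U+07FF
        '\xE0[\xA0-\xBF][\x80-\xBF]',     // U+0800 to U+0FFF (no overlongs)
        '[\xE1-\xEC\xEE][\x80-\xBF]{2}',  // U+1000 to U+CFFF, U+E000 to U+EFFF
        '\xED[\x80-\x9F][\x80-\xBF]',     // U+D000 to U+D7FF (no surrogates)
        '\xEF[\x80-\xBE][\x80-\xBF]',     // U+F000 to U+FFBF
        '\xEF\xBF[\x80-\xBD]',            // U+FFC0 to U+FFFD
        '\xF0[\x90-\xBF][\x80-\xBF]{2}',  // U+10000 to U+3FFFF
        '[\xF1-\xF3][\x80-\xBF]{3}',      // U+40000 to U+FFFFF
        '\xF4[\x80-\x8F][\x80-\xBF]{2}',  // U+100000 to U+10FFFF
    ]);

    // Collect every valid sequence; anything unmatched is dropped.
    preg_match_all('/' . $valid . '/', $str, $matches);

    return implode('', $matches[0]);
}
```

With this sketch, a NUL byte or a lone lead byte such as "\xC3" is removed, while ordinary text passes through unchanged.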
static HTMLPurifier_Encoder::convertFromUTF8 ($str, $config, $context) [static]
Converts a string from UTF-8 based on configuration.
Definition at line 366 of file Encoder.php and at line 3382 of file HTMLPurifier.standalone.php.
References $config, convertToASCIIDumbLossless(), iconv(), iconvAvailable(), and testEncodingSupportsASCII().
Referenced by HTMLPurifier::purify().
static HTMLPurifier_Encoder::convertToASCIIDumbLossless ($str) [static]
Lossless (character-wise) conversion of HTML to ASCII.
Parameters:
$str: UTF-8 string to be converted to ASCII
Definition at line 413 of file Encoder.php and at line 3429 of file HTMLPurifier.standalone.php.
Referenced by convertFromUTF8().
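One plausible reading of "lossless" here is that every non-ASCII character is replaced by a decimal numeric character reference, so an HTML consumer can still recover it. The sketch below shows that idea in standalone PHP. The function name sketchToAsciiLossless, the use of preg_replace_callback, and the choice to return malformed input unchanged are all assumptions for illustration, not the library's implementation.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier): replace each non-ASCII
// character in a valid UTF-8 string with a decimal character reference.
function sketchToAsciiLossless(string $str): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u', // one full codepoint above U+007F per match
        function (array $m): string {
            $b = array_values(unpack('C*', $m[0]));
            // Decode the 2-, 3- or 4-byte sequence into its codepoint.
            switch (count($b)) {
                case 2:
                    $code = (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
                    break;
                case 3:
                    $code = (($b[0] & 0x0F) << 12)
                          | (($b[1] & 0x3F) << 6)
                          | ($b[2] & 0x3F);
                    break;
                default:
                    $code = (($b[0] & 0x07) << 18)
                          | (($b[1] & 0x3F) << 12)
                          | (($b[2] & 0x3F) << 6)
                          | ($b[3] & 0x3F);
            }
            return '&#' . $code . ';';
        },
        $str
    );

    // preg_replace_callback returns null on malformed UTF-8; in this sketch
    // the input is then returned untouched (an assumption, not library behavior).
    return $result ?? $str;
}
```

For example, the UTF-8 bytes for "café" become "caf&#233;", and plain ASCII input is returned as-is.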
static HTMLPurifier_Encoder::convertToUTF8 ($str, $config, $context) [static]
Converts a string to UTF-8 based on configuration.
Definition at line 336 of file Encoder.php and at line 3352 of file HTMLPurifier.standalone.php.
References $config, iconvAvailable(), and unsafeIconv().
Referenced by HTMLPurifier::purify().
static HTMLPurifier_Encoder::iconv ($in, $out, $text, $max_chunk_size = 8000) [static]
iconv wrapper that mutes errors and works around bugs.
Definition at line 35 of file Encoder.php and at line 3051 of file HTMLPurifier.standalone.php.
References testIconvTruncateBug() and unsafeIconv().
Referenced by convertFromUTF8() and unsafeIconv().
static HTMLPurifier_Encoder::iconvAvailable () [static]
Definition at line 325 of file Encoder.php and at line 3341 of file HTMLPurifier.standalone.php.
References ICONV_UNUSABLE and testIconvTruncateBug().
Referenced by convertFromUTF8() and convertToUTF8().
static HTMLPurifier_Encoder::muteErrorHandler () [static]
Error handler that mutes errors; an alternative to the shut-up operator (@).
Definition at line 20 of file Encoder.php and at line 3036 of file HTMLPurifier.standalone.php.
static HTMLPurifier_Encoder::testEncodingSupportsASCII ($encoding, $bypass = false) [static]
This expensive function tests whether a given character encoding supports ASCII.
7/8-bit encodings like Shift_JIS will fail this test and require special processing. Variable-width encodings should never fail.
Parameters:
string $encoding: Encoding name to test, as per iconv format
bool $bypass: Whether or not to bypass the precompiled arrays.
Definition at line 498 of file Encoder.php and at line 3514 of file HTMLPurifier.standalone.php.
References unsafeIconv().
Referenced by convertFromUTF8().
static HTMLPurifier_Encoder::testIconvTruncateBug () [static]
glibc iconv has a known bug where it doesn't handle the magic //IGNORE stanza correctly.
In particular, rather than ignoring characters, it returns EILSEQ after consuming some number of characters and expects you to restart iconv as if it were an E2BIG. Old versions of PHP did not respect the errno and returned the fragment, so iconv would appear to truncate its output mysteriously. We can work around this by manually chopping the input into segments of about 8000 characters, as long as PHP ignores the error code. If PHP starts paying attention to the error code, iconv becomes unusable.
Definition at line 469 of file Encoder.php and at line 3485 of file HTMLPurifier.standalone.php.
References ICONV_OK, ICONV_TRUNCATES, ICONV_UNUSABLE, and unsafeIconv().
Referenced by iconv() and iconvAvailable().
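The chunking workaround described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in Encoder.php: the helper names sketchSplitUtf8 and sketchChunkedIconv are hypothetical, the input is assumed to be UTF-8, and the split backs up to a character boundary so that no multi-byte character is cut in half.

```php
<?php
// Hypothetical helper: split a UTF-8 string into pieces of at most $max
// bytes without cutting through a multi-byte character.
function sketchSplitUtf8(string $text, int $max = 8000): array
{
    if ($max < 4) {
        // A UTF-8 character can take up to 4 bytes.
        throw new InvalidArgumentException('$max must be at least 4');
    }
    $chunks = [];
    $len = strlen($text);
    $pos = 0;
    while ($pos < $len) {
        $end = min($pos + $max, $len);
        // If the next chunk would start on a continuation byte (10xxxxxx),
        // move the cut back to the start of that character.
        while ($end < $len && $end > $pos + 1
            && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        $chunks[] = substr($text, $pos, $end - $pos);
        $pos = $end;
    }
    return $chunks;
}

// Hypothetical wrapper: convert each chunk separately with PHP's iconv()
// and join the results. Requires the iconv extension; $in is assumed to be
// UTF-8 and $out may carry the //IGNORE suffix.
function sketchChunkedIconv(string $in, string $out, string $text, int $max = 8000)
{
    $result = '';
    foreach (sketchSplitUtf8($text, $max) as $chunk) {
        $piece = @iconv($in, $out, $chunk); // errors muted for the sketch
        if ($piece === false) {
            return false;
        }
        $result .= $piece;
    }
    return $result;
}
```

Splitting on character boundaries matters because a chunk that ends mid-character would itself be an illegal sequence and could trigger the very error the workaround is meant to avoid.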
static HTMLPurifier_Encoder::unichr ($code) [static]
Translates a Unicode codepoint into its corresponding UTF-8 character.
Definition at line 288 of file Encoder.php and at line 3304 of file HTMLPurifier.standalone.php.
Referenced by HTMLPurifier_AttrDef::expandCSSEscape() and HTMLPurifier_EntityParser::nonSpecialEntityCallback().
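For readers unfamiliar with the encoding, here is a minimal standalone sketch of the standard codepoint-to-UTF-8 translation. The function name sketchUnichr is hypothetical, and returning an empty string for surrogates and out-of-range values is an assumption of this example rather than documented library behavior.

```php
<?php
// Hypothetical helper (not part of HTMLPurifier): encode one Unicode
// codepoint as a UTF-8 byte string using the standard bit layout.
function sketchUnichr(int $code): string
{
    // Assumption for this sketch: reject out-of-range values and surrogates.
    if ($code < 0 || $code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF)) {
        return '';
    }
    if ($code < 0x80) {
        // 0xxxxxxx
        return chr($code);
    }
    if ($code < 0x800) {
        // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6))
             . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($code >> 12))
             . chr(0x80 | (($code >> 6) & 0x3F))
             . chr(0x80 | ($code & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($code >> 18))
         . chr(0x80 | (($code >> 12) & 0x3F))
         . chr(0x80 | (($code >> 6) & 0x3F))
         . chr(0x80 | ($code & 0x3F));
}
```

For instance, sketchUnichr(0x20AC) yields the three bytes E2 82 AC, the UTF-8 form of the euro sign.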
static HTMLPurifier_Encoder::unsafeIconv ($in, $out, $text) [static]
iconv wrapper that mutes errors but does not work around bugs.
Definition at line 25 of file Encoder.php and at line 3041 of file HTMLPurifier.standalone.php.
References iconv().
Referenced by convertToUTF8(), iconv(), testEncodingSupportsASCII(), and testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_OK = 0
No bugs detected in iconv.
Definition at line 445 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_TRUNCATES = 1
iconv truncates output when converting from UTF-8 to another character set with //IGNORE and a non-encodable character is found.
Definition at line 449 of file Encoder.php.
Referenced by testIconvTruncateBug().
const HTMLPurifier_Encoder::ICONV_UNUSABLE = 2
iconv does not support //IGNORE, making it unusable for transcoding purposes.
Definition at line 453 of file Encoder.php.
Referenced by iconvAvailable() and testIconvTruncateBug().