
[Customization] YouTube and other videos

Posted by rbruhn 
[Customization] YouTube and other videos
March 03, 2007 05:07PM

Editorial Note: This thread details experimental, volatile, and highly dangerous techniques. Use at your own peril. -- AC

I have a community site that I'd like to use HTML Purifier on. However, being a community site, many people like to post movies. I understand from looking at the PHP code how you made the YouTube filter work, but is there a way to do something similar for other object and embed codes? Not everyone gets their content from YouTube (e.g. Flash MP3 players, personal movies on their own sites, etc.). And yes, I know some XSS hacks can use these, but I need some way to allow it.

Thanks

Edited 2 time(s). Last edit at 04/02/2007 06:38AM by Ambush Commander.

Re: YouTube and other videos
March 03, 2007 05:19PM

Yep, the system is designed to be extensible. Check the source code for ideas on how to implement something for your own pet provider.

As for arbitrary object inclusions, not gonna happen, due to the above-mentioned XSS concerns.

HTML Purifier, Standards Compliant HTML Filtering

Re: YouTube and other videos
March 03, 2007 05:34PM

Well, it's not really feasible to make special cases for "pet" providers. Right now, there are about 3000 members and the number grows every day. Trying to placate them all would be impossible. The only thing I could think of that might work is something similar to your filter: preg_match everything in an object or embed tag, store it in an array under a specific key, put that key in a span tag, and swap the original back in during the postFilter.

The only problem is I'm horrible at regex. I tried playing with it for about 5 hours today and can't capture all the code of an object or embed tag.

Back to the drawing board.

Re: YouTube and other videos
March 03, 2007 05:36PM

If you've got snippets of what you want to capture, I can toss you a bone (i.e. try implementing one myself). And if the patterns are simple, you could make a catch all one. However, I warn you: only allow objects from domains you trust!
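
To make the "domains you trust" warning concrete, here is a minimal sketch of a host allowlist check in plain PHP. The function name isTrustedMediaUrl and the example domains are made up for illustration; none of this is part of HTML Purifier, and it assumes a reasonably modern PHP.

```php
<?php
// Hypothetical helper (not an HTML Purifier API): returns true only when $url
// is http(s) and its host is a listed domain or a subdomain of one.
function isTrustedMediaUrl(string $url, array $trustedDomains): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // malformed, relative, or host-less (e.g. javascript:)
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    $host = strtolower($parts['host']);
    foreach ($trustedDomains as $domain) {
        $domain = strtolower($domain);
        if ($host === $domain) {
            return true;
        }
        // Require a leading dot so "evilyoutube.com" does not match "youtube.com".
        $suffix = '.' . $domain;
        if (substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}
```

A catch-all filter could run this on every src, data, or movie URL it extracts and drop the tag when the check fails. Matching on the host rather than on the raw string is what stops tricks like http://youtube.com.evil.example/.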

HTML Purifier, Standards Compliant HTML Filtering

Edited 1 time(s). Last edit at 03/03/2007 02:36PM by Ambush Commander.

Re: YouTube and other videos
March 03, 2007 06:09PM

That is part of the problem: establishing which domains are trustworthy. So many are used by different people. For example, one member's profile has four different URLs: [flash.picturetrail.com] [www.profiletweaks.com] [widget-67.slide.com] [widget-fa.slide.com]

All are embed tags.

That is why I was hoping to find a catch-all regex for object and embed. Or two regexes: one that will capture an instance of object->embed->/object and one that will capture embed.

I understand the risk I'm opening here. But I'd have a lot of pissed off members if I limit where they can reference video codes from. At least until they start getting hacked and realize they should let me do it properly!!

I'd appreciate any help you can give. All I really need is the regexes to capture the code in between. Other than that, I can figure it out on my own.

Re: YouTube and other videos
March 06, 2007 11:52PM

I'm having trouble figuring out exactly what code is doing the embedding: these websites don't make it clear what the syntax is. The *.slide.com domains don't look very trustworthy.

HTML Purifier, Standards Compliant HTML Filtering

Re: YouTube and other videos
March 07, 2007 08:20AM

Hey there. I didn't give the full URLs above. Most of the above are in embed tags and with the other information I left out, call up certain picture slide programs.

Anyway, by rooting around on the web and piecing things together, I have managed to find a solution that works. The code below is basically it. However, I am going to add code to go through each individual tag that is found and replace certain things. For example:

Object Tags
1) Remove DATA
2) param name when it equals a URL

Embed Tags
1) If they exist, make sure attributes allowScriptAccess="never" and allownetworking="internal"
2) Prevent using svg

Things of that nature. I know it is not full protection, but at least the members can still do their thing.

require_once 'HTMLPurifier/Filter.php';

class HTMLPurifier_Filter_AllowEmbed extends HTMLPurifier_Filter
{

    var $name = 'Embed preservation';
    var $replacements = array(); # placeholder key => original markup

    function preFilter($html, $config, &$context) {
        $tags = array('/<object[^>]*>(.*)<\/object>/isU', # <object ...>...</object> areas
                      '/<embed[^>]*>(.*)<\/embed>/isU',   # <embed ...>...</embed> areas
                      '/<embed[^>]*>/isU');               # single, unclosed <embed ...> tags outside object areas

        foreach (array_keys($tags) as $idx) { # Handle all kinds of tag areas and tags, one by one

            $tmptags = array(); # Storage for the found occurrences
            preg_match_all($tags[$idx], $html, $tmptags); # And here they are

            if (!empty($tmptags[0])) { # Found some?

                foreach (array_keys($tmptags[0]) as $secidx) { # Deal with them, one by one

                    # We have to move them apart -- especially <object ...>...</object> areas with an internal
                    # <embed ...>...</embed> area or an unclosed <embed ...> tag -- otherwise they'd be found again.

                    $tagval = $tmptags[0][$secidx]; # This is the current occurrence to be processed later on
                    # Temporarily replace it by "replacetag_x_y_", where x is 0..2 (object/embed/s.embed)
                    # and y is the corresponding number. The trailing underscore keeps "replacetag_0_1_"
                    # from matching inside "replacetag_0_10_" when the keys are swapped back.
                    $tagkey = "replacetag_".$idx."_".$secidx."_";

                    $this->replacements[$tagkey] = $tagval; # Store the occurrence beside its unique key ...
                    $html = str_replace($tagval, $tagkey, $html); # ... and actually replace the occurrence with the key
                }
            }

            unset($tmptags); # A bit of dirty work
        }
        return $html;
    }

    function postFilter($html, $config, &$context) {
        # Put the stored markup back in place of each placeholder key
        foreach ($this->replacements as $tagkey => $tagval) {
            $html = str_replace($tagkey, $tagval, $html);
        }
        return $html;
    }
}
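
The per-tag cleanup mentioned above for embed tags (forcing allowScriptAccess="never" and allownetworking="internal") could be sketched roughly as follows. hardenEmbedTag is a hypothetical name, and it assumes it is handed a single <embed ...> opening tag as a string. Being regex-based, it can be fooled by odd markup (for example, one of these attribute names appearing inside another attribute's quoted value), so treat it as an illustration only.

```php
<?php
// Hypothetical helper: takes one <embed ...> opening tag and returns it with
// the two Flash security attributes forced to their restrictive values.
function hardenEmbedTag(string $embedTag): string
{
    // Drop any user-supplied copies (double-quoted, single-quoted, or unquoted).
    $stripped = preg_replace(
        '/\s+(?:allowscriptaccess|allownetworking)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i',
        '',
        $embedTag
    );
    // Re-add them just before the closing bracket (a trailing "/" is dropped).
    return preg_replace(
        '/\s*\/?>$/',
        ' allowScriptAccess="never" allownetworking="internal">',
        $stripped,
        1
    );
}
```

In the filter above, this would be applied to each stored value before postFilter puts it back, since nothing else inspects the markup once it has been swapped out for a placeholder.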

Edited 1 time(s). Last edit at 03/07/2007 05:22AM by rbruhn.

Re: YouTube and other videos
March 07, 2007 04:19PM

Ooh, that's scary. Please don't deploy that code yet.

If you insist on letting objects and embed tags through regardless of source, I would recommend piggy-backing off of HTML Purifier's HTML parsing capabilities. Check HTMLDefinition.php for ideas.

HTML Purifier, Standards Compliant HTML Filtering

Re: YouTube and other videos
March 07, 2007 11:40PM

I was wondering about this too. I know playing files through Flash 8 would be a security risk, but what I am thinking of doing is using JavaScript and swfobject so that more-trusted Flash files (from YouTube, Google Video, etc.) only require Flash 8 or higher, because most people are using that now, while files from unknown sources require Flash 9 to play. It should be secure too if I set allowScriptAccess="never" and allownetworking="internal". And when an exploit is found for Flash 9, I'll instantly make them use the updated version if available. It's pretty easy to control via swfobject, but I just need support for allowing embeds and objects.

I don't know if it's easy to do, but I was wondering if anyone had ideas for the following: different sites use different techniques for their embed codes, and I'm wondering if there is an easy way to make all of them (where I don't know the syntax for that site) pretty much follow the syntax below after they've been pasted in:

<object class="userFlashVideo" attributesandvaluesaswell>
  <param attributesandvaluesaswell></param>
  <!--[if IE 6]>
  <embed attributesandvaluesaswell></embed>
  <![endif]-->
</object>

That syntax is pretty important because there are nasty bugs in Flash with z-index, transparency and overlays. Fixing one usually brings out another, and different browsers have different bugs. After days of testing, that syntax is the first step in repairing it. class="userFlashVideo" is there so JavaScript can easily find all the Flash videos on the page that come from users.

Edited 1 time(s). Last edit at 03/07/2007 08:41PM by chinohillsbanditos.

Re: YouTube and other videos
March 08, 2007 12:03AM

The internal structure of objects is not too difficult, and if you use HTML Purifier's facilities your life will be made a lot easier (you have to figure out how to use them though). An object will have param and other inline content within it, that's all. This sort of simple validation can be done very easily within HTML Purifier. You will need to add the proper <param> tags for safety though.

However, I think that implementing the "Flash Satay" solution [alistapart.com], where a small "container" flash movie loads the real content might be a better idea. It would also enable you to do version checking without needing JavaScript and Swfobject. The problem then boils down to extracting the flash source from the original and then forwarding it to the satay file.
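
As a rough illustration of the last step (extracting the Flash source and forwarding it to the container movie), something like the following might serve as a starting point. Both function names are hypothetical, the regexes only handle quoted attributes with name before value, and the path query parameter is an assumption about how the container movie would read its target.

```php
<?php
// Hypothetical helper: pull the movie URL out of pasted embed code, or return
// null when nothing recognisable is found.
function extractFlashSource(string $html): ?string
{
    // <param name="movie" value="..."> form (name must precede value here)
    $param = '/<param\b[^>]*\bname\s*=\s*["\']movie["\'][^>]*\bvalue\s*=\s*["\']([^"\']+)["\']/i';
    if (preg_match($param, $html, $m)) {
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    // <embed src="..."> form
    if (preg_match('/<embed\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return null;
}

// Hypothetical helper: build the URL that hands the real movie to the
// container. The "path" parameter name is an assumption.
function satayContainerUrl(string $containerSwf, string $movieUrl): string
{
    return $containerSwf . '?path=' . rawurlencode($movieUrl);
}
```

The extracted URL would still need to be validated before it is forwarded; the container movie only centralises the version check, it does not make an untrusted source safe.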

This is a complicated subject. For now, I recommend you guys stay away from Flash if possible, and if that's not possible, keep a very close eye on its usage on your websites. I'll see if I can include native support for embeds in HTML Purifier. It will certainly be tough, and I won't like it very much, but people are going to do it anyway, so I might as well make them do it right.

HTML Purifier, Standards Compliant HTML Filtering

lmj
Re: YouTube and other videos
March 15, 2007 09:39AM

Hello all,

I just want to say that I've been playing around with HTMLPurifier for a few days now, and I'm very happy with it ;) However, I'm encountering the same video embedding issue - I need to allow for object/embed/param tags to get through the parser without getting stripped.

I tried a slightly different approach from rbruhn... I extended TagTransform so that I could convert those tags to spans and make use of tag->attr to easily accept/deny attributes. Then I created a filter for each tag, but it only makes use of the postFilter function to reformat the way I want it.

Of course, I know I'm not doing all that I need to if I truly want to add these tags to my implementation. The Strategy/MakeWellFormed.php file is yelling at me that there isn't enough information for the tags I'm parsing. Is this just a matter of my not fully creating the object information in HTMLDefinition? Any info is welcome :)

Thanks!
Lorraine

Re: YouTube and other videos
March 15, 2007 09:43AM

Hi Lorraine,

I just wanted to stop by and say I haven't been working on this problem since my last post. It got dropped temporarily while some other important issues are being handled. However, I'm still interested in finding a solution for this as well. I will be back on it as soon as I can.

Re: YouTube and other videos
March 15, 2007 02:37PM

> Of course, I know I'm not doing all that I need to if I truly want to add these tags to my implementation. The Strategy/MakeWellFormed.php file is yelling at me that there isn't enough information for the tags I'm parsing. Is this just a matter of my not fully creating the object information in HTMLDefinition?

I'll need to see some code to help you out. I will admit, the error messages are not very friendly.

HTML Purifier, Standards Compliant HTML Filtering

lmj
Re: YouTube and other videos
March 16, 2007 12:16PM

I suppose I should start over and just ask: what do I need to do to allow an element that is outside the allowed tags? I don't think a tag transform and filtering will give me the results I need. If I do the following inside HTMLDefinition:

- add the element to allowed_tags
- define as type = 'block'
- define the allowed children with $this->info['object']->child
- verify allowed attributes with $this->info['object']->attr['param'] = (custom class)

Should things go smoothly or am I missing something bigger? Thanks.

Re: YouTube and other videos
March 16, 2007 01:04PM

At times like this, I really wish the new HTMLDefinition API was live. After we finish this discussion, I'd like some feedback on what you think a better API (one that doesn't require editing core files) would be.

> Add the element to allowed_tags

Yes.

> Define as type = 'block'

No, this will be done automatically if you play things right. I'll explain this more below.

> Define the allowed children with $this->info['object']->child

Yes, make sure you use a legit object though. For this particular example, defining the allowed children of an object tag, you'll want to use:

new HTMLPurifier_ChildDef_Optional("$e__flow | param");

> Verify allowed attributes with $this->info['object']->attr['param'] = (custom class)

Yes. However, you're going to have to be very careful. Out of these attributes:

  declare     (declare)      #IMPLIED  -- declare but don't instantiate flag --
  classid     %URI;          #IMPLIED  -- identifies an implementation --
  codebase    %URI;          #IMPLIED  -- base URI for classid, data, archive--
  data        %URI;          #IMPLIED  -- reference to object's data --
  type        %ContentType;  #IMPLIED  -- content type for data --
  codetype    %ContentType;  #IMPLIED  -- content type for code --
  archive     CDATA          #IMPLIED  -- space-separated list of URIs --
  standby     %Text;         #IMPLIED  -- message to show while loading --
  height      %Length;       #IMPLIED  -- override height --
  width       %Length;       #IMPLIED  -- override width --
  usemap      %URI;          #IMPLIED  -- use client-side image map --
  name        CDATA          #IMPLIED  -- submit as part of form --
  tabindex    NUMBER         #IMPLIED  -- position in tabbing order --

You will definitely need to implement type, data, width and height. Width and height have been previously done, and type can be boiled down to an HTMLPurifier_AttrDef_Enum of the types you want to allow. For data, make sure you use new HTMLPurifier_AttrDef_URI(true); because HTML Purifier does some custom processing when a URI actually embeds a resource in the document.

Here's what you forgot to do:

Add object to appropriate content sets

Object needs to be part of other elements' allowed content sets, so overload $e_special_extra = 'img'; with:

$e_special_extra = 'img | object';

The final and most important part, after all that, is the AttrTransform for object tags. You must add the allowScriptAccess="never" and allownetworking="internal" attributes. This can be done with:

class HTMLPurifierX_AttrTransform_FlashObjectSecurity
{
    function transform($attr, $config, $context) {
        $attr['allowscriptaccess'] = 'never';
        $attr['allownetworking'] = 'internal';
        return $attr;
    }
}

And attach it to the post-transform as:

$this->info['object']->attr_transform_post[] = new HTMLPurifierX_AttrTransform_FlashObjectSecurity();

Finally, you will need to clean up after the PARAM tags, since not all of them will be acceptable. I, actually, would recommend just not allowing them at all, and then manually adding in <param name="movie" value="movie.swf" /> at the end. This part is the flakiest and will require further research.
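
One way to picture the "drop all params, then add the movie param yourself" idea is the standalone sketch below. rebuildObjectParams is a hypothetical name and works on a raw string rather than through HTML Purifier's own machinery, so it only illustrates the intent. It also assumes the movie URL has already been validated elsewhere and that no param value contains a literal ">".

```php
<?php
// Hypothetical helper: strip every user-supplied <param> from an <object>
// block and insert a single trusted movie param before the closing tag.
function rebuildObjectParams(string $objectHtml, string $movieUrl): string
{
    // Remove each <param ...> tag, with or without a following </param>.
    $clean = preg_replace('/<param\b[^>]*>(?:\s*<\/param>)?/i', '', $objectHtml);

    $param = '<param name="movie" value="'
           . htmlspecialchars($movieUrl, ENT_QUOTES)
           . '" />';

    // Insert before the last </object>; leave the input alone if there is none.
    $pos = strripos($clean, '</object>');
    if ($pos === false) {
        return $clean;
    }
    return substr_replace($clean, $param, $pos, 0);
}
```

For example, an object that arrives with an allowScriptAccess param set to "always" and a movie param pointing somewhere unexpected comes out with just the one movie param you supplied.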

HTML Purifier, Standards Compliant HTML Filtering

lmj
Re: YouTube and other videos
March 23, 2007 10:30AM

Thanks! Right after your post I was able to get things working. :) You're right - the params were more tricky, but I think I have something working reasonably well. I think that the process wasn't too bad once I understood what I needed to change, so possibly a bit more documentation is needed concerning adding your own allowed elements. But sure, if there was a quick function to call with specifics about a new element, that would be much easier.

- Lorraine

Edited 1 time(s). Last edit at 03/23/2007 07:30AM by lmj.
