
Better Bandwidth Theft Protection

Bandwidth theft (sometimes referred to as hotlinking) is the bane of the internet, in some people’s opinions. It is the practice of linking to or embedding someone else’s content within your own page without the owner’s permission. Posting l0lzzz ROFTLMAO images on forum message boards is probably the most common form, and it often costs innocent Photobucket accounts their bandwidth allowance. When it happens with larger media such as MP3s or streaming video, it grossly increases bandwidth costs and cuts ad revenue for sites that show advertisements on a “media player” page.

An obvious solution might be setting up a foolproof user authentication system, forcing users to log in before they can eat up your precious bandwidth. This may work well for porn sites and music stores, but it’s really not very practical – and somewhat annoying – on an average site.

HTTP_REFERER

A common method for locking down media content is to use Apache’s rewrite engine in an .htaccess file to check the HTTP_REFERER, e.g.:

RewriteEngine On

RewriteCond %{REQUEST_FILENAME} .*jpg$|.*gif$|.*png$ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !yoursite\.com [NC]
RewriteCond %{HTTP_REFERER} !friendlysite\.com [NC]
RewriteCond %{HTTP_REFERER} !google\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
# Assumed final step (the conditions do nothing without a rule): refuse the request with a 403.
RewriteRule .* - [F]

In fact, this is the only method I have been able to find online, and it does work very well in a lot of situations. A List Apart even has an elegant solution using PHP that redirects the user to a nicely formatted page when a regular link is clicked. See: Smarter Image Hotlinking Prevention (the above code snippet was stolen from said article).
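
The same referer test can also be written in PHP, which may be handy if a script is already serving the files. This is only a rough sketch that mirrors the rewrite conditions above; the function name referer_allowed() and the pattern list are made up for illustration and are not taken from the A List Apart article.

```php
<?php
// Hypothetical helper: returns true when the request should be served.
// $referer is the raw Referer header ('' when the client sent none).
// $allowed holds case-insensitive regex fragments, as in the .htaccess rules.
function referer_allowed($referer, array $allowed)
{
    // An empty referer is let through, like the "!^$" condition.
    if ($referer === '') {
        return true;
    }
    foreach ($allowed as $pattern) {
        if (preg_match('/' . $pattern . '/i', $referer)) {
            return true;
        }
    }
    return false;
}

$allowed = array('yoursite\.com', 'friendlysite\.com', 'google\.', 'search\?q=cache');
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if (!referer_allowed($referer, $allowed)) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```

Like the .htaccess version, this trusts a header that the client controls, so it only stops casual hotlinking.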

Streaming Media

A major problem arises when you need to protect streaming media or any files that are not normally viewed in a web browser. When a media player like Real, Winamp or Windows Media Player requests a file, it does not send the HTTP_REFERER header to your webserver. [Although I suspect they might, I have not bothered to research whether embedded players send this header, as it is trivial to copy the src from an <embed> tag, open it in a media player and share it with your friends on a message board.]

What we need is some form of anonymous authorization, to prove that a request is coming from our site. Authorization normally requires at least two pieces of information: a unique username and its corresponding password. But didn’t I say that logging in was annoying? Yes, yes I did. Read on.

Instead of a username we can use the closest thing to a unique parameter that every internet connection has: the IP address. And we will use our favorite web programming language to verify that a given IP address has visited our website. We could set up a fancy IP logging database, but this could get out of hand very quickly. Why make your webserver do more than it has to?

A better solution

The solution I came up with tonight is an encrypted unique session id. This sort of thing is probably overly obvious to anyone who’s into cryptography, but I’m not, so I have no idea if this whole thing sounds really basic.

I’m getting a little ahead of myself.
The first thing we need to do is have PHP handle all media requests, with something as simple as mediafiles.php?file=filename.ext. This way we obscure the location of the actual files, forcing users to link to our PHP script. You could even use an id for a database record containing the file info or the file itself, obviously. Just make sure to validate the requested filename so that you don’t reveal your complete server directory structure and let your site get hacked on national TV (yes, I’m looking your way, junos.ca). Now that we have PHP handling our file requests, we can do some validation.
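
As a rough idea of what that filename validation could look like, here is a minimal sketch. The helper name resolve_media_path(), the media directory and the extension whitelist are all assumptions for illustration; adapt them to however your files are actually stored.

```php
<?php
// Hypothetical helper: map a requested name to a real path inside $mediaDir,
// or return null if the request should be refused.
function resolve_media_path($requested, $mediaDir, array $extensions)
{
    if (!is_string($requested) || $requested === '') {
        return null;
    }
    // Refuse anything that carries directory components, e.g. "../../etc/passwd".
    if ($requested !== basename($requested)) {
        return null;
    }
    // Only serve known media types.
    $ext = strtolower(pathinfo($requested, PATHINFO_EXTENSION));
    if (!in_array($ext, $extensions, true)) {
        return null;
    }
    $path = rtrim($mediaDir, '/') . '/' . $requested;
    return is_file($path) ? $path : null;
}
```

A call such as resolve_media_path($_GET['file'], '/path/to', array('mp3', 'jpg')) then hands back either a safe path or null, and the script can simply exit on null.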

Next, we “encrypt” the IP address and add it to the request URI. This way we can use the IP address as both username and password. A couple of suitable functions are available in PHP, namely md5() and crypt(); strictly speaking both are one-way hashes rather than encryption, which is all we need here. Observe the following code:

media link:

<a href="mediafiles.php?file=filename.ext&amp;uid=<?php echo md5($_SERVER['REMOTE_ADDR']); ?>">hot video</a>

mediafiles.php code snippet:

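<?php
// Both parameters are required.
if (!isset($_GET['file']) || !isset($_GET['uid'])) {
    exit;
}

// Serve the file only if the uid matches the hash of the requester's IP.
if ($_GET['uid'] === md5($_SERVER['REMOTE_ADDR'])) {
    // basename() strips directory components such as "../".
    $path = '/path/to/' . basename($_GET['file']);
    if (is_file($path)) {
        // readfile() streams the bytes; include() would run the file as PHP.
        readfile($path);
    }
}
?>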

When some luser posts your link to their favorite forum it will look something like: http://www.yoursite.com/mediafiles.php?file=filename&uid=1f3870be274f6c49b3e31a0c6728957f. When some other forum user clicks the link, the md5 hash of their IP obviously will not match the original user’s.

There we have it. We’ve validated a user without any actual user validation.

In my opinion, simply using md5() is not quite good enough. It wouldn’t be too hard for an observant user to expose our little sleight of hand. If they recognize the md5 hash in a link, they may create an md5() hash of their own IP address and discover that they suddenly have access to our locked media. As added security, I suggest either creating your own pseudo-encryption method, or completely altering the IP address before md5()ing it. You might want to use the IP address to generate another number, or use only certain parts of the address.
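
One possible way to do that altering is to mix in a secret string that only the server knows, so that hashing a bare IP address no longer reproduces the uid. The sketch below swaps plain md5() for PHP's hash_hmac(), which is a different technique from the one described in this post; the function names and the secret value are placeholders.

```php
<?php
// Placeholder secret; in practice use a long random value kept out of the web root.
define('UID_SECRET', 'change-me-to-something-long-and-random');

// Hypothetical helper: keyed hash of the visitor's IP address.
function make_uid($ip)
{
    return hash_hmac('sha256', $ip, UID_SECRET);
}

// Hypothetical helper: check a uid from the query string against the requester's IP.
function uid_matches($uid, $ip)
{
    // hash_equals() (PHP 5.6+) compares the two strings in constant time.
    return is_string($uid) && hash_equals(make_uid($ip), $uid);
}
```

The link would then carry make_uid($_SERVER['REMOTE_ADDR']), and mediafiles.php would call uid_matches($_GET['uid'], $_SERVER['REMOTE_ADDR']) before serving anything.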

We can even cause the uid to expire by adding a date() value to our encryption function.

An example:

function encrypt_ip($ip){
   $quads = explode('.', $ip);
   $seed = $quads[0]+$quads[1]+$quads[2]+$quads[3] + date('z');
   return md5($seed);
}

To spell it out for you: if my IP address is 127.0.0.1, then today’s $seed value equals 341 (the quads sum to 128, and date('z'), the day of the year, is currently 213). This uid will expire tomorrow. Summation probably is not the best way to scramble the IP address, as it actually makes the IP less unique: as you can easily see, a number of other IP addresses sum up to the same value. Someone who actually knows mathematics can tell you the odds. Note: whatever you do to generate $seed must not include any random numbers, or any values (besides the date) that will change. We need encrypt_ip() to consistently generate the same value for a given IP address.
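
To make the collision problem concrete, here is a small sketch that compares the summing scheme with a variant that keeps the whole address. Both function names are invented for this comparison, and the day is passed in as a parameter (instead of calling date('z') directly) only so that the results are repeatable.

```php
<?php
// The summing scheme from the example, with the day passed in explicitly.
function encrypt_ip_sum($ip, $day)
{
    $quads = explode('.', $ip);
    return md5((string) (array_sum($quads) + $day));
}

// Hypothetical variant: keep every digit of the address and append the day.
function encrypt_ip_full($ip, $day)
{
    return md5($ip . '|' . $day);
}

$day = 213; // a fixed stand-in for date('z')

// Two different addresses whose quads both add up to 128 share a uid...
var_dump(encrypt_ip_sum('127.0.0.1', $day) === encrypt_ip_sum('64.64.0.0', $day));   // bool(true)
// ...but not once the whole address goes into the hash.
var_dump(encrypt_ip_full('127.0.0.1', $day) === encrypt_ip_full('64.64.0.0', $day)); // bool(false)
```

The second variant still expires daily, since the day is part of the hashed string.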

I’ve spent far too much time writing this.

My next post will contain some suggestions for making this process a little more transparent.