Archive for the ‘Programming’ category

A Working Exit Popup

April 3rd, 2010

In this tutorial I’m going to show you how to create an exit popup that has been tested in Internet Explorer 8, FireFox 3, Chrome and Safari 4 (Opera currently does not support the onbeforeunload event).

Traditionally to cause an exit popup you would simply reference a Javascript function in the body tag of your page such as this:

<body onunload="SomeFunction()">

Unfortunately this no longer works as most modern browsers will simply continue to the new destination or close, even if the function has executed. To work with modern browsers you need to assign a function to the OnBeforeUnload prompting the user to stay on the page.

Basic Confirmation Dialog
If you wish to simply ask the user if they wish to leave you can easily assign a function to the OnBeforeUnload directive. This method will prompt the user every time they attempt to navigate away from the page or close the window.

<script type="text/javascript">
     window.onbeforeunload = function() { return "Are you sure you want to leave?"; }
</script>

You may want to ensure that the popup does not occur again such as when they decided to stay on the page. You can do this by making a small modification to your code to alter to the behavior after the dialog is prompted.

<script type="text/javascript">
     var popit = true;
     window.onbeforeunload = function() { 
          if(popit == true) {
               popit = false;
               return "Are you sure you want to leave?"; 
          }
     }
</script>

The above will turn off the popup trigger after the first time the beforeunload event occurs. If the function does not return anything the browser will not prompt the user.

Using JQuery

Sometimes you may want to take some more control over your popup behavior. For example with the above methods, even clicking an internal link on your page will cause the popup to activate. This can be a major turn off to a potential buyer who has opted to continue to the sales page only to be bugged by the dialog. To fix this we can use JQuery.

<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript">
     function PopIt() { return "Are you sure you want to leave?"; }
     function UnPopIt()  { /* nothing to return */ } 
 
     $(document).ready(function() {
     	window.onbeforeunload = PopIt;
		$("a").click(function(){ window.onbeforeunload = UnPopIt; });
     });
</script>

What I did above was create two new functions. One of them executes the popup action and the other does… nothing. We assign the PopIt() function to the onbeforeunload event. However when any anchors, identified by the $(“a”) selector is triggered by a click event, the event is instead assigned to UnPopIt() thus allowing the user to click to their destination without interruptions.

Getting Fancy

So now that we know how to prompt the user to stay on the page, we need to actually do something to capture their participation. Perhaps you would like to offer them the option of signing up for a mailing list for other offers, or maybe entice them with a special deal since the original offer probably did not appeal to them.

We could trigger a popup window, but most browsers have popup blockers built in. We could however utilize some more JQuery goodness and use some Ajax to pop up an internal javascript dialog which is not normally blocked by popup blockers. For this we will use FancyBox a popular JQuery based lightbox script which can be downloaded here.

Fancybox can popup a number of different contents such as images, html code, flash videos as well as iframes. For this example we’ll popup an aweber form to collect the user’s information for a mailing list.

First we’ll want to download Fancybox and add it to our existing code:

<script type="text/javascript" src="jquery.js"></script>
<script src="/fancybox/fancybox.js" type="text/javascript"></script>
<link rel="stylesheet" href="/fancybox/fancybox.css" type="text/css" media="screen"/>
...

Next we will want to create the html dialog for our popup window, you can simply use aweber’s javascript loader.

<div style="display: none;">
	<a id="trigger" href="#aweber">&nbsp;</a>
	<div id="aweber" style="width: 250px; height: 400px;">
		<script type="text/javascript" src="http://forms.aweber.com/form/90/1930283.js"></script>
	</div>
</div>

You’ll notice we also added an anchor with a target the same name as the div’s ID, and that the div we created. Also the div we’ve created is also set to be invisible to the user. The anchor that has been created will act as the trigger for Fancybox. The above code can be placed just before the </body> tag of your site.

Next we need to update our javascript code to trigger the fancybox when the exitpop is initialized.

<script type="text/javascript" src="jquery.js"></script>
<script src="/fancybox/fancybox.js" type="text/javascript"></script>
<link rel="stylesheet" href="/fancybox/fancybox.css" type="text/css" media="screen"/>
<script type="text/javascript">
	function PopIt() { 
		$("a.trigger").trigger('click');
		window.onbeforeunload = UnPopIt;
		return "Would you like to join our mailing list for other offers?"; 
	}
	function UnPopIt()  { /* nothing to return */ } 
 
	window.onbeforeunload = PopIt;
 
	$(document).ready(function() {
	    $("a[id!=trigger]").click(function(){ window.onbeforeunload = UnPopIt; });
 
	    $("a#trigger").fancybox({
			'hideOnContentClick': false,
			'showCloseButton': false
		});
	});
</script>

In the fancybox initialization noted by $(“a#trigger”).fancybox() you can define various attributes of the popup, such as the dimensions (you can match this to your aweber form, with some margin). I’ve also decided to make sure that the popup does not disappear on click and that the close button was not visible. Also we modified the click trigger for anchors in general not to include the trigger so that our fancybox trigger can execute normally.

So there you have it, with the above, when a user attempts to navigate away from your page, the popup dialog will prompt the user on whether or not they’d like to stay (the nice part is most people tend to click ‘ok’ or ‘continue’ when it’s the ‘cancel’ button that allows them to leave). Upon clicking ok, the fancybox popup will trigger showing the aweber form that they can fill out.

On the next page you can see some full-source examples as well as a demonstration and download link.

Path_Info & PHP_SELF woes [NginX]

December 12th, 2009

Over the last couple of years I’ve been constantly researching for a way to get the PHP environment variables to show up correctly. My latest pains were with PATH_INFO and PHP_SELF, which are now finally solved.

My current configuration are PHP-FPM (5.2.10) and NginX (0.8.29) on a CentOS 5.4 x64 VPS.

Traditionally you would use a PHP configuration such as this:

	server {
		server_name  your-domain.com www.your-domain.com;
 
		location / {
			root html/default;
		}
 
		location ~ \.php$ {
			include fastcgi_params;
			fastcgi_param  SCRIPT_FILENAME  /usr/local/nginx/html/default$fastcgi_script_name;
			fastcgi_pass  127.0.0.1:9000;
		}
	}

Within fastcgi_params would be something like this:

fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;
 
fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  SERVER_PROTOCOL    $server_protocol;
 
fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;
 
fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
fastcgi_param  SERVER_ADDR        $server_addr;
fastcgi_param  SERVER_PORT        $server_port;
fastcgi_param  SERVER_NAME        $server_name;
 
# PHP only, required if PHP was built with --enable-force-cgi-redirect
fastcgi_param  REDIRECT_STATUS    200;

While the above method may seem to work at first, you’ll quickly notice problems when it comes to using $_SERVER['PATH_INFO'] and $_PATH['PATH_TRANSLATED'], and often enough $_SERVER['PHP_SELF'] ends up being set incorrectly when you try to adjust for the two environment variables.

Here is a setup I’ve come to prefer, especially when it comes to having multiple virtual hosts that use PHP. Notable tips are commented below the line.

Simple Nginx configuration file with a single virtual host

worker_processes  1;
pid        logs/nginx.pid;
events { worker_connections  1024; }
http {
	include       mime.types;
	default_type  application/octet-stream;
	sendfile        on;
	keepalive_timeout  65;
 
	index index.html index.htm index.php;
	# Identical to Apache's DirectoryIndex, setting it in
	# the http block set it as a default for all server blocks within
 
	server {
		listen	80; 
		# since port 80 is set by default, you do not actually need
		# to set this, unless of course you are binding to a specific
		# address such as listen server-ip-address:80 or alternate port; 
 
		server_name  gyn0.com www.gyn0.com;
 
		root   html/default;
		# You will want to set your root here, since otherwise
		# $document_root within the php block will not work
		# if you set it in the location block you would also have 
		# to set the php block within that location as well
 
		location / {
			# Below are some wordpress-friendly rewrite rules
			if (-f $request_filename) {
				expires 30d;
				break;
			}
 
			if (!-e $request_filename) {   			
				rewrite ^/.+/?(/wp-.*) $1 last;
				rewrite ^/.+/?(/.*\.php)$ $1 last;
				rewrite ^/(.+)$ /index.php?q=$1 last;
			}
		}
 
		location = /favicon.ico { access_log off; log_not_found off; }
		# If you haven't created a favicon for your site, you can keep
		# your access and error logs clean by turning off the logs
		# when a browser requests the fav icon (its also a good way
		# to keep your logs from filling with useless information)
 
		include php;
		# I prefer to keep my php settings in one file, so I can simply
		# paste this single line for each of my virtual hosts
	}
}

Now the php file (which I’ve created in the /conf folder with nginx.conf)

fastcgi_intercept_errors on;
# this will allow Nginx to intercept 4xx/5xx error codes
# Nginx will only intercept if there are error page rules defined
# -- This is better placed in the http {} block as a default
# -- so that in the case of wordpress, you can turn it off specifically
# -- in that virtual host's server block
 
location ~ \.php {
	fastcgi_split_path_info ^(.+\.php)(/.+)$;
	# A handy function that became available in 0.7.31 that breaks down 
	# The path information based on the provided regex expression
	# This is handy for requests such as file.php/some/paths/here/ 
 
	include fastcgi_params;
	# Includes the modified fastcgi parameter file (see below)
 
	fastcgi_pass   127.0.0.1:9000;
	fastcgi_index  index.php;
}

The new modified fastcgi_params:

fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;
 
fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
#SCRIPT_NAME needs to remain without the document_root on front
#Otherwise PHP_SELF breaks, the interpreter will still find the file
#because of the added SCRIPT_FILENAME parameter below
 
fastcgi_param  SCRIPT_FILENAME    $document_root$fastcgi_script_name;
#This one was added right into the file, since with root moved, and path_info split
#It should no longer need to be set manually for each virtual host
 
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  SERVER_PROTOCOL    $server_protocol;
 
#the two lines below are possible now because of the fastcgi_split_path_info function
fastcgi_param  PATH_INFO          $fastcgi_path_info;
fastcgi_param  PATH_TRANSLATED    $document_root$fastcgi_path_info;
 
fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
fastcgi_param  SERVER_SOFTWARE    nginx;
# I removed the version, as not to give away too much information
 
fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
fastcgi_param  SERVER_ADDR        $server_addr;
fastcgi_param  SERVER_PORT        $server_port;
fastcgi_param  SERVER_NAME        $server_name;
 
#PHP only, required if PHP was built with --enable-force-cgi-redirect
fastcgi_param  REDIRECT_STATUS    200;

There you have it, PHP should now have correct environment variables. For example http://www.kbeezie.com/php.php/a/path/string/?var=foo would render the following results (the url above doesn’t actually work, sorry :P ):

$_SERVER["QUERY_STRING"] -> var=foo
$_SERVER["SCRIPT_NAME"] -> /php.php
$_SERVER["SCRIPT_FILENAME"] -> /usr/local/nginx/html/default/php.php
$_SERVER["REQUEST_URI"] -> /php.php/a/path/string/?var=foo
$_SERVER["DOCUMENT_URI"] -> /php.php/a/path/string/
$_SERVER["DOCUMENT_ROOT"] -> /usr/local/nginx/html/default
$_SERVER["PATH_INFO"] -> /a/path/string/
$_SERVER["PATH_TRANSLATED"] -> /usr/local/nginx/html/default/a/path/string
$_SERVER["PHP_SELF"] -> /php.php/a/path/string/

So there you have it. A simple php block that will correctly assign the path environment variables, without having to use multiple blocks and patterns. And quite easy to simply assign to a new virtual host by simply pasting the include php; line.

Cloaking and Faking the Referrer

November 23rd, 2009

Cloaking the Referrer

This portion is the more useful portion out of the two if you are trying to drive traffic from a site that is not exactly ‘kosher’ (ie: blackhat). You’ll need a whitehat domain and landing page to make this method effective.

For this article I wrote a simple php script, it may take a second to wrap your head around. I’m not using short tags here because as of PHP 5.3 they are no longer supported by default.

At the very least there is three components to Cloaking the referrer:

  • The Blackhat Domain
  • The Whitehat Domain
  • The ‘Fake’ Landing Page

The first part is simply direct linking from the blackhat domain to the whitehat one.

The second part is the PHP code below, for my example I’m treating it as the base file on the domain (index.php) which will start the redirect process when the blackhat domain is detected as the referrer.

<?php
/* Setup your sites here, blackhat, whitehat and advertiser */
$bh = "http://blackhat.kbeezie.com/";
$wh = "http://whitehat.kbeezie.com/";
$ad = "http://advertiser.kbeezie.com/";
 
/* Grabs the querystring, referer, and initialized some variables */
$pg = (isset($_SERVER['QUERY_STRING']))?$_SERVER['QUERY_STRING']:'';
$rf = (isset($_SERVER['HTTP_REFERER']))?$_SERVER['HTTP_REFERER']:'';
$meta=$js=$ie=$lp="";
 
/* if the referrer is the blackhat domain start a meta redirect
and assign a temporary cookie for 5 seconds */
if(substr($rf, 0, strlen($bh)) == $bh) { 
	$meta = $wh.'?r'; 
	setcookie("r", "1", time()+5); 
}
/* The first meta refresh has completed, and 
we're checking for the 'r' cookie, if set
set a meta refresh back to the domain and set
a new 'rr' cookie while killing the old 'r' one. */
elseif(isset($_COOKIE['r'])) {
	$meta = $wh; 
	setcookie("r", "", time()-5); 
	setcookie("rr", "1", time()+5);
}
/* If we're back to the base of the domain, and the 'rr' cookie is set
then we need to prepare for the final redirect to the advertiser
using javascript */
elseif(($_SERVER['REQUEST_URI'] == '/') && (isset($_COOKIE['rr'])))
{
	if (isset($_SERVER['HTTP_USER_AGENT']) && (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== false))
	{
		$js = "document.forms[0].submit();";
		$ie = true; 
	} else {
		$js = 'window.location = "'.$ad.'";';
	}
	setcookie("rr", "", time()-5);
}
 
/* It is much faster to send meta data via the HTTP headers
its also a good way to hide the meta data from the HTML 
source itself */
header("Content-Type: text/html; charset=UTF-8");
if($meta != "") header("refresh: 0;url=".$meta);
if(($meta == "") && ($js == "")) $lp = true;
echo '<?xml version="1.0" encoding="UTF-8"?>'; 
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
	<head profile="http://gmpg.org/xfn/11">
		<title>Whitehat Domain</title>
		<script type="text/javascript" defer="defer">
		<!--
			<?php echo $js; ?>
		//-->
		</script>
	</head>
	<body>
		<?php if($lp === true)
			//Redirect to your lander, or load it here.
			echo "<b>There would be a landing page here if the destination owner attempted to visit the referral url.</b>";
		?>
		<?php if($ie === true) { ?>
			<form action="<?=$ad?>" method="get"></form>
		<? } ?>
		&nbsp;
	</body>
</html>

Some Explanations

You’ll notice the first two redirects are to blank out the referrer using a Double Meta Refresh (DMR). Course with Firefox and Internet Explorer using a quick meta refresh already blanks out the referrer by the time the second page loads, as such why cookies are being used. Safari appears to keep the previous page as the referrer during each zero second refresh, but that configuration wasn’t compatible with all three browsers.

After the DMR has completed, we want to refresh via Javascript. Safari and Firefox will carry the referrer over from the whitehat domain when the window.location method is used, but Internet Explorer will instead blank the referrer. Because of this, we have to resort to using a form submission to force the referrer to be sent.

The result of the above script will cause the advertiser to see http://whitehat.kbeezie.com as the referrer instead of http://blackhat.kbeezie.com. Also any attempt of the advertiser to visit the referring URL will result in the landing page loading, instead of a redirection.

On the next page I’ll show you how to fake a referrer, even if you don’t own that domain.

WordPress Automatic Update with SSH

November 22nd, 2009

If you’re like me, you don’t even want the insecure FTP protocol running on your server, but by default wordpress doesn’t even give you the option of using SSH to automatically upgrade your plugins, or wordpress itself.

I’m using a barebone CentOS server for this site, running on the Nginx webserver. I do not have a FTP server installed, and would very much prefer not to have one. Right now the only way to get into the server is via SSH. Below is a working configuration added to the wp-config.php file.

define('FS_METHOD', 'direct'); // 'ssh' is also an option, but did not work for my setup
define('FTP_BASE', '/opt/local/nginx/html/domain.com/');
define('FTP_CONTENT_DIR', '/opt/local/nginx/html/domain.com/wp-content/');
define('FTP_PLUGIN_DIR ', '/opt/local/nginx/html/domain.com/wp-content/plugins/');
define('FTP_PUBKEY', '/home/username/.ssh/id_rsa.pub');
define('FTP_PRIKEY', '/home/username/.ssh/id_rsa');
define('FTP_USER', 'username');
define('FTP_HOST', 'your-domain.com:22');

You can generate a public/private key by executing the following:

$ ssh-keygen

It will ask you where you wish to save the key (the default path usually /home/username/.ssh/id_rsa should be fine), followed by a passphrase which you can just leave blank for this purpose, then the location of the public key which is fine at its default location (usually /home/username/.ssh/id_rsa.pub)

You’ll also want to create an authorized key file by copying the public key into authorized_keys. We also need to make sure to set the proper permissions.

$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ cd ~/
$ chmod 755 .ssh
$ chmod 644 .ssh/*

By using the key pair shown in the configuration above you only have to supply the SSH username, otherwise if you don’t want to use key pairs, you can instead provide your SSH password with the following line:

define('FTP_PASS', 'password');

Make sure that the folders where updates (such as plugins) will need to be performed are writable by wordpress. From there when you click “upgrade automatically” it should just simply happen.

Handling wildcard subdomains with PHP

November 5th, 2009

Subdomains can be a very handy way to make your urls more friendly looking. They can also be incredibly useful for membership driven websites to allow members to have their own custom subdomain. But how do make manage dynamic subdomains with PHP?

There is two parts to making dynamic subdomains work in this case. The first is updating your DNS and webserver to expect wildcard requests. On the DNS side its a simple matter of adding * as an ‘A’ record, or cname pointing to the primary domain or IP. In most cases if you are using shared hosting, you’ll need to contact your hosting provider to enable wildcard DNS on your account. Otherwise if you are on a dedicated server or virtual private server, this can be most likely handled by yourself, such as editing the DNS zone in WHM.

Once you have enabled wildcard DNS, you’ll need to configure your webserver to expect a *.domain.com request. Otherwise it will ignore all requests other than the base domain and pre-defined subdomains.

In Nginx simply modify the server_name directive:

server_name yourdomain.com *.yourdomain.com;

In Apache you would add a ServerAlias directive to the appropriate virtual host entry.

ServerAlias *.yourdomain.com

On most Cpanel driven servers the http.conf file for Apache can be found in /usr/local/apache/conf.

Once you have configured both the name server and web server for wildcard requests you can now use php to not only detect the provide subdomain, but also generate them. The code below will obtain the subdomain being visited.

$sub = str_replace(".yourdomain.com", "", $_SERVER["HTTP_HOST"]);

The above will simply take the hostname stored in the HTTP_HOST enviroment variable, and attempt to replace the base domain with nothing (if no subdomain was provided all that remains is the base domain).

Once the subdomain is determined, you can filter out requests either with an if/then block or in a switch/case statement. In the event of using memeber names as subdomains you can use a switch statement to make sure that the subdomain doesn’t match the homepage (www / yourdomain.com) or a reserved page name such as register.yourdomain.com and assume anything else must be a member name. With that information you can pull the members information from a file or database.

Using PHP to handle the subdomain is certainly easier for most novice as opposed to modifying an htaccess to do rewrite rules passing the subdomain off as a query (which takes more resources than str_replace) or having to modify the webservers configuration file, especially if you do not have direct access to the webserver’s configuration file or do not have override permission.

If your current hosting provider does not support wildcard subdomain, and you are running a memeber based site, consider upgrading to a Virtual Private Server or use a shared hosting provider such as HostGator that allows you to turn on such capability via a support ticket.

Nginx and Django

September 14th, 2009

In a previous guide I showed how to use Passenger (aka mod_rails) to work with Python (WSGI) scripts. While this proved effective for simple wsgi applications, a framework such as Django required a bit more love. This guide assume you already have everything installed from the previous guide, and that Python has already been configured with Django and other modules.

Aside from having the usual setup, you will need the original Django source files. One way of going about this is downloading the latest source from the Django Project. The next couple of steps assume you have nginx installed at the /usr/local/nginx/ location, and already have a src folder under your home folder.

# cd ~/src
# wget http://www.djangoproject.com/download/
# tar zxvf Django-1.1.tar.gz
# cd Django-1.1
# mv ./django /usr/local/nginx/html

I should note, my /html/ folder isn’t publicly accessible because all my sites are subfolders under that location. I use it primarily to store resources for the folders beneath it. You can change where you save the source, just remember where you put it and that the webserver has access to it.

Open up your nginx configuration where you have already setup a passenger app from the previous guide. Should look something like this.

server {
	server_name project.domain.com;
	root /usr/local/nginx/html/project.domain.com/public/;
	passenger_enabled on;
}

The first thing you have to do to set this up for Django (at the very least), is to modify the server block above to something more like this:

server {
	server_name project.domain.com;
 
	location / {
		root /usr/local/nginx/html/project.domain.com/public/;
		passenger_enabled on;
	}
 
	location /media {
		alias /usr/local/nginx/html/django/contrib/admin/media;
	}
}

Now you have either create or move your django application into the site folder. For example, if your site is defined as /usr/local/nginx/html/project.domain.com like above, and your django application is called mytest , you will end up having it located in /usr/local/nginx/html/project.domain.com/mytest. In the end you should have a folder structure similar to this (based off the Django tutorials).

Directory Structure

Now open up the passenger_wsgi.py from the previous guide which may look something like this:

import sys
import os
 
def application(environ, start_response):
	start_response("200 OK", [])
	ret = ["%s: %s\n" % (key, value)
		for key, value in environ.iteritems()]
	return ret

We’re going to turn this into a WSGI initializer for the Django application like so:

import os, sys
 
#automatically finds application's current path
nginx_configuration= os.path.dirname(__file__)
project = os.path.dirname(nginx_configuration)
workspace = os.path.dirname(project)
sys.path.append(workspace) 
 
os.environ['DJANGO_SETTINGS_MODULE'] = 'testapp.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()

After saving, should restart the nginx server, then load up the site in a browser. If this is a new Django application you’ll be greeted with a “Welcome to Django” page, otherwise the application you had already created.

Static content can be access by creating additional Aliases in Nginx, or by using django.views.static.serve method in your applications url settings.

Using Python with Nginx via Passenger

September 14th, 2009

Since Nginx 0.6.* I been looking for an effective way to run Python (WSGI) applications thru the Nginx Webserver. A couple of options were available at the time:

Apache + mod_wsgi
One possible solution was to run Apache as a backend server listening on 127.0.0.1:8080 (or a port of your choice), which could also be easily used to serve up other dynamic content such as php. Many have used this setup for a great deal of internal load balancing, letting Nginx serve strictly static content, and Apache to handle the dynamic content.

In the sense of a WSGI Python application, you would have a server block like this in Nginx:

server {
	#server_name is what nginx responds to
	server_name  mydomain.com www.mydomain.com;
 
	location / {
		#setting a Host header is handy if you have apache
		#setup with name-based virtual hosts
		proxy_set_header Host mydomain.com;
 
		#requires nginx to be compiled with --with-http_realip_module
		proxy_set_header X-Real-IP $remote_addr;
 
		#This can also be the IP address of another server
		proxy_pass http://127.0.0.1:8080;
	}
}

With the above method you could go with the easy route of having mod_wsgi (as of this writing, mod_wsgi is available at 2.5 supporting Python 2.6) installed via Apache. However doubling up the number of webservers run, might defeat the purpose of running Nginx for some of us, especially for those less familiar with Apache configuration.

Nginx WSGI Module

There was a port of Apache’s mod_wsgi over to Nginx by Manlio Perillo. However this module had only been tested on Nginx 0.5.34, and patched for 0.6.*.

However even when patched, the module remained quite buggy. Implementing this module went something like this:

In the http { } block

    wsgi_python_optimize 0;
    wsgi_enable_subinterpreters on;

In your server { } block

server {
	listen 80;
	server_name yourdomain.com;
 
        location / {
            wsgi_pass  /var/www/username/domain/application.wsgi;
 
            include wsgi_vars;
 
            wsgi_var  SCRIPT_NAME         /var/www/username/domain/application.wsgi;
            wsgi_var  DOCUMENT_ROOT       /var/www/username/domain;
 
            wsgi_pass_authorization off;
            wsgi_script_reloading on;
            wsgi_use_main_interpreter on;
        }
}

This module is no longer maintained and is does not to build on modern versions of Nginx (0.7/0.8), and was barely compatible with 0.6 by using a patch. The creator of the Apache mod_wsgi, Graham Dumpleton commented on his blog regarding the Nginx implementation:


The nginx version of mod_wsgi borrows some code from my original Apache version, but obviously since the internals of Apache and nginx are very different, the main parts of the code which interface with the web server are unique. Although I condoned use of the source code, I do wish I had insisted from the outset that it not be called mod_wsgi due to the confusion that has at times arisen.

Although development on the nginx version of mod_wsgi appears to no longer be happening, this isn’t stopping people from using it and many are quite happy with it. The question is whether they really understand anything about how nginx works and the shortcomings in how nginx and mod_wsgi work together.

Admittedly the author of the mod_wsgi module for nginx has been up front in pointing out that because nginx is asynchronous, and with WSGI not designed for such a system, that once the WSGI application is entered all other activity by the web server is blocked. The recommendation resulting from this is that static files should not be served from the same web server. Use of multiple nginx worker processes is also suggested as a way of mitigating the problem.

Source: Graham Dumpleton: Blocking requests and nginx version of mod_wsgi

There is however a far more practical solution fit for production use, especially for those using modern versions of Nginx.

Scraping Google Front Page Results

August 25th, 2009

In this article I’ll show you how you can use cURL and simple_html_dom functionality to scrap the basic content from the front page results of google provided with a search query.

What you will need:

The Code

<?
include('simple_html_dom.php');
 
function strip_tags_content($text, $tags = '', $invert = FALSE) {
	/*
	This function removes all html tags and the contents within them
	unlike strip_tags which only removes the tags themselves.
	*/
	//removes <br> often found in google result text, which is not handled below
	$text = str_ireplace('<br>', '', $text);
 
	preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
	$tags = array_unique($tags[1]);
 
	if(is_array($tags) AND count($tags) > 0) {
		//if invert is false, it will remove all tags except those passed a
		if($invert == FALSE) {
			return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
		//if invert is true, it will remove only the tags passed to this function
		} else {
			return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
		}
	//if no tags were passed to this function, simply remove all the tags
	} elseif($invert == FALSE) {
		return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
	}
 
	return $text;
}
 
function file_get_contents_curl($url) {
	/*
	This is a file_get_contents replacement function using cURL
	One slight difference is that it uses your browser's idenity
	as it's own when contacting google. 
	*/
	$ch = curl_init();
 
	curl_setopt($ch, CURLOPT_USERAGENT,	$_SERVER['HTTP_USER_AGENT']);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_URL, $url);
 
	$data = curl_exec($ch);
	curl_close($ch);
 
	return $data;
}
 
//Set query if any passed
$q = isset($_GET['q'])?urlencode(str_replace(' ', '+', $_GET['q'])):'none';
 
//Obtain the first page html with the formated url
$data = file_get_contents_curl('http://www.google.com/search?hl=en&q='.$q);
 
/*
create a simple_html_dom object from the retreived string
you could also perform file_get_html("http://...") instead of
file_get_contents_curl above, but it wouldn't change the default
User-Agent
*/
 
$html = str_get_html($data);
 
 
$result = array();
 
foreach($html->find('li.g') as $g)
{
	/*
	each search results are in a list item with a class name 'g'
	we are seperating each of the elements within, into an array
 
	Titles are stored within <h3><a...>{title}</a></h3>
	Links are in the href of the anchor contained in the <h3>...</h3>
	Summaries are stored in a div with a classname of 's'
	*/
 
	$h3 = $g->find('h3.r', 0);
	$s = $g->find('div.s', 0);
	$a = $h3->find('a', 0);
	$result[] = array('title' => strip_tags($a->innertext), 
		'link' => $a->href, 
		'description' => strip_tags_content($s->innertext));
}
 
if($_GET['serialize'] == '1')
{
	/* 
	if you pass serialize=1 to the script
	it will echo out a serialized string
	which can be unserialized back to an 
	array on a receiving script
	*/
	echo serialize($result);
}
else
{
	/* 
	Otherwise it prints out the array structure so that it
	is more human readible. You could instead perform a 
	foreach loop on the variable $result so that you can 
	organize the html output, or insert the data into a database
	*/
	echo "<textarea style='width: 1024px; height: 600px;'>";
	print_r($result);
	echo "</textarea>";
}
//Cleans up the memory 
$html->clear(); exit();
?>

The variable ‘q’ passed to the file will provide the query string. If you pass serialize=1 to the script, the output will be a serialized string which can be converted back to an array with the PHP function unserialize.

Keep in mind that hitting google too often for search results from the same server might eventually get your server blocked from accessing, especially if you all multiple queries in a short period of time.

An example of the output when the word ‘funny’ is searched.

Array
(
    [0] => Array
        (
            [title] => Funny Videos, Funny Pictures, Funny Jokes, Funny News - Funny.com
            [link] => http://www.funny.com/
            [description] =>  videos,  pictures,  jokes,  news.
        )
 
    [1] => Array
        (
            [title] => Lolcats 'n' Funny Pictures of Cats – I Can Has Cheezburger?
            [link] => http://icanhascheezburger.com/
            [description] => Aug 25, 2009  Humorous captioned pictures of felines and other animals. 
            Visitors can submit their own material or add captions to a large archive of 
        )
 
    [2] => Array
        (
            [title] => Funny Videos, Funny Pictures, Flash Games, Jokes
            [link] => http://www.ebaumsworld.com/
            [description] =>  videos, flash games, clean jokes, clean humor, hilarious flash,  pics , 
            office humor, prank phone calls, flash cartoons,  animation, 
        )
 
    [3] => Array
        (
            [title] => Video results for funny
            [link] => http://video.google.com/videosearch?hl=en&q=funny&um=1&ie=UTF-8&ei=TW2USrSHIIK0NriQwPoH&sa=X
            &oi=video_result_group&ct=title&resnum=4
            [description] => CollegeHumor's recent videos section has all the best  videos on the Internet. 
            There are  video clips, hilarious viral videos,  college 
        )
 
    [4] => Array
        (
            [title] => Funny Videos on CollegeHumor. Watch funny videos and comedy movie ...
            [link] => http://www.collegehumor.com/videos
            [description] => CollegeHumor's recent videos section has all the best  videos on the Internet. 
            There are  video clips, hilarious viral videos,  college 
        )
 
    [5] => Array
        (
            [title] => Funnyjunk - Funny Pictures and Funny Videos
            [link] => http://www.funnyjunk.com/
            [description] =>  pictures,  videos, flash games and  movies.
        )
 
    [6] => Array
        (
            [title] => Photobucket | funny Pictures, funny Images, funny Photos
            [link] => http://photobucket.com/images/funny/
            [description] => View 462373  Pictures,  Images,  Photos on Photobucket. Share them with 
            your friends on MySpace or upload your own!
        )
 
    [7] => Array
        (
            [title] => FAIL Blog: Pictures and Videos of Owned, Pwnd and Fail Moments
            [link] => http://failblog.org/
            [description] =>  FAIL Pictures and Videos. Home Send In The Fail Boat Vote [Random]. You must 
            be logged in to add favorites | Register for a new account 
        )
 
    [8] => Array
        (
            [title] => funny
            [link] => http://www.reddit.com/r/funny/
            [description] => . (147376 subscribers). a community for 1 year. moderators · Submit a link. to 
            anything interesting: news article, blog entry, video, picture. 
        )
 
    [9] => Array
        (
            [title] => News results for funny
            [link] => http://news.google.com/news?hl=en&q=funny&um=1&ie=UTF-8&ei=TW2USrSHIIK0NriQwPoH&sa=X
            &oi=news_group&ct=title&resnum=11
            [description] => "A more mature but still  Judd Apatow comedy whose move into serious 
            human relation issues nearly scuttles the third act. 
        )
 
)

Four free Geolocation Methods

August 22nd, 2009

Geolocation or Geo-Targeting is a method of identifying a visitor’s location in the world. You can use this information for anything as simple as greeting a visitor in their native language to automatically redirecting visitors to valid affiliate offers for the visiting demographic. This article takes a look at four different services that offer geolocation for free. This article will focus primarily on retrieving the visitor’s country code.

Maxmind GeoLite Country

Touting a 99.5% data accuracy is the GeoLite Country database allowing you to lookup a visitor’s country locally on your server. The database file is updated at the first of each month. The benefit with a local binary database is that lookups are performed right at the server and as a result may be faster and easier to cache than hitting a remote server you have no control over. However performing a large number of queries maybe cause some strain on your server if you have a cheaper machine or use shared hosting.

The easiest way to use the GeoLite country database is to download the binary format along with the geoip.inc file from the Maxmind PHP API. Once the GeoIP.dat and geoip.inc files are uploaded onto the server, the following PHP code will retrieve the visitor’s country code.

include("geoip.inc");
$gi = geoip_open("GeoIP.dat",GEOIP_STANDARD);
$addr = $_SERVER["REMOTE_ADDR"];
$country_code = geoip_country_code_by_addr($gi, $addr);	
geoip_close($gi);

Other APIs include C, Perl, Javascript, Python, C#, Ruby and Java examples, as well as an Apache Module and Microsoft COM Objects.

GeoPlugin

One of the more flexible remote service is GeoPlugin.com. This one provides the most information from a single line PHP code, but requires a connection to the geoplugin.com server for every request (I recommend caching or using cookies for repeat visitors).

PHP Example:

$fetch = unserialize(fetch_url('http://www.geoplugin.net/php.gp?ip='.$_SERVER['REMOTE_ADDR']));
$geo_code = $fetch['geoplugin_countryCode'];

You can also easily grab the city name using the ‘geoplugin_city’ key in the returned array. Other options supported are Javascript, JSON and XML .

HostIP.info

HostIP.info offers the simplest way to retreive a country code for a given IP address.

$geo_code = trim(fetch_url("http://api.hostip.info/country.php?ip=".$_SERVER['REMOTE_ADDR']));

ipinfodb.com
While not as straight forward as HostIP or GeoPlugin, IpInfoDB.com provides a backup server in the event the first request fails.

$fetch = fetch_url("http://www.ipinfodb.com/ip_query_country.php?ip=".$_SERVER['REMOTE_ADDR']."&output=xml");
if(!$fetch)
	$fetch = fetch_url("http://backup.ipinfodb.com/ip_query.php?ip=".$_SERVER['REMOTE_ADDR']."&output=xml");
 
$fetchb = new SimpleXMLElement($fetch);
if (!$fetchb) 
	$geo_code = "";
else
	$geo_code = $fetchb->CountryCode;

There you have it, four different geolocation services, depending on your needs some may perform better than others, for high traffic sites the Maxmind solution is likely the best one short of paying for a monthly service.

AJAX with JQuery

August 22nd, 2009

AJAX stands for Asynchronous Javascript and XML, though most people would associate the term with dynamic loading of content without refreshing the browser. I notice that even today a lot of people starting out with AJAX still use the XMLHttpRequest object from scratch that has been around forever.

Traditionally retrieving content with such a method would require no less than something like this:

<script type="text/javascript">
var xmlhttp;
function loadXMLDoc(url)
{
xmlhttp=null;
if (window.XMLHttpRequest)
  {// code for Firefox, Opera, IE7, etc.
  xmlhttp=new XMLHttpRequest();
  }
else if (window.ActiveXObject)
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
if (xmlhttp!=null)
  {
  xmlhttp.onreadystatechange=state_Change;
  xmlhttp.open("GET",url,true);
  xmlhttp.send(null);
  }
else
  {
  alert("Your browser does not support XMLHTTP.");
  }
}
 
function state_Change()
{
if (xmlhttp.readyState==4)
  {// 4 = "loaded"
  if (xmlhttp.status==200)
    {// 200 = "OK"
    document.getElementById('T1').innerHTML=xmlhttp.responseText;
    }
  else
    {
    alert("Problem retrieving data:" + xmlhttp.statusText);
    }
  }
}
</script>
...
<body onload="loadXMLDoc('test_xmlhttp.txt')">
<div id="T1" style="border:1px solid black;height:40;width:300;padding:5"></div><br />
<button onclick="loadXMLDoc('test_xmlhttp2.txt')">Click</button>
...

But with the use of many popular frameworks such as Prototype, JQuery, Mootools, Dojo and others AJAX doesn’t have to be a huge mess for a task so simple. Not only would you get simplicity but better cross browser compatibility as the frameworks take a lot of quirks out of the latest browsers. If a new browser comes out, you can usually just update the framework file to keep all your code compatible.

Here is an example of how the above code could be made simpler with JQuery:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>

Defining a DOCTYPE is very important when dealing with a javascript framework in order to ensure maximum compatibility with most browsers (especially Internet Explorer). I tend to work in xHTML 1.0 99% of the time because I find the best luck with it in a cross browser market. It’s a much stricter document type than HTML 4.01 Transitional, but it tends to work much better.

 
    <script type="text/javascript" src="jquery.js"></script>
 
    <script type="text/javascript">
     $(document).ready(function() {
          $('#T1').load('test_xmlhttp.txt');
 
          $("button.trigger").bind("click", function(e){
               $('#T1').load('test_xmlhttp2.txt');
         });
    });
    </script>
...
<div id="T1" style="border:1px solid black;height:40;width:300;padding:5"></div><br />
<button class="trigger">Click</button>
...

As you can see, not only does JQuery make AJAX loading easy, the selectors are CSS1-3 based making element selection quite easy as well. For folks needing a bit more control on the return data JQuery also has the get(), post() and ajax() functions. More information regarding these functions can be found here : API/1.3/Ajax