Usage Statistics for Virtual Hosts on rdwarf.com

The web server on rdwarf.com collects information about each virtual host as it runs, which allows the calculation of some interesting statistics. This data is gleaned from the web server logs and processed into graphs and statistics about what was most interesting. All the virtual hosts on rdwarf.com use a single web server, so the logs are gathered and broken into chunks in one central location. An automated process goes through the collected logs and updates the usage statistics. A sample of the sort of output produced can be seen at http://www.rdwarf.com/usage/.

Engaging the Scripts

Each site can choose whether or not to have usage statistics generated for it. If a site is not interested, the processing script ignores it. The underlying data is collected automatically either way, even for sites that do not have usage graphs made.

To tell the central processor that a site wants usage information built, the site administrator should create a "usage_build" subdirectory in the root of their host's space. This directory should be set to permissions 0770 with the chmod command. (From the shell: chmod 0770 usage_build) This means the owner and the group of the virtual host can modify the directory and see its contents, but no one else can.
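
As a minimal sketch, the two steps look like this from the shell. It assumes you have already changed into the root of your virtual host's space; the final ls is only there so you can check the result by eye.

```shell
# Run from the root of your virtual host's space.
mkdir -p usage_build      # create the directory (no error if it already exists)
chmod 0770 usage_build    # owner and group: full access; everyone else: none
ls -ld usage_build        # the mode column should begin with drwxrwx---
```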

The scripts use the usage_build directory as a temporary location to create and manipulate files. They also store files there that they need from month to month, so to ensure correct operation, those files should be left alone.

Script Operation

The scripts run once a month. If a site has created a usage_build directory, the scripts do their work, collecting and analyzing log data and storing the results in usage_build. The readable output is then moved to a directory called usage, in the root of the virtual host's directory space.
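
The opt-in check described above can be pictured with a short sketch. This is not the actual rdwarf.com script, only an illustration of the idea. The function name opted_in_hosts and the layout of one subdirectory per virtual host under a common parent are assumptions made for the example.

```shell
# Print the root of every virtual host that has opted in by creating
# a usage_build directory. $1 is a hypothetical parent directory that
# holds one subdirectory per virtual host.
opted_in_hosts() {
    for host_root in "$1"/*/; do
        # Hosts without usage_build are simply skipped.
        [ -d "${host_root}usage_build" ] || continue
        printf '%s\n' "${host_root%/}"
    done
}

# Demonstration in a throwaway directory with two made-up hosts,
# only one of which has created usage_build.
demo=$(mktemp -d)
mkdir -p "$demo/www.example.com/usage_build" "$demo/www.example.org"
opted_in_hosts "$demo"
```

Only the host that created usage_build is listed. The other is ignored, as the text describes.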

If you have created a usage_build directory, the scripts take this to mean that they have permission to do anything they like with the contents of the usage and usage_build directories. Do not place files in these directories that you intend to keep, because they may be removed by normal script operations. Despite the scripts' rather proprietary attitude about the contents of those two directories, they make an effort to ensure the files are owned by the administration of the virtual host, so that the administrator can remove or change them as they see fit.

Configurable Options

The scripts use a tool called webalizer to do the actual generation of the graphs and HTML for the usage pages. This tool has a number of configuration options, so a site administrator can control which items are recorded on the generated pages, as well as the general look of those pages, such as color and font.

To do this, the administrator should create a file called webalizer.conf in the usage_build directory. The file needs to contain the configuration information for webalizer. If the administrator has not provided one, the default file /etc/webalizer.conf is used. That file makes a good starting point for changes, as it has sensible values in it, and they are documented there.

One important thing the administrator must change is the HostName option. It must be set to the name the administrator wishes used on the pages, because the default is "www.rdwarf.com", which is probably not what anyone but the system wants. When the scripts see that they are to use the default configuration, they make a point of providing an acceptable name to appear on the pages, usually the virtual host name, such as www.example.com. When the scripts notice that the administrator is using their own configuration file, they assume that file has the correct options in it, and set none themselves. The administrator must make sure the file is configured properly.
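
One way to follow the steps above is sketched here: start from the system default file and then set HostName. The variables HOST_ROOT, SITE_NAME and DEFAULT_CONF are names invented for this example. HOST_ROOT is assumed to be the root of your virtual host's space and defaults to the current directory. The sketch also assumes the usual webalizer.conf layout of one keyword and value per line.

```shell
HOST_ROOT="${HOST_ROOT:-$PWD}"                      # assumed: your host's root
SITE_NAME="${SITE_NAME:-www.example.com}"           # name to show on the pages
DEFAULT_CONF="${DEFAULT_CONF:-/etc/webalizer.conf}" # system default file
CONF="$HOST_ROOT/usage_build/webalizer.conf"

# Make sure the directory is there with the permissions described earlier.
mkdir -p "$HOST_ROOT/usage_build" && chmod 0770 "$HOST_ROOT/usage_build"

# Start from the documented system default if you have no file of your own yet.
if [ ! -f "$CONF" ]; then
    if [ -f "$DEFAULT_CONF" ]; then
        cp "$DEFAULT_CONF" "$CONF"
    else
        : > "$CONF"    # no default on this machine: start with an empty file
    fi
fi

# Remove any active HostName lines, then append the one you want.
grep -v '^[[:space:]]*HostName[[:space:]]' "$CONF" > "$CONF.tmp" || true
printf 'HostName %s\n' "$SITE_NAME" >> "$CONF.tmp"
mv "$CONF.tmp" "$CONF"

grep '^HostName' "$CONF"    # show the line that is now in effect
```

Review the rest of the copied file as well, since the scripts set no options for you once this file exists.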

The full details of what can be configured and how are available on the webalizer manual page. (From the shell: man webalizer)

Permissions and Security

The usage directory created by the scripts should be visible to users at the URL http://www.example.com/usage/. This is a fairly standard location, one that users might guess or expect to find this information at.

If you do not wish this information to be publicly available, you can put a password on it by creating .htpasswd and .htaccess files in the usage directory. The scripts will not affect those, and they will allow you to control who has access to this information. More information about how to work with .htaccess files is available at http://www.rdwarf.com/dakini/password-protected.html. Everybody email dakini@rdwarf.com and thank her for turning Lou's geeky scribblings into something sensical.
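
For orientation only, here is a rough sketch of what such a setup might look like. It assumes the server is Apache with basic authentication allowed in .htaccess files. The page linked above is the authoritative guide for rdwarf.com. The variable HOST_ROOT, the realm text and the user name alice are placeholders invented for this example.

```shell
HOST_ROOT="${HOST_ROOT:-$PWD}"    # assumed: the root of your virtual host's space
USAGE_DIR="$HOST_ROOT/usage"
mkdir -p "$USAGE_DIR"             # normally the monthly scripts create this

# Write a basic-authentication .htaccess that points at the password file.
cat > "$USAGE_DIR/.htaccess" <<EOF
AuthType Basic
AuthName "Usage statistics"
AuthUserFile "$USAGE_DIR/.htpasswd"
Require valid-user
EOF

# Then create the password file. htpasswd prompts for the password, so run
# it by hand; "alice" is a placeholder user name.
#   htpasswd -c "$USAGE_DIR/.htpasswd" alice
```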