This is where you will read the text log file from your Apache server and also set up your filtering.
You will be asked for three pieces of information. First, you will be asked whether to overwrite or append to the current log data. If you choose overwrite, all the current log data is discarded, but your filter and view settings are kept; if you want to start a new log file that reuses these settings, the best way is to choose overwrite and then use Save As. Second, you will be asked whether you wish to restrict the records read in to a date range. If the start date is left blank, the log data is read from the beginning; alternatively, the start date can be set automatically to the time of the last record in the current file. If the end date is left blank, the file is read to the end. If both are left blank, the entire data set is read in. Finally, you will be asked to select the file to read in. This should be an uncompressed text file from an Apache server.
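The date restriction described above behaves like an optional lower and upper bound on each record's timestamp. Here is a minimal Python sketch of that idea, assuming the standard Apache timestamp layout; `record_time` and `read_in_range` are hypothetical names used only for illustration, and the sketch compares the server's local time as written in the log, ignoring the UTC offset.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Apache writes timestamps like [10/Oct/2000:13:55:36 -0700]
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S"


def record_time(line: str) -> datetime:
    """Return the local server time of a log line, ignoring the UTC offset."""
    start = line.index("[") + 1
    end = line.index("]", start)
    stamp = line[start:end].split()[0]  # drop the " -0700" offset
    return datetime.strptime(stamp, APACHE_TIME_FORMAT)


def read_in_range(lines: Iterable[str],
                  start: Optional[datetime] = None,
                  end: Optional[datetime] = None) -> Iterator[str]:
    """Yield lines whose time lies in [start, end]; a None bound is open."""
    for line in lines:
        t = record_time(line)
        if start is not None and t < start:
            continue  # before the start date
        if end is not None and t > end:
            continue  # after the end date
        yield line
```

A blank date in the dialog corresponds to passing `None` for that bound in this sketch, so leaving both blank reads every line.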
Note that there are different formats for the Apache log file; the two most common are "Common log format" and "Combined log format". The only real difference between them is that the common format does not include the referrer or the user agent. If none of your records have a referrer domain (they all say "- (no referrer)"), you probably have the common log format set up. Having this information is recommended, as without it you have no access to the referrer domain or search string information. If you have control over the server, this site explains how to set up the combined log format: http://httpd.apache.org/docs/logs.html. Otherwise you will have to ask the people who support the server whether they can change the format of your logs.
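The difference between the two formats shows up clearly when a line is parsed. The following is a minimal Python sketch, assuming the standard field layout described in the Apache documentation; the pattern, the function names and the `NO_REFERRER` constant are illustrative only and are not how this program reads the file.

```python
import re
from urllib.parse import urlsplit

# Common log format fields, optionally followed by the two extra
# quoted fields (referrer and user agent) that the combined format adds.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

NO_REFERRER = "- (no referrer)"


def is_combined(line: str) -> bool:
    """True if the line carries the referrer and user-agent fields."""
    m = LOG_PATTERN.match(line)
    return m is not None and m.group("referrer") is not None


def referrer_domain(line: str) -> str:
    """Return the referrer's host name, or NO_REFERRER if there is none."""
    m = LOG_PATTERN.match(line)
    if m is None or m.group("referrer") in (None, "-", ""):
        return NO_REFERRER
    return urlsplit(m.group("referrer")).hostname or NO_REFERRER
```

A common format line never has the referrer group, so every such record ends up as "- (no referrer)". A combined format line can also log "-" when the browser sent no referrer, which is why an all "- (no referrer)" log only probably means the common format.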
Whilst the file is being read, a progress report is shown. Reading is currently fairly slow, and work will be done to speed it up: on my system a 30 MB file takes about 4 minutes. However, once you have read the data in and saved it as a .laf (Log Analysis File), the saved file should be less than half the size of the original and should read in a fraction of the time.
This is the most complex part, but it is still very simple. A fairly large dialog is presented with several main options. In each of the list boxes you can select one or more items using the usual Ctrl-click or Shift-click combinations. All the filter options and selections are saved in the .laf file. Note that if your filter is quite loose (i.e. it lets in a very large number of records), a Full View display will be very slow to draw, since there may be several hundred thousand records to draw. I would like to fix this in the future, but for the time being, if you try to open Full View with more than 5000 records, a warning gives you the opportunity to switch to a summary view and perhaps tighten the filter a little.
All - check this to view all records. All other options will be un-checked but their data is retained.
Include Files or Exclude Files - The list box contains a list of all the files found in the log file. If you select Include, only records of accesses to the selected files will be displayed, and the Exclude option will be un-checked. If you select Exclude, records of accesses to the selected files will be excluded from the displays, and the Include option will be un-checked.
Include or Exclude Referrer Domains - The list box contains a list of all the referrer domains found in the log file. The special domain name " - (no referrer)" identifies records that had no referrer provided for the access. These are probably mostly direct accesses rather than links from other sites (bookmarks, addresses typed in directly, etc.), though some may come from sites that hide their identity (see also the note under Read Log above). If you select Include, only records of accesses from the selected domains will be displayed, and the Exclude option will be un-checked. If you select Exclude, records of accesses from the selected domains will be excluded from the displays, and the Include option will be un-checked.
Include or Exclude HTTP Codes - The list box contains a list of all possible HTTP status codes (I think I've got them all; most are very rare). Note: in the View menu you can choose whether these are displayed as numeric codes or as text descriptions. If you select Include, only records of accesses returning the selected codes will be displayed, and the Exclude option will be un-checked. If you select Exclude, records of accesses returning the selected codes will be excluded from the displays, and the Include option will be un-checked.
Exclude Images - All accesses to image files will be excluded from the displays. Image files are identified by their file extension, and by default only the more common extensions are used (jpg, jpeg, gif, bmp, png, ico); others can be added to this list (see Options). More image types could have been included by default, but this would have slowed filtering down, and these are the most common. This option is on by default, and I strongly recommend using it: every image on a viewed page is recorded as a separate access in the log file, and these can easily swamp your records, making it hard to see what is really going on. Note that an image cannot also be a page.
Files Only - Include only requests that resulted in a file being returned.
Pages Only - Include only requests for pages. See Options for the list of page file extensions. Note that an image cannot also be a page.
Date Range - Specify a date range to view. The two pull-down boxes allow quick selection of a single month and year. The date fields are currently always in dd/mm/yy format; I will provide an option for mm/dd/yy at some point in the future. The date range is inclusive: records from 00:00:00 on the first date up to and including 23:59:59 on the second date are included in the displays.
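Taken together, the options above act as a series of tests that a record must pass. The Python sketch below is a simplified model under stated assumptions, not the program's actual code: the `Record` and `FilterSettings` types are hypothetical, the page extension list is a placeholder for whatever is configured under Options, a path with no extension is treated as neither a page nor an image, and Files Only is left out because this page does not say how a returned file is recognised.

```python
import posixpath
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Optional, Set

# Default image extensions listed above; more can be added under Options.
IMAGE_EXTENSIONS = {"jpg", "jpeg", "gif", "bmp", "png", "ico"}
# Hypothetical page extensions: the real list is set under Options.
PAGE_EXTENSIONS = {"html", "htm", "php"}


@dataclass
class Record:
    path: str
    referrer_domain: str
    status: int
    time: datetime


@dataclass
class FilterSettings:
    show_all: bool = False
    file_mode: Optional[str] = None       # "include", "exclude" or None (off)
    files: Set[str] = field(default_factory=set)
    referrer_mode: Optional[str] = None
    referrers: Set[str] = field(default_factory=set)
    code_mode: Optional[str] = None
    codes: Set[int] = field(default_factory=set)
    exclude_images: bool = True           # on by default
    pages_only: bool = False
    start_date: Optional[date] = None     # inclusive
    end_date: Optional[date] = None       # inclusive


def passes_list(value, mode, selected) -> bool:
    """Apply one Include/Exclude list box to a single value."""
    if mode == "include":
        return value in selected
    if mode == "exclude":
        return value not in selected
    return True  # neither Include nor Exclude is checked


def extension(path: str) -> str:
    """Lower-case extension of the requested path, ignoring any query string."""
    return posixpath.splitext(path.split("?", 1)[0])[1].lstrip(".").lower()


def accepts(f: FilterSettings, r: Record) -> bool:
    """True if the record passes every active filter option."""
    if f.show_all:
        return True  # All overrides the other options without clearing them
    if not passes_list(r.path, f.file_mode, f.files):
        return False
    if not passes_list(r.referrer_domain, f.referrer_mode, f.referrers):
        return False
    if not passes_list(r.status, f.code_mode, f.codes):
        return False
    ext = extension(r.path)
    is_image = ext in IMAGE_EXTENSIONS
    is_page = not is_image and ext in PAGE_EXTENSIONS  # an image is never a page
    if f.exclude_images and is_image:
        return False
    if f.pages_only and not is_page:
        return False
    day = r.time.date()  # whole days, so 00:00:00 to 23:59:59 is covered
    if f.start_date is not None and day < f.start_date:
        return False
    if f.end_date is not None and day > f.end_date:
        return False
    return True
```

Comparing only the calendar date of each record is one simple way to get the inclusive behaviour described for Date Range.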
All these filter settings can be saved as named filter sets using the New, Edit and Rename buttons at the top of the dialog.
Use this option to refresh the analysis details stored in the database. These include flags indicating whether a record is a page or an image, and the search string if the record is an access from a search engine. You should do a refresh after doing something like changing the search engine file (see View - Manage Views and the search heading on that page). If you change the file extension settings that specify pages or images, you will be asked whether you wish to do a refresh (recommended).
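As a rough illustration of what a refresh recomputes, the Python sketch below rebuilds the page and image flags and pulls the search string out of a search engine referrer. The search engine table, the page extension list and all function names are hypothetical; the program's own search engine file and Options settings play those roles.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

# Hypothetical search-engine table (domain -> query parameter holding the
# search terms); the program reads its own search engine file instead.
SEARCH_ENGINES = {"www.google.com": "q", "search.yahoo.com": "p"}
IMAGE_EXTENSIONS = {"jpg", "jpeg", "gif", "bmp", "png", "ico"}
PAGE_EXTENSIONS = {"html", "htm", "php"}  # hypothetical page list


def search_string(referrer: str, engines=SEARCH_ENGINES):
    """Search terms from a search-engine referrer URL, or None."""
    parts = urlsplit(referrer)
    param = engines.get(parts.hostname or "")
    if param is None:
        return None  # not a known search engine
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None


def analyse(path: str, referrer: str, engines=SEARCH_ENGINES) -> dict:
    """Recompute the stored flags for one record."""
    ext = posixpath.splitext(path.split("?", 1)[0])[1].lstrip(".").lower()
    is_image = ext in IMAGE_EXTENSIONS
    return {
        "is_image": is_image,
        "is_page": not is_image and ext in PAGE_EXTENSIONS,
        "search": search_string(referrer, engines),
    }


def refresh(records, engines=SEARCH_ENGINES):
    """Re-run the analysis on every (path, referrer) pair after a settings change."""
    return [analyse(path, referrer, engines) for path, referrer in records]
```

Passing a different `engines` table to `refresh` mirrors why a refresh is needed after the search engine file changes: the stored search strings were computed with the old table.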
You can select records and delete them from the log. Note that they are permanently deleted; there is currently no undo. Deleting a very large selection of records can take a little while. Occasionally, if there are multiple identical records in the selection, only one of them will be deleted. This can occur only if the same IP address makes an identical request within the same second, which may happen, for example, with download programs that fetch files in multiple parts.
A delete date range option may be added sometime in the future.