Whether you want to learn how to efficiently import data into Firebird, or you wonder if it is worth upgrading to Delphi 2010 for "routine" tasks, this article will give you some food for thought.
In particular, this article explores the use of Delphi and the open-source relational database system named "Firebird SQL" to load and analyze web traffic logs. Along the way, the size and performance of EXEs compiled with Delphi 7 versus Delphi 2010 are compared, plus the speed of Firebird 1.5 versus Firebird 2.1 and four common Firebird data access components are measured.
Fortunately, there are worldwide standards governing the format of web traffic log files. So, if you have a web site that logs traffic data (on any operating system), there is a very good chance that the solution discussed in this article will apply to your situation. We have (so far) tested log files from IIS 5 on Win2000 and IIS 6 on Win2003, with various data fields included and excluded.
The FBPrepLog utility described in this article is free, with Object Pascal source, and is available immediately from Code Central thanks to Embarcadero (look for item #27086). An early version of FBPrepLog is also on the Delphi 2010 Companion DVD. The source definitely compiles in Delphi 7 and in Delphi 2010; it probably compiles in all versions in between.
All SQL code in this article probably works with Interbase but was only tested with Firebird.
Quick Background on Web Traffic Logs
Picture a typical moment in your favorite web browser, just as you go to a web site, and focus on the address bar. The address starts HTTP or HTTPS, right? When you make a web request, an HTTP server, somewhere, processes your request and sends an answer-document that appears in your (client-side) software. Leaving all the complexity aside, that is how it works: request, process, response. It even works that way when you check the train schedule using your iPhone.
For each request, the HTTP server usually writes an entry in a web traffic log to record which file was requested, by whom, whether the request was successful, and so on. Whether those log files constitute a violation of your privacy is another long story which, although fascinating, is outside the scope of this article!
Ordinarily, Windows-based webmasters use Microsoft SQL Server to store and query their IIS web traffic logs, but what if you are using Apache, or would simply prefer to use Firebird SQL?
If you are the owner or webmaster of a web site, you might look at the traffic log files using anything from Notepad to Firebird to a glossy web-stats package to learn what surfers are doing on your web site. Here are some examples of reasons for studying web traffic logs:
- You can find out about errors on your site, including invalid links. Stop link rot!
- You can also find out how much traffic is coming from people versus from software robots. (One typical example of a "bot" is Googlebot, the "spider" used by Google to "crawl" the "web" for keywords, etc.) If you are doing any type of search engine optimization, such as displaying pages slightly differently to web robots than to humans, it can be vital to detect new bots as they come out.
- If you want to verify whether an advertiser is really sending you the traffic that they claim to be sending you, especially if you are paying on a per-click basis, running accurate queries is essential.
These are only a few of the reasons that people wish to have some strong tools for viewing and analyzing web traffic logs.
Microsoft IIS offers an ODBC option for logging the data directly to a SQL database. So I tried that first, with Firebird ODBC. That idea had Two Big Problems:
- I could not make it work;
- In ODBC-mode, IIS only saves a few fields, not the full extended list of fields.
This led to the idea of building a utility to import the .log data into a Firebird database. Microsoft SQL has a bulk import feature; Firebird offers importing from so-called "external files". This external file feature was to be the basis of the utility.
In addition to having fun, I also wanted to quantify the relative speed of two different versions of Delphi, two different versions of Firebird, and several different data access components
Why any difficulty?
If you have not looked at a web traffic .log file lately, you might be wondering what could be so difficult about importing its contents into a database. Surely in 2009, this must be a simple software task!
The two main difficulties are:
- The .log file contains comment rows and data rows, and we only want to import the data;
- The list of fields can vary within the file, whenever the comments indicate a structure change.
To make these difficulties more clear, look at a minimalist .log file (from http://msdn.microsoft.com/en-us/library/ms525807.aspx) with some comment rows , starting with #, then one data row, and then more comments to indicate a structure change and one final data row.
#Software: Internet Information Services 6.0
#Version: 1.0
#Date: 2009-05-02 17:42:15
#Fields: time c-ip cs-method cs-uri-stem sc-status
cs-version
17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0
#Software: Internet Information Services 6.0
#Version: 1.0
#Date: 2009-05-02 17:44:00
#Fields: time c-ip cs-method cs-uri-stem sc-status
cs-version cs(User-Agent)
17:44:00 172.16.255.255 GET /default.htm 200 HTTP/1.0
Mozilla/5.0+Ok
Listing 1: Minimal IIS Log File where #Fields comment defines structure of following data
Stripping out the comment fields can be done in many ways, including using the free utility named PrepLog from Microsoft. However, making the importer cope with a structure that can vary an unlimited number of times per document is a bit more challenging.
In web traffic log files, the data fields are delimited by a single space. Data strings, such as the user agent*, where you might ordinarily expect spaces, have those spaces translated to a plus ('+') symbol by the http server software prior to logging.
(*: The "user agent" is the name of the http client software used by the user — commonly a web browser such as Firefox, Opera or Internet Explorer. Finding a user agent of "Mozilla/5.0+Ok" in a log file could mean that a person was there using Firefox, or it could means that some other http client was there, hoping that your web site would give it content as-if it were Firefox. Zie http://en.wikipedia.org/wiki/Mozilla.)
The Goal
If all goes well, we will end up with the web traffic data in a table, in a Firebird database. In figure 1 you can see some of the fields, as displayed by the IB_SQL utility:

Fig. 1: IB_SQL shows some of the IISLOGS table after FBPrepLog did the import
The question is, how quickly can we import the data? Does it matter which Delphi compiler we use, or which data access components? Can we measure a difference between Firebird SQL v1.5 and v2.1? And how exactly does one use the External Files feature?
Our Example Log Files
The examples discussed here use log files generated by IIS 6 on Windows 2003, in Extended log format, made daily, with some but not all optional fields included.
We have also tested the import of log files made by IIS 5 on Win2000.
For testing, we used more than 60 log files, totaling more than 1/2 GB of data, courtesy of traffic data from Earth Wave Records, ranging from July 4th to early September 2009. The web application behind the site was written with Delphi 7 and WebHub. The site uses the ISAPI filter from HREF Tools named StreamCatcher to, among other things, adjust the IIS log files on the fly so that the Session ID (without any random portion) can be readily parsed out by FBPrepLog or human eyes.
The particular log file used for all the performance tests was from 31-Aug-2009 and it size was 12,680 kb.
Runtime Environment
All tests were done on inside a VirtualBox (with thanks to WebCentre Ltd in New Zealand, who allowed us to run tests on one of their backup servers).
| Virtualizer |
Sun VirtualBox v3.0.4 |
| Host OS< |
Windows 2003 Standard |
| Guest OS |
Windows XP Pro with SP3 |
| Guest RAM |
512 KB |
| Guest Disk< |
10MB drive C |
| Guest CPU |
1 |
Table 1: Runtime Environment
Firebird Installation Notes
The tests started with the installation of Firebird-1.5.5.4926-3-Win32.exe, using default options with these minor adjustments on the last page of choices:
[ ] Use the Guardian
[ ] Start Firebird automatically
[x] Install Control Panel Applet
[x] Copy Firebird client library
[x] Generate client library as GDS32.DLL
To test the newest Firebird, I uninstalled Firebird 1.5 and then installed the Firebird-2.1.2.18118_0_Win32.exe file, again using default options.
After installing Firebird, the firebird.conf file MUST be adjusted per the notes in the readme-fbpreplog.rtf file. Otherwise, your output when running FBPrepLog will look like this:
Preparing external LOGBLOCK #0...
Created table...
Exception
Access to external file "C:\FBPREPLOG\___LOGBLOCK___0"
is denied by server administrator
Listing 2: Oops - forgot to modify firebird.conf to grant access to external files
Reset Between Tests
Of course it is helpful to reset the starting point prior to each test. The reset-test BAT file stops the Firebird service, copies an empty database to the official location and restarts Firebird. Note that the empty database file does NOT include the table which will hold the data; that is created on the fly by FBPrepLog.
CD /D c:\FBPrepLog
net stop FirebirdServerDefaultInstance
copy IISLOGS.empty.fb2.fdb iislogs.fdb
net start FirebirdServerDefaultInstance
Listing 3: A BAT file to reset things between tests
Test Results
The test results cover two areas: size and speed.
Size of FBPrepLog.exe
The following table illustrates how much variation there is in the size of the FBPrepLog.exe depending on the type of components used and the compiler used:
- IBX is the built-in Interbase Express driver provided with Delphi;
- IBO refers to the TDataSet-derived version of Interbase Objects, where all components are named TIBO*;
- IBO Native refers to the version of Interbase Objects which descends from TComponent, where all components are named TIB_*;
- dbExpress refers to the new Firebird driver provided with Delphi 2010.
| Delphi 7 |
| IBX |
405 kb |
| IBO (TIBO*) |
1159 kb |
| IBO Native (TIB_*) |
965 kb |
| Delphi 2010 |
| IBX |
827 kb |
| dbExpress |
1420 kb |
Table 2: Size of EXE depending on data access components and compiler
To better understand where the "size" comes from, we can look at the units linked in depending on the test:
program fbpreplog; {Firebird SQL "prep log" utility}
{$APPTYPE CONSOLE}
{$DEFINE TESTIBX} {any Delphi, test Interbase Express}
//{$DEFINE TESTDBEXPRESS} {for Delphi 2010, test dbExpress}
//{$DEFINE TESTIBO} {test TIBOQuery, www.ibobjects.com}
//{$DEFINE TESTIBONATIVE} {test native TIB_Query}
//{$DEFINE HREFTOOLS} {to support ZaphodsMap configuration}
uses
SysUtils, DateUtils,
{$IFDEF TESTDBEXPRESS} // dbExpress
DBXFirebird,
SqlExpr,
WideStrings,
{$ENDIF}
{$IFDEF TESTIBX} // Interbase Express
IBDatabase,
IBCustomDataSet,
IBQuery,
{$ENDIF}
{$IFDEF TESTIBO} // IBObjects
IB_Components, IBODataSet,
{$ENDIF}
{$IFDEF TESTIBONative} // IBObjects Native
IB_Components,
{$ENDIF}
{$IFNDEF TESTIBONATIVE}
DB, // TDataSet used by most access components
{$ENDIF}
{$IFDEF HREFTOOLS}
{*htns*}ZaphodsMap,
// for advanced, flexible configuration
{*htns*}ZaphodsMapMessages,
{*htns*}ZaphodProjectOptions,
{*htns*}NativeXml,
{*htns*}uCode,
{$ENDIF}
Classes
;
Listing 4: Units used in FBPrepLog DPR
Speed of FBPrepLog.exe
Each type of test was run at least three times and the average speed is reported here. The slowest speed is 4 minutes 41 seconds and the fastest is 1 minute 22 seconds.
| |
Delphi 7 |
Delphi 2010 |
| |
IBX |
IBO |
IBO Native |
IBX |
dbExpress |
| Firebird 1.5 |
4min 41sec |
4min 5sec |
4min 11sec |
2min 30sec |
2min 36sec |
| Firebird 2.1 |
3min 25sec |
3min 1sec |
2min 58sec |
1min 23sec |
1min 22sec |
Table 3: Results of Timing Tests
Unfortunately, table 3 does not include any Delphi 2010 test results for IBO nor IBO Native because those components were not available for that compiler at the time of testing. Based on the results with Delphi 7, it seems reasonable to expect IBO's performance in Delphi 2010 to be very similar to IBX and dbExpress.
In any case, the speed differences among data access components is quite small. My guess is that this is because most of the time is spent (a) having the OS write the external file and (b) having Firebird bulk-import. Remember, in FBPrepLog, we are using a very small, rare subset of the data access component's capabilities. Therefore these timing results are not indicative of overall performance; that would require different tests.
The speed improvement from Delphi 7 to Delphi 2010 is quite significant, and Firebird 2.1 is definitely faster than Firebird 1.5
Highlights from the Source Code
The FBPrepLog utility is made from one source code file, FBPrepLog.DPR, with no additional PAS files. There are just under 900 lines of code. Some highlights follow.
To understand the "engine" inside FBPrepLog, you will need some reference to the data structures, which follow in listing 5:
const
cUtilVersion = '1.0.1.4';
LogFieldDBEquivalent:
array[0..29] of
array[0..2] of string =
(
('time', 'fieldTime', '10'),
// ('Field in IIS Log-file', 'Equivalent to field('time')
// in Firebird database').
// 'Field' prefix avoids reserved-words.
('c-ip', 'fieldCIp', '20'),
// snip
('cs(User-Agent)', 'fieldcsUserAgent', '2048'),
('cs(Cookie)', 'fieldcsCookie', '2048'),
('cs(Referer)', 'fieldcsReferer', '2048'),
('cs-username', 'fieldCsUsername', '40'),
// snip
('sc-substatus', 'fieldScSubstatus', '40')
);
AdditionalFieldsCount = 8;
LogDBAdditionalFields:
array[0..1] of
array[0..AdditionalFieldsCount - 1] of string =
(
('cs-uri-query', 'sessionid',
'session=', '&', '15', '', '+', ''),
// SessionId placed in log-field "cs-uri-query". Value
// placed after string "session=" and before delimiter "&"
// This feature is for WebHub sites.
('cs-uri-stem', 'lingvo', '', '', '15', '#', '+',
'/eng#/fra#/rus#/spa#/deu#/ita#/nld#/jpn#/chi')
//# is delimiter
//+/- is to include/exclude
// erwrwe#cxvx - delimited string, erwrwe
// and cxvx are items
// This feature is for any site with translated sub-sites
);
LogBlockMaxSize = 15000;
type
TLogBlockData = record
Date: string;
Index: Integer;
Fields: TStringList;
FieldsLengths: TStringList;
end;
TLogConfiguration = record
dbPath: string;
dbUsername: string;
dbPassword: string;
dbTableName: string;
end;
TFBImportIISLog = class
private
FLogFile: TStringList;
FLogBlock: TStringList;
FLogBlockData: TLogBlockData;
FLogConfiguration: TLogConfiguration;
FdbFieldNames: TStringList;
FdbAdditionalFieldNames: TStringList;
public
constructor Create;
destructor Destroy; override;
procedure LoadConfiguration;
function PrepareLogsExternalTable(InFilename,
OutFilename : string;
var BlockStart: Integer): Integer;
procedure ImportLogBlock(Filename: string);
public
property BlockIndex: Integer read FLogBlockData.Index;
property LogTableName: string
read FLogConfiguration.dbTableName
write FLogConfiguration.dbTableName;
end;
var
{$IFDEF TESTDBEXPRESS}
LogsDatabase: TSQLConnection;
LogsTransaction: TObject = nil; // not used
{$ENDIF}
{$IFDEF TESTIBX}
LogsDatabase: TIBDatabase;
LogsTransaction: TIBTransaction;
{$ENDIF}
{$IF Defined(TESTIBO) or Defined(TESTIBONATIVE)}
LogsDatabase: TIB_Connection;
LogsTransaction: TObject = nil; // not used
{$IFEND}
Listing 5: Data structures and essential variables
The database location, log table name, user name and password parameters all have default values. These values can be overridden by command line parameters.
The type of components needed to connect to the database and manage the external file import vary by test. At the end of listing 5, you can see how TSQLConnection is used in Delphi 2010 with dbExpress, whereas TIBDatabase is used with the IBX components which are available in older versions such as Delphi 7.
When I compile with the {$DEFINE HREFTools} active and link in ZaphodsMap, etc., I can use XML files to configure the same details. That is helpful for me because I have to organize FBPrepLog import processes for many servers. If you are interested in advanced configuration, see TFBImportIISLog.LoadConfiguration in the source and pay a visit to the ZaphodsMap web site.
The function TFBImportIISLog.PrepareLogsExternalTable transforms the data in the .log file into another file which is suitable for bulk-import into Firebird.
The procedure TFBImportIISLog.ImportLogBlock does the "real work", and it includes the definition of all the SQL that is used for all steps in the process. Some steps are optional (such as creating the IISLOGS table, which may already exist). Please see the FBPrepLog.dpr file if you are interested in how this works -- look for procedure TFBImportIISLog.ImportLogBlock. Listing 6 shows certain key lines in bold to show the overall flow.
procedure TFBImportIISLog.ImportLogBlock(Filename: string);
//snip
case SQLId of
1:
//snip
12: begin
Clear;
Add('CREATE TABLE ' + 'Temp' + ‘ EXTERNAL FILE ''' +
Filename + ''' ');
//snip
14: begin
Clear;
Add('INSERT INTO ' + LogTableName + ' (');
//big snip
try
GetSQL(1, SQL); // step 1
Open;
TableExists := not Eof;
Close;
//snip -- code for creating table, trigger, generator
GetSQL(14, SQL); // step 14
ExecSQL;
//snip
Transaction.Commit;
//snip
GetSQL(11, SQL); // step 11
ExecSQL;
//snip
LogsDatabase.Connected := False;
//snip
Listing 6: ImportLogBlock
The rest of the FBPrepLog source code (other than ImportLogBlock) is involved with (a) showing usage syntax in case someone needs to see the parameters allowed before running it, (b) looping as needed when importing all files in a directory and (c) timing the import process.
Data Mining
To get a rough idea of the amount of traffic by day, we can use the query shown in listing 7. This includes requests by humans and web robots.
SELECT a.FIELDDATE, count(a.ID)
FROM IISLOGS a
group by a.FIELDDATE order by a.FIELDDATE
Listing 7: Query count of requests by date
FlameRobin or IB_SQL (or a custom Delphi application) can be used to display the results. If FlameRobin is used, some reassuring information is displayed while the query runs, as shown in the figure below.

Fig. 2: FlameRobin shows status info prior to running the query
To find out whether we have any bad links on the site, we can search for hits where the status code is 404 (file not found). Of course, some of those hits will probably be from hackers trying to get in by trying vulnerable resources which yield access on some systems. In figure 3, you can see some of these "rotten links" and who has them (the Referer is the URL which led to the request).

Figure 3: FlameRobin shows invalid requests and who sent them
To count the number of links from Google, we can count the hits where the referer starts with http://www.google.com (and that will include google.co.jp which is relevant for this web site).

Fig. 4: FlameRobin shows people arriving from Google
On a site that uses WebHub or for any reason includes a session id in the URL, we can follow people (anonymously) and find out which pages they view, and from that perhaps we can determine which pages to improve. The example shown in figure 5 helps us focus on hits from people who were sent by Google, to find out whether they stayed around to view more than 1 page after arriving. The SQL is a bit interesting and is shown in listing #8. The substring function is used because the SessionID field is varchar(128) and only the first 15 characters are signficant. (Yes - we have shortened that field since taking these screenshots.)

Fig. 5: FlameRobin shows result of query checking whether people arriving from Google "stick" on the web site
If this query is run on a site and shows mostly 1's in the Count column, that would indicate that arriving surfers did not immediately see what they wanted, and they did not feel like following any links within the site. If these had been paid (adwords) links from Google, this would indicate an expensive problem, because all those click-throughs would represent wasted advertising.
select B.FIELDDATE,
Substring(B.SESSIONID from 1 for 15) as SESSIONIDCUT,
count(B.ID)
from IISLOGS B,
(
select A.FIELDDATE,
Substring(A.SESSIONID from 1 for 15) as SESSIONIDCUT
from IISLOGS A
where (A.SessionID > '1001')
and (A.fielddate >= '2009-09-01')
and (A.FIELDCSREFERER STARTING WITH 'http://www.google.')
) GOOGLERS
where (SUBSTRING(B.SESSIONID FROM 1 for 15) =
GOOGLERS.SESSIONIDCUT)
and (B.FIELDDATE = GOOGLERS.FIELDDATE)
group by B.FIELDDATE, SESSIONIDCUT
order by B.FIELDDATE, SESSIONIDCUT
Listing 8: Nested query to count hits on the same session number for Googlers
This is just a sampling of the data mining that is possible once the data is in a database. Different sites have different requirements, and it is possible for anyone to do their own analysis if they are willing to put in the effort to figure out the query syntax. The SQL Knowledge Base compiled by IB Phoenix was helpful to me while I worked on the queries shown here.
For low-traffic sites, you might not need any queries. Simply having the traffic in an organized format makes it possible to read through the entries and see important patterns. If you have a do-it-yourself site, that might be sufficient.
Quick Tips
- For prototyping queries, consider installing both FlameRobin and IB_SQL. They each have their strengths.
- Use the fastest machine available for Firebird SQL. Log databases become large quickly and you will save yourself a lot of time by using a fast machine.
- If you decide to use FBPrepLog, test it once on a relatively small .log file and review the resulting IISLogs table. Then, go back to the source code and consider changing some of the field names to be shorter (easier to type and recognize), and consider making some field lengths shorter. Then, recreate an empty database and start using FBPrepLog daily or hourly.
- If you have other data transformations that you would like to accomplish during import (such as tracking the lingvo in its own field rather than just in the uristem), study the code which fills in the SessionID and Lingvo and use that as a basis for your own customizations.
Enjoy!
International Character Sets
The examples in this article used English web sites and URLs. It is possible, with IIS6 on Win2003 and later, to log requests using UTF8, allowing for many important non-western lingvos.
The FBPrepLog utility would need to be modified to support UTF8. The logic within the external file processing would need to change, and importantly, when you create the Firebird database to hold the traffic data, you would have to specify the charset as "UTF8", rather than "None." Both IB_SQL and FlameRobin make it easy to select the charset when the database is first created.
It is very likely that by the time you read this article, the coding changes will be complete and included in FBPrepLog as posted on Code Central.
Products and Other Things Worth Finding
To find any of the products or services mentioned in this article, please see table 4. All URLs were valid as of September 3, 2009.
| Product |
URL |
Notes |
| FBPrepLog Source |
cc.embarcadero.com |
Use of Code Central is free as long as you have registered with the Developer Network. Look for item #27086. |
| Firebird SQL |
www.firebirdsql.org/index.php?op=files |
Free, open-source |
| Firebird Foundation |
www.firebirdsql.org/index.php?op=ffoundation&id=contributions |
Any $ donation will be appreciated and invested per priorities which are open to discussion and community input. |
| Delphi 2010 |
www.embarcadero.com |
Not free; talk to your local Embarcadero sales rep for upgrade pricing and deals. |
| Firefox |
www.mozilla.com/en-US/firefox/all.html |
Free; available in many lingvos |
| FlameRobin |
www.flamerobin.org |
Free admin and query tool |
| Hosting |
www.webcentre.co.nz |
Colo server |
| IB Objects |
www.ibobjects.com |
Interbase Objects components (free eval; trustware license) |
| IB_SQL |
www.ibobjects.com/ibo_ib_sql.html |
Free admin and query tool |
| IIS6 enable UTF8 |
www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/bea506fd-38bc-4850-a4fb-e3a0379d321f.mspx?mfr=true |
Topic: Log File Formats in IIS (IIS 6.0) |
| Knowledge Base |
www.ibphoenix.com |
Use [Search KnowledgeBase] link, bottom-left of home page, for excellent information about many aspects of Interbase and Firebird SQL |
| Mozilla |
en.wikipedia.org/wiki/Mozilla |
History lesson |
| Opera |
www.opera.com/mini/download |
Alternative to the alternative. |
| RFC for logs |
www.rfc-editor.org/rfc/rfc2616.txt |
Defines standard for web traffic .log files |
| StreamCatcher |
www.streamcatcher.com |
Commercial ISAPI filter for remapping; can modify IIS logs on the fly. |
| Vintage vinyl records |
www.earthwaverecords.com |
Supplied the traffic logs |
| Virtual Box |
www.virtualbox.org |
Free, open-source... from Sun |
| Warehousing and Distribution |
www.csweb.biz |
Courier Services co-sponsored the development of FBPrepLog to help the Firebird Foundation |
| WebHub |
www.href.com/webhub |
Commercial product; a web development framework for Delphi for sites that are to be translated, skinned or otherwise deployed repeatedly. |
| ZaphodsMap |
www.ZaphodsMap.com |
Free, open-source, configuration subsystem from HREF Tools |
Table 4: Product Placement
Conclusion
Were you surprised to find so little difference in speed between the data access components? I certainly was, until I realized that the whole point of doing a bulk import is that the Firebird engine does all the work, rather than the components.
Finding out that Delphi 2010's compilation of the same source code (e.g. the "IBX" test) led to an EXE that was larger, yet 45% faster, was a wonderful outcome and has convinced me to start recompiling all my utilities with Delphi 2010.
Finding out that Firebird 2.1 was faster the 1.5 was not a surprise, although I was glad for the confirmation. Now I need to chase down my last remaining copies of Firebird 1.5 and get them all upgraded!
Thank you very much for reading this article, and please consider donating at least $10 to Firebird Foundation if you use FBPrepLog.
Grazie. Danke sehr. Merci. Gracias. Dank U!