Synchronizing Databases with a Web Service (Part 2)
This is the second in a two-part article, the first of which – to be found on the SDN-website! - looked in general at the use of web services to deliver data in usable formats, and the various technologies you can use within the .NET world to consume this data. This article extends the concepts discussed there, by building a more complex parameterized web service and a Windows Forms client that can consume this web service.
In a previous article, we examined how we can easily expose data through an ASP.NET web service, in a format that is useful for all kinds of clients. By exposing a .NET dataset object, we effectively deliver a standard format XML document that completely represents the contents of that dataset. This is called a diffgram in the .NET world, and can be used to reconstruct the original dataset complete with identical columns, data types, columns metadata, primary and foreign keys, and even the relationships between the tables in the dataset.
An Overview of the Synchronization Process
In this article, we develop on the principles we discussed in the previous article to build a utility that allows the contents of two or more databases to be synchronized across the Internet, by delivering the rows that have been added to each one to the other database server(s). The principle is relatively simple:
- Each database has the same set of tables, with identical structure, for the tables that will be synchronized.
- Each table has a column containing a value that identifies that database or machine, so that rows can be identified depending on which machine they originally came from.
- Each machine exposes the same web service that can deliver a dataset containing the rows to be synchronized to any of the other machines.
- Each machine calls this web service on all the other machines or a regular basis, fetching new rows from each one and adding them to its own database.
The schematic in figure 1 shows the overall process for two machines. The important point to note is that the process must avoid collecting all the new rows from the other machine(s). By specifying the value for the column that identifies each machine when querying the web service, we ensure that only the new rows that have been added to that machine are returned. If we fail to do this, the first machine will copy the new rows from the second machine, and then - when the second machine fetches the new rows from the first machine - it will collect the rows it previously sent to the first machine as well. This will cause duplication of the new rows.

Fig. 1: The synchronization process in outline
Remember that you can usually connect to a database server that is running on a different machine from your web server, generally this is when they are on the same network segment. This means that you can store and extract the data you want to synchronize from separate database servers that are connected to the web server that runs the web service or the server that runs our custom client application.
Choosing a Client Platform for the Synchronization
The web service we use is implemented in ASP.NET, as demonstrated in the previous article. The next question is, what do we use to build the client utility that will consume the web service and update the database on each machine?
To be able to handle the dataset the is exposed by the web service, we obviously require a .NET-aware client, and so the choices come down to using an ASP.NET Web Forms page, a console application, a Windows service, or a Windows Forms application. However, we want the process to be "automatable", in other words we want it to be executed at specified intervals without human intervention. This basically rules out an ASP.NET Web Forms page.
Any of the other three types of application will work, though we chose to implement it as a Windows Forms application. This can provide a better user experience when setting up and experimenting with the synchronization process, without subjecting the user to the limitations of command window output. It can be run on a regular basis using Windows Task Scheduler, under any account that you specify (not necessarily the currently logged-on user).
The other possible option is to build it as a Windows service, which runs continuously on the server without requiring a user to be logged onto the machine. However, this makes presenting visual feedback for testing and setting up more difficult. It also has the disadvantage that it consumes server resources all the time, when the replication process might only take place occasionally.
In the example we're showing here (synchronizing the web server log file summary data), the process only takes place once a week, and runs for only a couple of minutes, so having the application permanently loaded and running as a Windows service seems to be overkill. However, if you choose to build the client as a Windows service, there is no requirement for a user to be logged on at the time that it runs.
The Synchronization Example Database
On our own websites we maintain logs of traffic using both IIS logging (via ODBC to a database), and custom logging of ASP and ASP.NET session starts that stores details of the browser's user agent string, language setting, and any referrer details (the URL of the page containing a link they clicked to get to our site). This data is summarized once a week. Stored procedures within the database populate a series of six tables with the totals for each type of data that we want to collect.
Figure 2 shows the six tables, plus the extra table named LangCodes that provides a text equivalent of the language code that the browser exposes, for example "en-us" which represents "English (United States)". As the design of the database has matured over time, the way that the date for each row is stored has changed from being a datetime column (as in the PagesByWeekSummary and TrafficByHourSummary tables) into a two separate int columns (as in the remaining tables).

Fig. 2: The table used in the synchronization example
Using the data in these tables, we provide information on visitor patterns and page popularity to administrators, and we also expose some of the data in graphical form to visitors to our site. You can see this on the "traffic" pages at http://www.daveandal.net/traffic.asp. Figure 3 shows some examples.

Fig. 3: The graphical displays created from the site log database tables
PS: You can download the code we use to create these charts from the "traffic" page at http://www.daveandal.net/traffic.asp.
The task, therefore, is to synchronize the six tables across all the machines that contain copies (mirrors) of the site. In fact we only operate two sites, but the process and example application you'll see in the rest of this article has been designed to manage more than just two sites.
The Unique Site Identification Key
As we mentioned earlier, one important factor is that each database must include in all the synchronized tables a column that identifies the machine that originally collected the data and added it to the tables. In our example, the key is the IP address of the host server - the machine that contains the site - and this is stored in the TSiteIP or HostIP column of each table.
Therefore, the web service that exposes the data for each site can select only the rows that were added to the tables by that site, and ignore any that were previously fetched from other servers that take part in the synchronization process.
The Synchronization Web Service
The web service we use for this example is pretty much the same in concept and outline as that we looked at in the previous article. The Class file (named getlogstable.asmx in the winforms-app\WebService folder of the examples) has the following format:
<%@WebService Language="VB" Class="SynchIISLogs"%>
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Web
Imports System.Web.Services
Imports System.Configuration
Namespace:=_
"http://yoursite.com/web services/synchiislogs"_
)> Public Class SynchIISLogs
... class implementation here ...
End Class
You can see that we import the various namespaces we'll need (our database is in SQL Server so we are using the data access classes from the SqlClient namespace). We store the database connection string in web.config, so we need to import the System.Configuration namespace to be able to access this.
Deciding How To Return the Data
In the examples we looked at in the previous chapter, the dataset we returned from our example web services contained only a single table. However, there is no reason why we can't add more than one table to the dataset before returning it to the client application. There are downsides with this approach, though - it means that if an update fails you have shifted a lot of data that might not be used. By fetching each table separately, and halting the process when the first failure occurs, you reduce the bandwidth requirements. Using smaller a dataset also minimizes the risk of time-outs in the web service.
On the other hand there is an issue with the possibility of partial updates causing data errors. If we copy rows from one table, but then an error occurs in another table (or in rows encountered later in the first table), the target database will not contain a valid set of rows. The ideal solution to this is to use a connection-based transaction in the client application, so that we can roll back all the updates to all the tables if any error should occur. But this means keeping the connection open throughout the whole process, rather than closing it after each individual table has been fetched and pushed into the database.
In the end, we decided to implement a process that can use multiple tables, but doesn't have to. The web service we demonstrate here can accept a comma-delimited string of table names, and will build and return a single dataset containing the appropriate rows from all these tables. If there is only one table specified, that's all the code will insert into the dataset.
By allowing the client to collect all the rows for all the tables that they need to synchronize in one go, we make it possible to build efficient client-side applications that can still hold the connection open and perform all the updates to all the tables within a transaction. Then, after all the updates have succeeded they can be committed - or all the additions to all of the tables can be rolled back if there is an error.

Fig. 4: The service description page for the SynchIISLogs web service
The GetLogTables Method of the Web Service
Figure 4 (above) shows the service description page of the web service. As you can see, it exposes two Public Web methods. The first of these (named GetLogTables) is used to fetch the data. It accepts three parameters: the IP address of the server whose data should be returned (this is the key to the rows that indicates which server they came from), a comma delimited list of table names to extract the data from, and the date that the last synchronization took place. The dataset that this method returns will then only contain rows that were generated on that host machine since the last synchronization:
"Returns a dataset of new rows from the _
specified tables" )>
Public Function GetLogTables(_
sHostIP As String, _
sTableNames As String, _
dLastUpdate As Date) As dataset
Dim oConn As SqlConnection
Try
' only allow queries for data up to six months ago
If DateTime.Compare(_
dLastUpdate.AddMonths(6), DateTime.Now) < 0 Then
Throw New Exception(_
"Cannot query for data more than six months old")
End If
' create new dataset and create array of table names
Dim oDS As New dataset("IISLogResult")
Dim aTables() As String = sTableNames.Split(",")
' create Connection and DataAdapter with empty
' SelectCommand text
oConn = New SqlConnection(_
ConfigurationSettings.AppSettings("IISLogs"))
Dim oDA As New SqlDataAdapter("", oConn)
'rows must be marked as "added"
oDA.AcceptChangesDuringFill = False
oConn.Open()
' iterate through list of tables filling dataset
' from database
Dim sTable As String
For Each sTable In aTables
' change the SQL statement by setting
' the CommandText property of the Command object
' already attached to the DataAdapter
oDA.SelectCommand.CommandText =_
GetSQLStatement(sHostIP, sTable.Trim(),_
dLastUpdate)
' fill this table in the dataset
oDA.Fill(oDS, sTable.Trim())
Next
Return oDS ' and return the dataset to the client
Catch oErr As Exception
Return GetErrordataset(oErr.Message)
Finally
oConn.Close()
End Try
End function
Notice that we check for a date more than six months old in our example. As there are a considerable number of rows for each week in some of the tables, we chose to do this to prevent overloading the server if an obviously incorrect date is provided. You can change this to place a different limit on the dates, or even remove it altogether. Then we create a new empty dataset, giving it the name IISLogResult, and split the list of tables into a String array. The next step is to create a Connection and a DataAdapter object in the usual way, open the Connection, and then iterate through the array of table names pulling the appropriate set of rows into the dataset so that it contains a DataTable object for each of the tables in the list.
We chose to manually open the connection before calling the Fill method more than once, and then ensure its closed again in the Finally section of our Try...Catch construct. The Fill method will automatically open and close the connection if it's closed when the method is called, but we get marginally better performance by opening and closing it once only instead of multiple times.
Getting the Correct DataRowState Values for the Rows
However, there is one important issue to grasp here. When we reconstruct the dataset on the receiving machine, we want to be able to use the Update method of an attached DataAdapter to push the rows into the matching database table on that machine. The tables have the same structure, so this isn't a problem. However, the Update method will only push the rows into the table if they are marked as being "added" or "new" rows.
Each row in a table (DataTable instance) within a dataset has a RowState property that is set automatically when the values in that row are changed. The row stores the original and the current values of each column in the row, and so the Update method can tell which rows have changed (i.e. which have been added, modified or deleted), and what the original values and the current values are. So we must ensure that all the rows we pass back to the client in our dataset tables have the value DataRowState.Added for their RowState property.
Normally, by default, the Fill method of the DataAdapter loads the rows into the DataTable instance and then calls the AcceptChanges method for each one. This means that their original and current values are the same, and the RowState is set to DataRowState.Unchanged. So, in this case, the rows will not be pushed into the target table on the client machine. However, by changing the AcceptChangesDuringFill property of the DataAdapter to False from its default value of True, before we call the Fill method, we prevent the AcceptChanges method from being called automatically. The result is that the dataset will have each row still marked as "added":
oDA.AcceptChangesDuringFill = False
Building the SQL Statement
As the code in the GetLogTables method iterates through the list of tables, it uses a separate Private function that creates the SQL statement to extract the required set of rows from the database. The GetSQLStatement function takes three parameters (the host IP address, the name of the table currently being processed, and the date that the last synchronization took place). It uses these values to build up the appropriate SQL statement. Notice how we have to use a GregorianCalendar instance to figure out the week number from the DateTime value that we pass to this method. This is required because, as you saw earlier, some of the tables contain a year and week number instead of a datetime column:
Private Function GetSQLStatement(_
sHostIP As String, _
sTableName As String, _
dLastUpdate As Date) As String
Dim oCal As New System.Globalization.GregorianCalendar
Dim iYear As Integer = dLastUpdate.Year
Dim iWeek As Integer =_
oCal.GetWeekOfYear(dLastUpdate, Nothing, Nothing)
Dim sSQL As String =_
"SELECT * FROM " & sTableName & " WHERE "
If sTableName.ToLower() = "pagesbyweeksummary" _
Or sTableName.ToLower() = "trafficbyhoursummary" Then
sSQL &= "DATEDIFF(day, '" _
& dLastUpdate.ToString("yyyy-MM-dd") _
& "', TSumDate) > 0 AND TSiteIP = '" & sHostIP & "'"
Else
sSQL &= "((TYearNumber = " & iYear.ToString() _
& " AND TWeekNumber > " & iWeek.ToString() _
& ") OR (TYearNumber > " & iYear.ToString() _
& ")) AND HostIP = '" & sHostIP _
& "' ORDER BY TYearNumber, TWeekNumber"
End If
Return sSQL
End Function
The GetWeekOfYear method takes three parameters. The first is the date to work from, and the other two are used to specify a CalendarWeekRule value (which indicates how the first week of a new year should be determined) and a DayOfWeek value (which determines what day is the first each week). By using the value Nothing (null in C#) for these two parameters, we take advantage of the defaults depending on the culture and internationalization settings of the machine, which should match the values used by ASP.NET and the database server software. For more details check out the GregorianCalendar class in the System.Globalization namespace section of the .NET SDK Reference section.
Returning Error Details
There is also a separate Private function in our web service class that creates a dataset containing error details, if an exception is raised within the web service. We looked at this routine in the previous article, and it's simply listed here again. You'll see how we use this "error" dataset when we look at the client application later in this article:
Private Function GetErrordataset(sMessage As String)_
As dataset
Try
Dim oDT As New DataTable("Errors")
oDT.Columns.Add("Message",_
System.Type.GetType("System.String"))
Dim oDR As DataRow = oDT.NewRow()
oDR("Message") = sMessage
oDT.Rows.Add(oDR)
Dim oDS As New dataset("Error")
oDS.Tables.Add(oDT)
Return oDS
Catch
Return Nothing
End Try
End Function
Figure 5 shows the test page that ASP.NET generates automatically for the GetLogTables method when you select it in the main service description page (which was shown in figure 4).

Fig. 5: The test page generated by ASP.NET for the GetLogTables method
When invoked, a diffgram such as that shown below is returned. We've removed the schema details and some of the repeated summary data elements to make it easier to see the structure. The important point to note is that the data rows are marked as "added". You can see this from the diffgr:hasChanges="inserted" attributes on the , and elements:
"http://www.stonebroom.com/web services/synchiislogs">
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
... schema located here ...
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
xmlns:diffgr=
"urn:schemas-microsoft-com:xml-diffgram-v1">
msdata:rowOrder="0"
diffgr:hasChanges="inserted">
2004
2
9980
275442
3206
217.45.1.242
... more WeekSummary elements here ...
diffgr:id="CountrySummary32"
msdata:rowOrder="31"
diffgr:hasChanges="inserted">
2004
2
fi
3
217.45.1.242
... more CountrySummary elements here ...
diffgr:id="RefererSummary1019"
msdata:rowOrder="1018"
diffgr:hasChanges="inserted">
2004
5
http://www.google.ca/search
38
217.45.1.242
... more RefererSummary elements here ...
The TestSQLStatement Method of the Web Service
This second Public Web method of the web service is actually a debugging tool that we decided to leave in place to help when testing the service and client application. This method simply returns as a String the SQL statement that would be used to extract the data when calling the web service with that specific set of parameter values. It just calls the GetSQLStatement function we saw earlier with three parameters. Notice that this method only accepts a single table name:
"Returns the SQL statement for testing"_
)>
Public Function TestSQLStatement(_
sHostIP As String,_
sTableName As String,_
dFromDate As Date) As String
Return GetSQLStatement(sHostIP,sTableName,dFromDate)
End Function
The result of calling the TestSQLStatement method can be seen in figure 6. Because this table is one that contains a year and week number, the code in the GetSQLStatement function has generated a WHERE clause that specifies the appropriate year and week values:

Fig. 6: The result of calling the TestSQLStatement method of the web service
The Windows Forms Synchronization Client Application
The next step is to build the client application that will consume the Web service and update the database on the target machine. This application requires several items of information:
- The URLs of the web service on all of the machines from which it will fetch new rows
- The values of the keys that identify the rows added to the tables by each of the source machine (in our example this is the IP address of the source Web sites)
- The connection string of the local "target" database into which the rows it fetches will be pushed
- A list of the tables that it will synchronize for each machine
- The dates that the last synchronizations took place, so that it can fetch only the rows that were added since then. These dates might be different for each of the source machines from which it collects its rows.
We also want to generate a log file, because Windows Task Scheduler will run the process automatically on a weekly basis. We don't want to pop up error dialogs, or just display status information within the application where it will be lost when the synchronization process ends each time. So we also store the path and name of a log file.
This concludes the 1st part of the client side story. In the next part – be it in the next magazine or on the website – we will display the last part of this article. We’ll then show an example of the web service synchronization process in real action. So stay tuned!