Implementing Master-Slave Transactional Incremental Replication Using SQL Server

 

? Data Column: select for more on pattern organization Application Column: select for more on pattern organization Deployment Column: select for more on pattern organization Infrastructure Column: select for more on pattern organization
Architecture Row: select for more on pattern organization Data Architecture: select for more on pattern organization Application Architecture: select for more on pattern organization Deployment Architecture: select for more on pattern organization Infrastructure Architecture: select for more on pattern organization
Design Row: select for more on pattern organization Data Design: select for more on pattern organization Application Design: select for more on pattern organization Deployment Design: select for more on pattern organization Infrastructure Design: select for more on pattern organization
Implementation Row Data Implementation: select for more on pattern organization Application Implementation: select for more on pattern organization Deployment Implementation: select for more on pattern organization Infrastructure Implementation: select for more on pattern organization
? Complete List of patterns & practices Complete List of patterns & practices Complete List of patterns & practices Complete List of patterns & practices

Version 1.0.0

GotDotNet community for collaboration on this pattern

Complete List of patterns & practices

 

 

Context

 

You want to build a Master-Slave Transactional Incremental Replicationbetween two Microsoft SQL Server databases, and you want to take advantage of the integrated replication functionality of SQL Server.

The schemas of the source and target do not need to be identical; simple manipulations as defined in the Data Replication pattern can be implemented in the replication link. In addition, horizontal (row) and vertical (column) filters can be used to restrict the replication set.

 

Note: This pattern uses terms and concepts from the following data patterns:

  • Move Copy of Data
  • Data Replication
  • Master-Slave Transactional Incremental Replication
  •  

    Background

     

    Before introducing the implementation with SQL Server, this pattern covers the following topics:

  • A summary of the replication building block, as described in the Master-Slave Transactional Incremental Replication pattern
  • The SQL Server replication services that this pattern uses (SQL Server transactional replication).
  • The mapping of the elements of the replication building block to the services of SQL Server

     

    Replication Building Block

     

     

     

    The replication building block used in the Master-Slave Transactional Incremental Replication pattern consists of a source and a target that are connected by a replication link, as Figure 1 shows.

    Figure 1: Replication building block

     

    Source

     

     

     

    The source contains the replication set, which is the data copied from the source and sent across a data movement link to the target.

     

    Acquire

     

     

     

    The Acquire service reads the data changes that are replicated.

     

    Manipulate

     

     

     

    The Manipulate service performs simple data transformations, such as data type conversions and splitting or combining columns.

     

    Write

     

     

     

    The Write service updates the target with the manipulated rows.

     

    Target

     

     

     

    The target is the database where the replication set is to be written.

     

    The process of moving the replication set from source to target according to defined functional and operational requirements is called a transmission.

     

    SQL Server Transactional Replication

     

     

     

    SQL Server replication uses several standard services to move data from a publication database to a subscription database. Figure 2 shows the services and processes involved in SQL Server transactional replication.

    Figure 2: SQL Server transactional replication

     

    Platform Roles

     

     

     

    SQL Server defines three roles for the platforms involved in the replication:

  • Publisher. The Publisher contains the source and the Log Reader Agent. In this environment, the Subscriber defines the replication set (publication) to be replicated.
  • Distributor. The Distributor holds the distribution database. It runs the cleanup jobs for managing the distribution database, and in a push replication also runs the Distribution Agent.
  • Subscriber. The Subscriber contains the target and creates a subscription to subscribe a publication. In a pull replication, the Subscriber runs the Distribution Agent.

     

    The Distributor does not necessarily need to be a separate platform. Its role can also be assigned to the Publisher, where it is called a local Distributor. Otherwise, it is called a remote Distributor.

     

    Software Components

     

     

     

    SQL Server transactional replication consists of two main processes and a database:

  • Log Reader Agent. The Log Reader Agent extracts changes defined by the publication from the transaction log of the publication database, and stores the changes to be replicated in an additional database, called the distribution database.
  • Distribution database. The distribution database is a SQL Server database and is located on the Publisher (called a local Distributor), or on a dedicated platform (called a remote Distributor). Because it stores changes in a proprietary format, the distribution database can only be used for SQL Server replication, not for any applications. After the Log Reader Agent extracts the changes, the logging system of the publication database is allowed to shrink the transaction log, because the information for the transmissions is saved in the distribution database.
  • Distribution Agent. The Distribution Agent applies the initial snapshot to the subscription database by moving transactions held in the distribution database to Subscribers. The Distribution Agent runs at either the distributor for push subscriptions, or at the Subscriber for pull subscriptions. Before actually writing the changes, the Distribution Agent can manipulate them using a Data Transmission Services (DTS) package that the Subscriber defines. If you need to perform manipulations before writing the changes to the subscription database, you can do so using such a DTS package.

     

    Independent of the replication processes, cleanup jobs run on the distribution database and perform the following tasks:

  • Agent History Clean Up: Distribution. Removes replication agent history from the distribution database.
  • Distribution Clean Up: Distribution. Removes replicated transactions from the distribution database.
  • Expired Subscription Clean Up. Detects and removes expired subscriptions from publication databases.
  • Reinitialize Subscriptions Having Data Validation Failures. Reinitializes all subscriptions that have data validation failures.
  • Replication Agents Checkup. Detects replication agents that are not actively logging history.

     

    A SQL Server replication is based on the following rules:

  • Any table that should be published needs a primary key as a unique identifier for a row.
  • SQL Server transactional replication assumes that the subscription database is read-only (unless updating subscriptions are used). This means:
  • Any changes to the subscription database at the Subscriber will be overwritten by the next transmission.
  • Any deletion of rows on the subscription database will cause a failure in the replication process.
  •  

    Mapping the Replication Building Block to SQL Server

     

     

     

    SQL Server transactional replication implements the replication building block elements used by Master-Slave Transactional Incremental Replication as shown in Figure 3. Figure 3 is the result of merging Figures 1 and 2.

    Figure 3: SQL Server implementation of Master-Slave Transactional Incremental Replication

     

    Source

     

     

     

    Incremental replication transmits only the changes that have been made to source rows that the target has copies of, rather than getting full refreshes of the data. As described in the design pattern Master-Slave Transactional Incremental Replication, this type of replication needs a recording of the source transactions to use for change transmission. SQL Server replication preserves the changes to the publication database in the distribution database for transmission to the subscription database. The distribution database is updated by the Log Reader Agent, which reads the transactional information from the transaction log of the relational database management system (RDBMS) and writes it to the distribution database. This allows the RDBMS to shrink the transaction log.

     

    The changes remain available in the distribution database for a specified period of time (the retention period).

     

    Replication Set

     

     

     

    In SQL Server, the replication set to be replicated from the publication database to the subscription database is called a publication. A publication consists of one or more tables, or only parts of tables. You can define the parts of tables in one of two ways:

  • Filter the rows to be replicated by defining a search condition.

    In the theory of relational databases, the result is called a restriction.

  • Filter the columns to be replicated as a subset of the table's columns.

    In the theory of relational databases, the result is called a projection.

     

    Acquire

     

     

     

    The Distribution Agent reads all the changed data from the distribution database from the last transmission up to the most recent transaction.

     

    Manipulate

     

     

     

    The Distribution Agent can manipulate the data. When the Subscriber registers its subscription to the Distributor, it can define a DTS package, which the Distribution Agent invokes before sending the data to the Subscriber. Each Subscriber can use its own package, so different Subscribers can get various views of the same data.

     

    Within such a DTS package, you can define any manipulations that can be performed using the data of the current row and the ActiveX script language. Each row from the publication database can result in only one row at the subscription database, or it can be skipped. Columns can be split or combined. Additionally, any kind of data type conversions and changes of column names can be done.

     

    Write

     

     

     

    The Distribution Agent writes the data to the subscription database. The Write is done by default using a stored procedure. Optionally, it can be done directly using SQL.

     

    In general, the Distribution Agent writes the changes regardless of whether the contents of the subscription database have been updated by any application. If you must prevent updates from being overwritten, you have to use updateable subscribers or configure a merge replication instead of a transactional replication. A merge replication is described in the pattern Implementing Master-Master Row-Level Synchronization Using SQL Server.

     

    Target

     

     

     

    The target is the subscription database, which contains the replication set to be updated.

     

    Implementation Strategy

     

    To set up a new replication link, follow these steps:

     

    1.Configure the Distributor:

     

    a.Create the distribution database: This is an internal SQL Server database used from the Distributor. In most cases, you use the default options to create this database, but you can also customize it.

     

    b.Choose the snapshot folder: You can either choose the default folder for exchanging the initial snapshot files or use a custom folder.

     

    c.Set up the Subscriber parameter: To define the default parameter for all subscribers, you can confirm the offered configuration options or customize them.

     

    2.Define the publication:

     

    a.Select the publication database that includes the publication.

     

    b.Decide if you want to transform the data during the transmission. If so, any subscriber must have a DTS package for the transformation.

     

    c.Select the possible database systems the Subscribers will use: SQL Server 2000, earlier versions of SQL Server, or another RDBMS.

     

    d.For each table, specify whether the entire table should be replicated, or restrict the table by specifying horizontal and vertical filters.

     

    e.Decide whether anonymous Subscribers are allowed to subscribe to the publication.

     

    f.Decide how often the Snapshot Agent creates a new snapshot.

     

    3.Create a DTS package for the transformation, if you have decided to use one.

     

    4.Define the subscription:

     

    a.Decide if you want a pull or a push subscription.

     

    b.Select the publication you want to subscribe to.

     

    c.Choose the subscription database.

     

    d.Decide if you want to initialize the subscriber using an initial snapshot from the publisher.

     

    e.If you want to start the transmission automatically, define a schedule.

     

    f.Specify the DTS package you want to use for this subscriber.

     

    g.Verify that all required services are started.

     

    At this point, all parts of the replication link have been configured. From now on, both Publisher and Subscriber will log all changes to the specified replication set using triggers.

     

    The SQL Server transactional replication runs different jobs:

  • Snapshot Agent: Creates the initial snapshot on the Publisher.
  • Distribution Agent: Applies the snapshots on the Subscriber.
  • Cleanup jobs: Clean up the distribution database on the Distributor.

     

    The example that follows describes in detail how to use the SQL Server wizards to set up such a replication link.

     

    Example

     

    This example shows how to configure transactional replication based on a particular replication set. The replication set used here is the authors table from the pubs sample database, which is delivered with SQL Server.

     

    Overview

     

     

     

    The environment has two SQL Server computers: PUB_SERVER and SUBS_SERVER. The Publisher has a database, PUB_DB, that contains the authors table from the pubs sample database. To use PUB_SERVER as a local Distributor, you create a distribution database named Distribution. The Subscriber has an empty database, SUBS_DB. Figure 4 shows the environment of the example.

    Click here to see larger image

    Figure 4: Example environment

     

    This configuration creates a periodical incremental transactional replication process with an initial snapshot. The Subscriber initiates the replication (pull) every 15 minutes. The data from the Publisher is manipulated during the replication by converting the first and the last name from the authors table into uppercase. (This example does not allow updateable or anonymous subscribers.)

     

    Configuring the Publisher and the Distributor

     

     

     

    To set up the example environment, follow these steps in the Configure Publishing and Distribution Wizard:

     

    1.In SQL Server Enterprise Manager, select the publication database server, right-click Replication, and then click Configure Publishing, Subscribers and Distribution.

     

    2.On the Select Distributor page, select the Make Pub_Server its own Distributor option, where Pub_Server is the name of the server you want to configure as a distributor. Click Next.

     

    Note: The Distributor can be located on the same server as the Publisher, in which case it is called a local Distributor, or on a remote server. This example uses a local Distributor.

     

    3.On the Specify Snapshot Folder page, type the name of a shared folder on the Publisher where the Snapshot Agent can store snapshot files.

     

    Note: You must create and share this folder manually, or use a previously defined system shared folder for this purpose.

     

    4.On the Customize the Configuration page, select No, use the following default settings to create a default distribution database.

     

    Note: In this simple example, it is sufficient to use the default settings. In a more complex environment, you should manually configure the distribution database.

     

    Creating a Publication

     

     

     

    Next, use the Create Publication Wizard to create a publication on the Publisher.

     

    1.In SQL Server Enterprise Manager, select the publication database server, expand Replication, right-click Publications, and then click New Publication.

     

    2.On the Welcome Screen, check the Show advanced options check box to use this wizard with the advanced options needed to enable a transformation of the data.

     

    3.On the Choose Publication Database page, select Pub_DBas the publication database, where Pub_DB is the name of your database for the replication.

     

    4.On the Select Publication Type page, select Transactional Publication to create a publication for a transactional replication.

     

    5.You do not want to allow the replication of changes from the subscriber to the publisher; therefore, ensure that no option is selected on the Updateable Subscription page.

     

    6.On the Transform Published Data page, select Yes, transform the data to enable the transformation during the replication.

     

    7.On the Specify Subscriber Types page, select only Servers running SQL Server 2000.

     

    8.The Specify Articles page shows possible objects for replication in the publication database. From the Object Type Tables list, select the table authors.

     

    Note: Clicking the ellipses ( ... ) button for each object shows additional properties for the replication. You do not need them in this example.

     

    9.On the Select Publication Name and Description page, specify the Publication name as example. The default publication description can be used.

     

    10.On the Customize the Properties of the Publication page, select the option beginning with No, create the publication as specified, because you can use the defaults options shown in the text box below the options.

     

    11.On the Completing the Create Publication Wizard page, click Finish to create the publication.

     

    Defining a DTS Package for the Transformation

     

     

     

    Before you can create a subscription for the defined publication, you must create a DTS package for the transformation.

     

    1.In SQL Server Enterprise Manager, select the publication database server, expand Replication, open Publications, right-click on the example:pub_db replication, and then click Properties.

     

    2.On the Publication Properties page, click theSubscriptions tab and then click Transformations.

     

    3.On the Welcome Screen, click Next.

     

    4.On the Choose a Destination page, select the subscription database server and database. (See Figure 5.)

    Figure 5: Choose a Destination page

     

    5.On the Define Transformations page, click the ellipses ( ... ) button from the article authors.

     

    6.On the Column Mappings tab of the Column Mappings and Transformations page, select Drop the existing table and recreate it from the list box. Leave the mappings unchanged. (See Figure 6.)

    Figure 6: Column Mappings tab on the Column Mappings and Transformations page

     

    7.Click the Transformations tab. Click Transform data using the following script and select VB Script Language from the list box. (See Figure 7.)

    Figure 7: Transformations tab on the Column Mappings and Transformations page

     

    8.Use the following code to transform the first and the last name of the author to uppercase:

     
    
    Function Main()
       DTSDestination("au_id") = DTSSource("au_id")
       DTSDestination("au_lname") = UCase(DTSSource("au_lname"))
       DTSDestination("au_fname") = UCase(DTSSource("au_fname"))
       DTSDestination("phone") = DTSSource("phone")
       DTSDestination("address") = DTSSource("address")
       DTSDestination("city") = DTSSource("city")
       DTSDestination("state") = DTSSource("state")
       DTSDestination("zip") = DTSSource("zip")
       DTSDestination("contract") = DTSSource("contract")
       Main = DTSTransformStat_OK
    End Function
     

     

    9.On the DTS Package Location page, select the option to save the package on the Distributor. Use the SQL Server authentication option, and insert the user and the password to connect to the Distributor.

     

    10.On the DTS Package Identification page, specify the name of the package. In this example, the name should be transform_sub. You do not need to define an owner password.

     

    11.On the Completing the Transform Published Data Wizard page, click Finish to save the package as specified.

     

    Creating a Subscription

     

     

     

    To complete the configuration you must create a pull subscription for the defined publication on the Subscriber.

     

    1.In SQL Server Enterprise Manager, select the subscription database server, expand Replication, right-click Subscriptions, and then click New Pull Subscription.

     

    2.On the Welcome Screen, click Next.

     

    3.On the Look for Publications page, select Look at publications from registered servers.

     

    4.On the Choose Publication page, expand PUB_SERVER and select the publication named example:pub_db, where PUB_SERVER is the name of your Publisher and example:pub_db is the name of your publication. (See Figure 8.)

    Figure 8: Choose Publication page

     

    5.On the Specify Synchronization Agent Login page, select Use SQL Server Authentication. Enter the login and the password of a user account that is used to connect to the Publisher during the replication.

     

    6.On the Choose Destination Database page, select Subs_DB as the database for the subscription, where Subs_DB is the name of your subscription database.

     

    7.On the Initialize Subscription page, select Yes, initialize the schema and the data.

     

    8.On the Snapshot Delivery page, specify the snapshot folder. Use the default folder of the publication.

     

    9.On the Set Distribution AgentSchedule page, select Using the following schedule and click Change to specify a new schedule for the Distribution Agent.

     

    10.On the Edit Recurring Job Schedule page, define the schedule for the Distribution Agent as a daily run with an interval of 15 minutes. (See Figure 9.)

    Figure 9: Edit Recurring Job Schedule page

     

    11.On the Specify DTS Package page, select Distributor and click List packages. For the transformation, select the transform_sub package that you created previously. (See Figure 10.)

    Figure 10: Specify DTS Package page

     

    12.The Start Required Services page shows the status of the SQL Server Agent on the Subscriber. If the SQL Server Agent is not running, select the check box next to the entry for the agent. The agent will be started after the wizard creates the subscription.

     

    13.On the Completing the Pull Subscription Wizard page, review the options for the specified subscription. Click Finish to create the subscription with these options.

     

    The configuration process is finished. Before starting the first transmission, the following checks should be done:

  • Did the Snapshot Agent create the first initial snapshot? It may be necessary to start the Snapshot Agent manually on the publication database.
  • Did the Pull Agent initialize the schema and the first replication? If this has not been done automatically, you can start the process manually on the subscription database.

     

    Starting and Restarting the Synchronization

     

     

     

    1.In the Enterprise Manager, select the publication database server, expand Replication Monitor, select Publishers, Pub_Server, and open the publication example:pub_db.

     

    2.Right-click the SnapshotAgent and select Start Agent to create an initial snapshot.

     

    3.Right-click SUBS_Server:subs_db and select Start Synchronizing to start the transactional replication.

     

    Testing the Example

     

     

     

    You can easily test the functionality of the implemented replication by changing some data in the publication, starting the replication process, and checking to see if the Subscriber has changed accordingly.

     

    To check various kinds of data changes, perform INSERT, UPDATE, and DELETE operations on the data in the publication database.

     

    1.Change data in the publication database. For example:

     
    
    UPDATE authors SET au_lname = 'Smith' WHERE au_id = '807-91-6654'
    
    INSERT INTO authors (au_id, au_lname, au_fname, phone, contract) 
           VALUES ('453-12-3255', 'Smith', 'John', '400 486-234', 1)
    
    DELETE FROM authors WHERE au_id = '672-71-3249'
     

     

    2.Manually start the replication process.

     

    3.Check to see if the rows changed in the subscription database. For example:

    SELECT * FROM authors 
    
      
    endCode}

     

    Resulting Context

     

    Although this pattern describes a replication between SQL Server databases, the subscription database can be any heterogeneous database that provides an ODBC or OLEDB interface, such as Oracle, DB2, or Microsoft Access.

     

    Note: To use a heterogeneous database as a publication database, you need a software component that implements a dedicated interface - the replication distributor interface. The examples that are delivered with SQL Server show how to implement this software.

     

    There are two ways to use a heterogeneous database as a subscription databse:

  • Use ODBC or OLEDB and create a push subscription using the wizard on the Publisher. This is the simplest way to publish data to a heterogeneous Subscriber.
  • Create a publication and then create an application with an embedded distribution control. The embedded control implements the pull subscription from the Subscriber to the Publisher.

     

    For Subscribers, the subscribing database has no administrative capabilities regarding the replication being performed.

     

    A closer look at the architecture of the SQL Server elements reveals that the implementation corresponds to a cascading synchronization with two replication links. One link is from the publication database to the distribution database, and the other link is from the distribution database to the subscription database. The distribution database plays the role of a cascade intermediary target/source (CITS). It can serve additional replications links to other targets. The Master-Slave Cascading Replication pattern descibes this design and the terms associated with it..

     

    This pattern inherits the benefits and liabilities from the pattern Master-Slave Transactional Incremental Replication. Because this pattern actually implements a cascading replication, it also inherits the benefits and liabilities of the Master-Slave Cascading Replication pattern. The use of this pattern also results in the following additional benefits and liabilities:

     

    Benefits

     

     

  • Integrated logging mechanism. If you track the transaction log, you do not need a logging mechanism for changes in the application or database.
  • Integrated into SQL Server Enterpise Manager. The configuration and execution of the replication in SQL Server is integrated in the SQL Server administration, and can be easily done with the SQL Server Enterprise Manager.

     

    Liabilities

     

     

  • Additional database to be managed. The implementation requires the distribution database as an additional database that must be managed.

     

    Testing Considerations

     

    After you set up the replication link as described in this pattern, you must test it thoroughly. Your test cases should cover these scenarios and others:

  • Correct transmission of transactions that update the replication set
  • The largest transaction that might occur
  • The longest disconnection between Publisher and Subscriber that you expect under production conditions
  • Network interruption between the Publisher and the distribution database during transactions on the Publisher, if you are using a remote distributor. (This does not apply to the example in this pattern.)
  • Network interruption between the distribution database and the Subscriber during a a running transmission
  • A hard abort of the database system on the Subscriber during a running transmission

     

    After each test, make sure that the Subscription is in the correct state.

     

    Also make sure that transactions that do not update the replication set are not transmitted.

     

    Finally, put the highest expected load on the Publisher and verify that the replication still runs correctly.

     

    Security Considerations

     

    To secure the connection between the Distributor and the Subscribers, do one of the following:

  • Use a SQL Server user account to connect to the Distributor.
  • Place all computers in one Microsoft Windows domain, or have a common user with the same password on all computers. Then use a trusted Windows connection between Distributor and Subscribers.

     

    If you prefer a security approach that is independent of the operating system, you should use SQL Server authentication.

     

    Note: The SQL Server Agents on all systems must run on a local or a domain account, not on the local system account. Otherwise, the replication does not work.

     

    Operational Considerations

     

    The SQL Server Agent manages the different jobs of the replication. Schedules for these jobs are defined during the configuration. Alternatively, you can start each job using the SQL Server Enterprise Manager.

    The Replication Monitor in SQL Server Enterprise Manager provides the following information:

  • A list of all publications and attached agents with the time and the duration of the last execution and information about the last action
  • A list of all agents grouped by task with current status and a history of actions
  • The event log entries created by Replication Alerts, which watch the replication process.

     

    If a row was deleted on the subscription database and the transmission tries to change this row, an error occurs. In this case, there are two solutions:

  • Manually restore the row in the subscription database by using an INSERT statement based on the data from the same row in the publication database.
  • Have the replication automatically use a new snapshot to rebuild the subscription database. This snaphot must be newer than the first erroneous transmission.

     

    If the replication fails, do the following:

  • Check to see if the accounts have network access rights.
  • Check the history of the subscription to determine the reason the synchronization failed.
  • Check the Replication Monitor for highlighted replication failures.

     

    The replication usually runs on a defined schedule. If you want to test the replication or start the merge replication immediately, however, you need to start the replication manually.

     

    Hints: It is common practice to have the distribution database on the same server as the Publisher, for example to ensure frequent and rapid movement of the transactional information from the transaction log to the distribution database.

     

    If there are many subscribers and a high rate of operations on the publisher, the Distributor can become a bottleneck. In this case, you should configure the Distributor on an additional server to separate the transmissions from the activity caused by the applications.

     

    Transferring a large amount of data over slow conections can be problematic. An alternate solution is to create the initial snapshot files and transaction data using removeable media, replicate the database, and then establish the transactional replication over the slow network.

     

    Related Patterns

     

    For more information, see the following related patterns:

     

    Patterns that May Have Led You Here

     

     

  • Move Copy of Data. This pattern is the root pattern of this cluster; it presents the overall architecture for maintaining copies of data.
  • Data Replication. This pattern presents the architecture of data replication, which is a particular way of moving copies of data.
  • Master-Slave Replication. This pattern describes design considerations for transmitting data from the source to the target by overwriting potential changes in the target on a higher level than the Master-Slave Snapshot Replication.
  • Master-Slave Transactional Incremental Replication. This pattern presents a solution that transmits only the changes from the source to the target on a transaction-by-transaction basis.

     

    Other Patterns of Interest

     

     

  • Master-Slave Snapshot Replication. This pattern presents the design of snapshot replications, which this pattern uses to prepare the target.

    Patterns Practices

  • Top of Page Top of Page