Tags

wow (82) real.life (27) mathematics (19) info.tech (13) commerce (10) doomsday (7) runescape (4)

Search This Blog

25 September 2010

Full Outer Join

800px-Beech_tree_trunk_inosculation During mid-September, WowStead changed its hosting partner to Curse Inc.. In doing so, it needed to migrate vast amounts of data over to the new Web servers and, among other things, merge the databases of both websites together. This was necessary given that users will log in to WowStead using their Curse Account. Unfortunately, several problems arose from the process:
  • WowStead users that do not have a Curse Account cannot log in to WowStead.
  • WowStead users that do not have a Curse Account have lost access to their characters and subsequently need to reclaim them.
  • In the vice versa of the above, users have access to their characters but do not own the account they were assigned.

While explanations of what went wrong are not readily forthcoming, a likely cause is the misuse of the methods adopted to merge the two databases together. Given the popularity of relational databases in storing website data, one can contemplate how the use of relational algebra can bring about such complications.

Relational databases
In a given table in a database, a record of an entity instance contains specific attribute values. Instead of storing all data in a single table, it can be divided among several tables, each storing information about the entities involved. For example, if a customer purchases an item in a transaction, the data collected from it can be stored in three tables: customer, item and transaction. By storing all customer data  in one table and all item data in another, the transaction can be recorded by simply entering data into a record in the transaction table linking item to customer.

In doing that, customer and item data are not duplicated for every sale that happens. One of the great strengths of relational databases is the reduced need for such duplication, significantly reducing the file size of the database and helping keep data consistent (which would not be the case in a single-table database if the data is not updated every time it occurs).

Relational algebra
Given the modular nature of a relational database, how does the user assemble the data back into a single meaningful table? They do so using relational algebra. Among other operations, JOIN can be used for creating a table by merging records according to the values of certain attributes. The JOIN operation takes many forms:
  • Natural join, where records are merged when all of the same attributes have the same values; records that are not matched are discarded.
  • Equijoin, where one attribute from each table is chosen and records are merged when the values are equal to each other; records that are not matched are discarded. If an operator other than the equality operator is used, it is called a theta join.
  • Outer join, where the records of one table inherit data from the records of the other table; records of the other table that are not matched are discarded, and records of the first table that do not inherit any data are given null values for the missing attribute values.

There are three types of outer joins, which differ on which table inherits the data from the other table:
  • Left outer join, where the first mentioned table receives the inheritance.
  • Right outer join, where the second mentioned table receives the inheritance.
  • Full outer join, where each table inherits from the other; under such a join, no record from either table is discarded.

The database merger
Among other tasks, the operators of WowStead and Curse needed to merge the WowStead and Curse user tables. An obvious method would be to do a full outer join matched according to email address:
  • Given that an email address is intimate to the user (it commonly used by website operators for sending official correspondence, after all), it is a very sensible attribute to join the tables according to. Also, Curse requires that every account use unique email addresses, so, in addition to the username, the email address can uniquely identify each account.
  • Joining according to username is another obvious choice, but the number of possible usernames is much less than the number of possible email addresses, giving a fair probability that different users will use the same username on different websites.

Given the following example tables:
WowStead_Users
W_Username
Email_Address
user1
user1@email1.com
user2
user2@email2.com
user3
user3@email2.com
Curse_Users
C_Username
Email_Address
user1
user1@email2.com
user2
user2@email2.com
user3
user3@email1.com

Performing a full outer join on Email_Address would yield the following table:
Users
W_Username
C_Username
 Email_Address
user1

user1@email1.com
user2
user2
user2@email2.com
user3

user3@email2.com

user1
user1@email2.com

user3
user3@email1.com

Implications
If a full outer join on email address was used for merging the WowStead and Curse user tables together, no problems will arise where the same user signs up to both websites using the same email address; that user enjoys access to both websites using the same login details.  However, where:
  • A user uses one website but not the other, if:
    • They never sign up on the other website, an unmatched record will arise, which the website operators will need to manually handle.
    • An unauthorised user steals the rightful user’s email address and uses it for signing up to the other website, the rightful user loses access to both websites.
  • The same user uses different email addresses for each website, their accounts will not merge.

EDIT 30Sep10: Appended observation that Curse assigns unique email addresses to each Curse Account.

No comments:

Post a Comment