Thursday 20 December 2012

Data Encryption and Protection - Part 2

In my last post I ran through the basics of SQL Server's database encryption options: Transparent Data Encryption (TDE) and cell-level encryption. In this post I'm going to discuss some of the considerations required before implementing encryption.

There are four main points to consider:

  1. What do I need to accomplish?
  2. Cost of implementation
  3. Cost of rollback/removal
  4. Performance impact

My last post covered the facts that inform the first question, and I feel that, in an ideal world, a combination of both encryption types is best. This post deals with questions 2 and 3 for both Transparent Data Encryption (TDE) and cell-level encryption, as well as question 4, performance.


TDE 

As far as the second question is concerned, TDE seems like a no-brainer. Because it is "transparent" to any application or user accessing the database, it can be implemented easily, making it a low-cost way of reaching the goal of protecting your database. So the cost of implementation is low.
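As a rough sketch of how little is involved (the database, certificate and password names below are placeholders, not from any real system), enabling TDE comes down to a handful of statements:

```sql
-- In master: create a database master key and a certificate to protect the DEK
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>';
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate';

-- Back up the certificate and its private key straight away; without them
-- an encrypted database cannot be restored on another server
BACKUP CERTIFICATE TdeCert TO FILE = 'C:\Backup\TdeCert.cer'
    WITH PRIVATE KEY (FILE = 'C:\Backup\TdeCert.pvk',
                      ENCRYPTION BY PASSWORD = '<KeyPassword>');

-- In the user database: create the database encryption key and switch TDE on
USE MyDatabase;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;

ALTER DATABASE MyDatabase SET ENCRYPTION ON;
```

Encryption then proceeds in the background while the database stays online, which is exactly why the implementation cost is so low.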

There is, however, a known complication with rolling back TDE, which essentially means decrypting the database: the transaction log does not get fully decrypted. The reason for this is that the transaction log is designed to be written only once, so existing log records are never re-written in unencrypted form. To ensure a fault-free rollback it is recommended to rebuild the log from scratch. This involves changing the recovery model from full (if used) to simple and then restarting the SQL Server instance. Restarting SQL Server requires downtime, usually a very small window of downtime, but downtime nonetheless. For more detail on the issues with removing TDE and the steps above, read Steve Perry's (not the singer from Journey, at least I'm pretty sure he isn't) very good article on sqlservercentral.com outlining the steps needed to ensure a proper TDE rollback.
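In outline, the decryption side of that rollback looks like this (again with a placeholder database name); the log complication described above only surfaces after these steps:

```sql
-- Switch encryption off; decryption happens in the background
ALTER DATABASE MyDatabase SET ENCRYPTION OFF;

-- Poll until encryption_state = 1 (unencrypted) before going any further
SELECT DB_NAME(database_id) AS database_name, encryption_state
FROM sys.dm_database_encryption_keys;

-- Only once the database is fully decrypted can the DEK be dropped
USE MyDatabase;
DROP DATABASE ENCRYPTION KEY;
```

Even after the encryption key is dropped, the already-written log records remain encrypted, hence the recommendation to rebuild the log.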


But the real complication comes in when failover strategies such as mirroring are employed. When two databases are in a mirroring relationship, the principal cannot be switched to the simple recovery model. I still haven't fully tested this scenario, but based on my understanding and conversations I've had, mirroring needs to be removed; watch this space, as I'll post my findings after testing. What most likely needs to be done is:

  1. Take the databases out of mirroring and keep traffic pointed at the principal.
  2. Back up the database and the log of the principal, then restore them to the secondary.
  3. Before putting the secondary back into mirroring, follow the steps to remove TDE but do not remove the key of the certificate.
  4. Put the databases back into mirroring and then manually fail over.
  5. Repeat steps 1 - 3 on the new secondary (this was the principal until you failed over).
  6. Put the databases back into mirroring and then fail back if needed.
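As I said, I haven't verified this end to end yet, but steps 1 and 2 would look something like the following (database names and backup paths are placeholders):

```sql
-- Step 1: break the mirroring session on the principal
ALTER DATABASE MyDatabase SET PARTNER OFF;

-- Step 2: back up the database and its log on the principal...
BACKUP DATABASE MyDatabase TO DISK = 'C:\Backup\MyDatabase.bak';
BACKUP LOG MyDatabase TO DISK = 'C:\Backup\MyDatabase.trn';

-- ...then, on the mirror server, restore both WITH NORECOVERY so the
-- database can rejoin a mirroring session once TDE has been removed:
-- RESTORE DATABASE MyDatabase FROM DISK = '...' WITH NORECOVERY;
-- RESTORE LOG MyDatabase FROM DISK = '...' WITH NORECOVERY;
```

I'll confirm (or correct) this sequence once I've run it through a test environment.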
To add another layer of complexity, take as an example some of the systems I maintain. They employ complex merge replication topologies which use mirroring as a publisher high-availability strategy. Although this arrangement is meant to fail over seamlessly, there are always risks. So, as with everything, testing helps in learning how to deal with and plan for those risks, but it is not foolproof. In a worst-case scenario, replication may need to be rebuilt.

Cell-Level Encryption

This type of encryption is, in a way, easier to consider and plan for. At the outset, if you have an existing application and database, the cost of implementing cell-level encryption is high. In addition to setting up the various encryption keys, you'll need to add encryption and decryption functions (such as EncryptByKey and DecryptByKey) to every SELECT, UPDATE and INSERT statement that your application(s) use.
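A minimal sketch of what that wrapping looks like (the key, certificate, table and column names here are all hypothetical):

```sql
-- The symmetric key must be opened before it can be used
OPEN SYMMETRIC KEY CardKey DECRYPTION BY CERTIFICATE CardCert;

-- INSERT: encrypt the value on the way in
INSERT INTO dbo.Customers (Name, CardNumberEnc)
VALUES (N'Jane Smith', EncryptByKey(Key_GUID('CardKey'), N'4111111111111111'));

-- SELECT: decrypt on the way out; DecryptByKey returns varbinary,
-- so the result has to be cast back to the original type
SELECT Name,
       CONVERT(nvarchar(20), DecryptByKey(CardNumberEnc)) AS CardNumber
FROM dbo.Customers;

CLOSE SYMMETRIC KEY CardKey;
```

Multiply that by every statement in every application touching the encrypted columns and the implementation cost becomes clear.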
And, for existing data, new columns need to be created in each table where cells require encryption, so the applications will need to be modified to handle the new columns (unless they keep the same names, but that depends on coding standards). This may also disrupt replication, and reports or other ETL-type procedures may need to be modified to use both the new columns and, if needed, the encryption functions.

Rolling back cell-level encryption is, as I see it, similarly resource-intensive: all the steps mentioned above would have to be reversed.


In both cases, implementing and rolling back, cell-level encryption would also require a large amount of testing, especially if the data to be encrypted is regularly accessed data, e.g. commonly accessed personal details like names, dates of birth, phone numbers and addresses.


Performance

This last consideration is a very important one, and as with all performance-related issues, there are compromises to be made. This is not so much the case with TDE: it has been designed as a low-impact feature of the database engine itself, so in terms of implementation, performance is not really a big consideration.

Not so with cell-level encryption. It is very important that every column to be encrypted is carefully thought through. There are a couple of potentially serious performance issues if the columns are not well planned out:


  • The encrypt/decrypt process adds additional overhead to the system.
According to Microsoft's "Database Encryption in SQL Server 2008 Enterprise Edition" whitepaper, performance degradation is estimated at 20%-28%. The actual amount of degradation obviously depends on the size and configuration of the system/database, as well as the amount of data that needs to be encrypted. It is also affected by the type of encryption (symmetric vs. asymmetric keys) and the strength of the algorithm used; symmetric keys are more efficient than asymmetric ones. The frequency of encryption also greatly affects performance: if only one rarely accessed column in the database is encrypted, performance won't be degraded as much as when entire tables that are read from and written to regularly are encrypted.

  • Because the encryption process introduces randomness, indexing on encrypted data columns is not possible.
In practice this means that any columns used in WHERE or JOIN clauses should be ruled out for encryption. This is probably not always possible. But the risk to performance needs to be understood.
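One common workaround, which I should stress is a general technique and not something from the whitepaper, is to store a hash of the plaintext in a separate, indexable column and use that for equality lookups (table and column names below are hypothetical, and it assumes the symmetric key from earlier is already open):

```sql
-- An indexed hash column supports equality searches without decrypting
ALTER TABLE dbo.Customers ADD CardNumberHash varbinary(32);
CREATE INDEX IX_Customers_CardNumberHash ON dbo.Customers (CardNumberHash);

-- Populate the hash alongside the encrypted value at insert time
INSERT INTO dbo.Customers (Name, CardNumberEnc, CardNumberHash)
VALUES (N'Jane Smith',
        EncryptByKey(Key_GUID('CardKey'), N'4111111111111111'),
        HASHBYTES('SHA2_256', N'4111111111111111'));

-- The WHERE clause can use the index; no decryption needed to find the row
DECLARE @CardNumber nvarchar(20) = N'4111111111111111';
SELECT Name
FROM dbo.Customers
WHERE CardNumberHash = HASHBYTES('SHA2_256', @CardNumber);
```

Note that an unsalted hash of low-entropy data (card numbers, dates of birth) can be brute-forced, so this trades away some security for searchability; it only helps with equality predicates, not ranges or joins on partial values.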
  • Database size may also be affected
Although this isn't as much of an issue as it used to be, it is still worth considering. Since all encrypted columns need to be of the varbinary data type, they will most likely be larger than the columns they replace.



For these reasons, I hope it is clear why planning for cell-level encryption is so important. A heavily taxed system may be tipped over the edge, while an efficient one may be significantly degraded by the necessary changes to its indexes. The performance considerations may, in the end, point to a higher cost of implementation and rollback, as existing queries and indexes will require more modification than just adding the encryption and decryption functions. To maintain performance, queries may need to be radically modified to change the way they access the data and join tables together.

Please feel free to leave any comments regarding your experience with any of the issues covered here. I'm always happy to find out if I've missed something or learn new things.

My next post on encryption will deal with the data being sent to and from the database server. 

