Getting paid for what you deliver is the basis of a profitable long-term business. Yet piracy continues to erode software vendors' income. According to the BSA (Business Software Alliance), the worldwide piracy rate (the share of installed software that is unlicensed) rose from 38 percent in 2007 to 41 percent in 2008, the second consecutive yearly increase. Over the same period, the monetary value of unlicensed software (losses to software vendors) grew by more than $5.1 billion (11 percent) to $53.0 billion.
Software vendors looking to increase their revenue stream should look for ways to stop piracy of their software.
Application source code constitutes the company’s software intellectual property (IP) and is vital for maintaining its competitive advantage and revenue stream.
In this article I'm going to review obfuscation as a means of IP protection. First, however, I'd like to distinguish between two terms that are often confused and wrongly used interchangeably in discussions of obfuscation.
The first is IP protection, which refers to the means a company uses to protect its intangible assets. These assets are created by its employees, are often embedded in software code, and form the company's competitive advantage.
The second is copy protection, which is the technology a company uses to prevent the reproduction of its software.
Obfuscation is occasionally presented as a means of copy protection, that is, of preventing the distribution of unauthorized copies of software. In reality obfuscation isn't aimed at that goal; copy protection can be easily circumvented even if your code is obfuscated.
Obfuscation literally means making something less clear and harder to understand. Developers are encouraged to write code that is as clear and easy to understand as possible. The rationale is to make the code easier for you to maintain and to let others quickly understand it and refactor it when needed.
Obfuscation, however, does quite the opposite. It takes your code and makes it more difficult to read when opened in a disassembler, while keeping the application's behavior the same as the original.
In this article we will review some of the obfuscation techniques software companies use today to protect their code. Before covering each method, we'll set criteria to grade against; this will help us weigh the pros and cons of each one. Depending on your situation, you may prefer one method over another, taking the following into account:
- Readability: the ability of a human to read and understand the code.
- Reversibility: the ability of a tool to undo the transformation.
- Performance Impact: the impact on code execution.
- Maintenance/Support: the impact on the ability to support transformed assemblies.
Entity Renaming
The first method we're going to discuss is entity renaming. As its name suggests, this method renames the metadata entities stored in an assembly, including class names, method names and parameters, fields, events and properties.
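The renaming idea can be sketched in a few lines. The sketch below uses Python purely for illustration; a real .NET obfuscator rewrites assembly metadata rather than source tokens. The names short_names, build_rename_map and rename, as well as the LicenseManager call site, are hypothetical and exist only for this example.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... as replacement identifiers."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def build_rename_map(own_names, external_names):
    """Map each developer-owned name to a meaningless short name.

    Names listed in external_names (framework or third-party members)
    are skipped, because the runtime resolves them by name.
    """
    generator = short_names()
    rename_map = {}
    for name in own_names:
        if name in external_names or name in rename_map:
            continue
        rename_map[name] = next(generator)
    return rename_map


def rename(tokens, rename_map):
    """Rewrite a token stream, replacing only the mapped identifiers."""
    return [rename_map.get(token, token) for token in tokens]


# Hypothetical call site: one developer-owned call and one framework call.
tokens = ["LicenseManager", ".", "ValidateKey", "(", "userKey", ")", ";",
          "Console", ".", "WriteLine", "(", "userKey", ")"]
own = ["LicenseManager", "ValidateKey", "userKey"]
external = {"Console", "WriteLine"}

mapping = build_rename_map(own, external)
obfuscated = rename(tokens, mapping)
print("".join(obfuscated))  # a.b(c);Console.WriteLine(c)
```

Note that the mapping is thrown away after the build in this sketch. Nothing in the output carries the original names, which is what makes the transformation one-way.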

In terms of readability, the code becomes harder to understand. However, as seen in the sample above, entity renaming is limited to code under the developer's control; it does not touch calls to third-party libraries or to the standard .NET libraries, because .NET uses method names to resolve those calls at runtime.
Because renaming is a one-way transformation, the original names cannot be inferred from the obfuscated assembly, so this method scores high on the reversibility criterion: a tool cannot undo it.
Performance impact is negligible, since the complexity of the code remains exactly the same in terms of the instructions executed by the JIT compiler.
The downside of entity renaming is the burden it adds to maintenance. It makes your code difficult to debug in a production environment. Exceptions generated and reported by a user will typically include obfuscated method and class names, making it almost impossible to trace them back to the exact locations in the source code.
Entity renaming often breaks application code that uses the reflection API. Several .NET practices, such as XML serialization, LINQ and web services, rely on reflection, so you should be careful when applying this obfuscation method in those scenarios.
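Why reflection and renaming clash can be shown with a small sketch. It is written in Python for brevity, with getattr standing in for a .NET reflection lookup; the classes Invoice and ObfuscatedInvoice and the helper call_by_name are hypothetical. The point is that a member name stored in a string is invisible to the renaming pass.

```python
class Invoice:
    def compute_total(self, prices):
        return sum(prices)


class ObfuscatedInvoice:
    # Same logic after a hypothetical renaming pass: compute_total -> a
    def a(self, prices):
        return sum(prices)


def call_by_name(target, method_name, *args):
    """Look a method up by its string name, as reflection-based code does."""
    method = getattr(target, method_name, None)
    if method is None:
        raise LookupError(f"no member named {method_name!r}")
    return method(*args)


# The name lives in a string (config file, serialized data, and so on),
# so the renaming pass never sees it and cannot update it.
configured_name = "compute_total"

print(call_by_name(Invoice(), configured_name, [5, 7]))  # 12

try:
    call_by_name(ObfuscatedInvoice(), configured_name, [5, 7])
except LookupError as error:
    print("lookup failed:", error)
```

A common way around this is to exclude members that are reached by name from renaming, at the cost of leaving those names readable.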
Unfortunately, this behavior calls for additional test cycles in the development life cycle. The goal is to verify that obfuscation doesn’t break your application’s code flow, so you will need to exercise all the logic in your application that entity renaming could break.
Control Flow Obfuscation
Now let’s look at a second obfuscation method, control flow obfuscation. It hides the control flow of the program by transforming existing code flow patterns into constructs that are semantically equivalent but different from the code originally written. The control flow obfuscation algorithm converts the original implementation into spaghetti code, making the program logic much harder to infer.
Different obfuscators apply different levels of control flow obfuscation. A naive attempt at control flow obfuscation is demonstrated below:

On the left you can see the original method; on the right is the result after applying control flow obfuscation. Pay attention to the first few instructions. The obfuscator simply adds an invalid sequence of instructions at the beginning of the method; the rest of the method is identical to the original. The invalid sequence attempts to pop a value from the stack, but at that point the stack is empty, so the code would fail if it ran. In fact it never executes, because a branch instruction placed at the beginning of the method skips over the invalid sequence. A disassembler, however, will often break when it tries to interpret that sequence.
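The trick can be reproduced on a toy stack machine. The sketch below is Python rather than MSIL, and the opcodes (push, pop, add, br, ret) as well as the functions run and linear_stack_check are hypothetical stand-ins, not part of any real runtime or disassembler. It shows a runtime that follows the branch and a naive static pass that reads instructions in file order.

```python
def run(program):
    """Execute a tiny stack program, following branches as a runtime would."""
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "push":
            stack.append(args[0])
        elif op == "pop":
            stack.pop()          # raises IndexError on an empty stack
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "br":
            pc = args[0]         # unconditional jump to an absolute index
            continue
        elif op == "ret":
            return stack.pop()
        pc += 1
    raise RuntimeError("program ended without ret")


def linear_stack_check(program):
    """Naive static pass: walks instructions in file order, ignoring branches."""
    effect = {"push": 1, "pop": -1, "add": -1, "br": 0, "ret": -1}
    depth = 0
    for index, (op, *_) in enumerate(program):
        depth += effect[op]
        if depth < 0:
            raise ValueError(f"stack underflow at instruction {index}")
    return True


original = [("push", 2), ("push", 3), ("add",), ("ret",)]

# Junk prologue: branch over an invalid pop. The original body contains no
# branches, so no targets need shifting in this sample.
obfuscated = [("br", 2), ("pop",)] + original

print(run(original), run(obfuscated))  # 5 5
print(linear_stack_check(original))    # True

try:
    linear_stack_check(obfuscated)
except ValueError as error:
    print("static pass gave up:", error)
```

The program behaves identically at runtime, while a tool that does not follow the branch trips over the dead pop.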
More advanced control flow obfuscation is shown below:

The simple ‘for loop’ on the left gets completely mangled and transformed into the switch statement shown on the right. These are two different types of logic, yet they behave the same at runtime. The obfuscator appears to have done a great job of destroying the original code pattern: Reflector was unable to reverse the MSIL code into a for loop and instead reads it as a switch statement.
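A loop-to-switch transformation of this kind is often called control flow flattening. The sketch below illustrates it in Python, with an if/elif chain standing in for the switch statement; the function names and the state numbering are hypothetical, and a real obfuscator performs the rewrite on MSIL rather than on source code.

```python
def sum_to_original(n):
    """Straightforward loop: add up 0..n-1."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_to_flattened(n):
    """Same computation, rewritten as a dispatch loop over numbered states."""
    total = 0
    i = 0
    state = 2  # entry state; the numbering is deliberately out of order
    while True:
        if state == 0:      # loop body
            total += i
            state = 3
        elif state == 1:    # exit
            return total
        elif state == 2:    # loop condition
            state = 0 if i < n else 1
        elif state == 3:    # increment
            i += 1
            state = 2


print(sum_to_original(5), sum_to_flattened(5))  # 10 10
```

Both functions return the same result for every input, but the second no longer contains anything a decompiler can recognize as a for loop.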
Judged by the criteria we set earlier, control flow obfuscation clearly makes the code harder to read, not because identifier names have changed but because the logic becomes harder to follow and comprehend.
Reversibility depends on the quality of the solution. The first example can be reversed easily by omitting the invalid code sequence from each method. The more advanced example can’t be reversed to its original form, because a one-way transformation was applied to the original code pattern: there is simply not enough information for Reflector to detect that a for loop was originally used in that code sequence.
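How little work the naive case takes to undo can be sketched as follows. The instruction format is an assumption made for this example only: a Python list of (opcode, operand) tuples with absolute branch targets. The helper strip_junk_prologue is hypothetical and handles just the one pattern described above.

```python
def strip_junk_prologue(program):
    """Undo the naive transformation: drop a leading branch together with
    the unreachable instructions it jumps over.

    Assumes the rest of the program contains no branches whose absolute
    targets would need shifting (true for this hypothetical sample).
    """
    if program and program[0][0] == "br":
        target = program[0][1]
        return program[target:]
    return program


obfuscated = [("br", 2), ("pop",), ("push", 2), ("push", 3), ("add",), ("ret",)]
restored = strip_junk_prologue(obfuscated)
print(restored)
```

No comparable shortcut exists for the flattened loop, since several different source constructs could have produced the same dispatch structure.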
Performance may vary, depending on the type of obfuscation algorithm used; you’ll have to test it for yourself to determine how it affects your code.
In contrast to entity renaming, very little impact on maintenance and support is expected. Stack trace information remains intact, and the ability to debug the code isn’t hindered. The code isn’t broken by use of the reflection API or the other practices mentioned in connection with entity renaming, so no additional test cycles are needed.
String Encryption
The last obfuscation method presented in this article is string encryption. String encryption transforms the strings in the source code so that they appear in the compiled binaries as encrypted strings. Using Reflector or any other static analysis tool won’t reveal the original strings. Additional code is injected into the assembly to decrypt the strings at runtime. An example demonstrating this method is shown below:

The trial message shown on top is converted into a long Unicode-encoded string that is complete gibberish. If you store sensitive information as strings, such as connection strings, license codes or a trial expiration date, you would apply string encryption to hide it from static analysis tools.
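The mechanism can be sketched as a pair of functions: one that runs at build time and one that is injected to run at the point of use. The sketch is in Python, and the repeating-key XOR is a toy stand-in chosen for brevity, not real encryption and not the scheme of any particular obfuscator. KEY, encrypt_literal and decrypt_literal are hypothetical names.

```python
KEY = b"\x5a\xc3\x17"  # hypothetical key embedded next to the decryption stub


def encrypt_literal(text, key=KEY):
    """Build step: turn a source string into an opaque hex blob."""
    data = text.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).hex()


def decrypt_literal(blob, key=KEY):
    """Runtime stub injected into the program: recover the original string."""
    data = bytes.fromhex(blob)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).decode("utf-8")


# What the source code said:
message = "Your trial period has expired."

# What ends up in the shipped binary instead of the literal:
stored = encrypt_literal(message)

# What runs at the point where the literal was used:
print(decrypt_literal(stored))
```

Searching the shipped binary for the word "trial" finds nothing, yet anyone who locates decrypt_literal can call it on every stored blob. The protection is therefore only as strong as the effort needed to find and run the stub.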
Let’s look at the characteristics of string encryption. A human can’t read the strings in their encrypted form. Reversing it is a matter of finding the decryption method, provided that it’s written as managed code, and writing a decryption utility that breaks the encryption algorithm used. There is a slight performance impact, since the strings have to be decrypted at runtime. String encryption has very little impact on maintenance and support, as the method affects only the MSIL code rather than the metadata entities themselves.
Conclusion
We’ve just reviewed three common obfuscation methods aimed at protecting one’s IP.
It’s evident that you should take special care when selecting the methods to protect your code, as some have a greater impact on the development process than others. Entity renaming, for example, should be integrated early in the development cycle, as it may have a major impact on the application code flow; integrating it at a later stage may cause unnecessary delays.
Control flow obfuscation and string encryption come in many shapes and sizes, so it’s advisable to test different options, compare them, and then choose what is right for your project.