Sunday, June 24, 2012

SQL injection prevention

This was a research project for software security course work which I am sharing here in my blog.

SQL injection vulnerability was reported as a top cyber security issue in 2010 and is considered to be accounting 10% of total cyber security vulnerabilities since 2002. It is a known fact that using parameterized queries can mitigate SQL injection vulnerabilities to  a large extent. This paper shows the underlying practical issues programmer’s encounter when using parameterized queries. It also shows how a plugin tool inside the developer’s integrated development environment (IDE) could help programmers in generating parameterized queries. Further the usefulness and practicality of this helping tool is discussed.


SQL injection vulnerability (SQIV) accounts for over 20% of input validation issues and 10% of the overall cyber security issues between 2002 and 2007. Further it was reported as the second top cyber security vulnerability of the year 2010.  SQLIV can lead to theft of confidential data, loss of money, loss of critical data and affect reputation of the organization.

Structured Query Language (SQL) is widely used as a standard method to interact with relational database from a webpage.  An SQL statement can call the database managing program, typically a database server to return a particular result set from the database table or to update the table. Typically SQL statements are not compiled until the request is made by the web page under execution. This is usually because the query would likely depend on user request and may contain a user entered input. An SQL injection happens when a malicious user persuades the user input in such a way that the developer defined SQL query is modified undesirably. This is done by inserting code in to the data part of the SQL query maliciously.

Several SQLIV mitigating mechanisms exist, of which the most effective one is the usage of parameterized query, which is also known as prepared statements. Parameterized queries force the developer to define the code portion of the SQL query first and then clearly define the data portion of the query. In this way SQL compiler can safely distinguish between the code and data portion of a query. It is interesting to note that parameterized queries were introduced a long period back; however by contrasting with the frequent occurrences of SQL injection attacks we can conclude that they were not used by web developers, at least in some cases where attacks happened.

In this paper we study the difficulties encountered in using parameterized queries. We developed a plugin tool namely SQLSecuritytool for Microsoft’s visual studio 2010 web development IDE that could aid web developers in ensuring safety of SQL queries they use in server side scripting. The findings, the practicality and usefulness of this security tool are described in this paper. Section II discusses a brief background to understand this paper. Section III goes through the related work in mitigating SQLIV. Section IV analyse problems faced by web developers in using parameterized queries. Section V describes the tool we developed for replacing general SQL queries to parameterized queries. Finally the security benefits and conclusion are discussed subsequently.


This section gives some background about SQL Structure, SQL injection attacks, and parameterized queries.

SQL statement

An SQL statement defines the operations performed or requested to the database and is usually defined by a database analyst in consultation with the web developer. An example SQL query used in login pages is described below:

Select password FROM usernames_table WHERE username = ‘name’

The code portion of the above statement is shown below.

Select password FROM usernames_table WHERE username =

The data portion of the SQL query is ‘name’. Typically the data portion is known only at runtime. Therefore web developers substitute that field with a variable name. For example in the above query would be defined by the web developer as shown in figure 1.

Figure 1.   Typical SQL query in a webpage server side script

Password hashes obtained from the database is used to validate the username in this case.  Typically the database is connected by using the driver provided. In this case oracle client driver for is used which require importing a dynamic link library (dll) provided by Oracle company to the project.

SQL Injection Attack

An SQL injection attacks happen when the developer fails to properly define his program in such a way so that there is always a desired separation between code and data portion for any user inputs at run time. Or by using standard definitions, an SQLIV is caused by dynamic SQL statement code combined with inadequately-verified input, which allows the input to change the structure and logic of an SQL statement.  For example an user input “111’ OR true#“ would create the following query which would retrieve all passwords.

SELECT password FROM usernames_table WHERE username = ‘name’ OR true#’

If the web application is expected to display the results in a table then all the password hashes will be displayed to the attacker. In addition to this example, several other SQL injection methods exist. Fortunately all of them work by mixing the user input with actual SQL code.

Parameterized query

Parameterized queries are SQL statements that take a static structure unaltered at runtime. It defines the user input as parameters expected at the runtime. Therefore it clearly makes a distinction between the code and data portion of the SQL query. The query described in figure 1 can be changed to a parameterized query as figure 2 shows.

Figure 2.   Typical parameterized SQL query in a server side script

There are multiple ways to define parameterized queries and it depends on the language in which the web developer programs. When a large SQL query is used a number of parameters may be required to be defined carefully.

Web development issues

It is generally known among web developers that using parameterized queries can give maximum security against SQLIV. It is being suggested and encouraged as a standard pattern to be followed by web developers. However if the existing code is not written using parameterized queries, it would be a tedious effort for the developer to walk through and carefully scrutinize the code to rewrite them using parameterized queries. Existing frameworks in .Net do not provide any help for migration to parameterized queries. Further in many organizations the task of writing SQL statements are assigned to DB analyst. In small organizations a developer may take the role of the DB analyst. This reason in addition with the time-constraints may force those developers in writing there codes in the most easy way, by not using any security mitigating logic such as parameterized queries. Personally this reason and our knowledge in .Net framework gave us the motivation to write an automated tool to assist the developers.  

Replacement tool

The replacement tool I developed basically works as a plugin to the Microsoft Visual Studio IDE. Upon enabling this plugin and running, the developer is given options to find vulnerable SQL queries in the current project. Also it provides the parameterized form of the vulnerable query so the developer can replace the vulnerable SQL code. The plugin works by using the Microsoft DTE interface which gives an object oriented interface for developing plugin extensions to visual studio. The source code is analyzed using regular expressions. These two design components are discussed below.

DTE interface

Microsoft has exposed some of the inbuilt classes of the Visual Studio IDE to the developer through the DTE interface. The DTE object oriented model allows the programmer to access the project, code and other components of visual studio from a user defined plugin. Static analysis of the code is possible by using this interface. There are two ways to do static analysis. One is by analyzing the code components of the present project using the DTE model which uses the inbuilt lexer and parser. Another is to use regular expressions to find the desired pattern of code. Each has its own pros and cons. The former can do static analysis accurately however it needs in-depth knowledge of the DTE object model. The latter is less accurate since we assume to have a lot of code patterns in the program. However it is faster and easy to write. For the sake of this short academic project and also due to time-constraints we opted to develop the tool using regular expressions.

Regular Expressions

Regular expression provide a fast and easy way to make static analysis of the program code. It requires some basic understanding of creating a pattern which we want to match from code. Further many languages have inbuilt regular expression engine that allows to extract text programmatically. We used the regular expression library provided by Microsoft. For example the following expression could find the SQL statements occurring in server side page.


Basically it looks for text that matches the word SQL or objcommand followed by the SQL query and execution commands. This pattern was able to detect many of the vulnerable code in the project. The reason for this being a good pattern is that it exploits the common syntax used in programs to define SQL codes. However it would not definitely give a perfect match, because the program can take different structure depending on the developers coding style. However with carefully scrutinized patterns one could expect an average matching rate of around 70%.

Developer Interface

The developer user interface made use of the windows forms framework provided by Microsoft. It has various options to find the next occurrence of unsecure SQL query in the project. If the matching code file which contains such unsecure SQL query is not yet opened by the developer, it will be automatically opened. Further the present unsecure SQL query found will be selected and highlighted in the code window. Further manual editing of the generated query can be done by modifying the suggested parameterized SQL query. The sample user interface screen shot is shown in figure 3.

Figure 3. User Interface

Assumptions and limitations

This conversion tool assumes that all command lines in the program are removed by the programmer before running the application. This is because the regular expressions could mismatch the comments as source code. Further there are different ways to define the SQL code inside a program, but I assumed only the widely used syntax of defining SQL code in the source code. However these issues can be avoided using the DTE code model with the inbuilt lexer and parser or by defining more scrutinized regular expression patterns. However we believe it is out of scope for this short project.


Since parameterized queries can guarantee protection against SQLIV the developed replacer tool is expected effective against most of the SQLIV. We tested this tool against over ten different projects obtained from internet and also from organizations. It showed a conversion rate of over 80% for those projects which used the syntax similar to those defined in the regular expression pattern. Some of those queries were having over 40 parameters. However this was not true in all the case. There was obvious failure in detecting the vulnerabilities in those projects that do not employ the defined matching pattern. By defining more and more patterns one may be able to achieve a detection and conversion rate of over 80%. However regular expressions cannot be used for those programs that employee complex logic inside the SQL query definition. For example if a string builder was used to store the SQL query, with query being added part by part, the matching pattern would be too difficult to write. In such situations analysis using the lexer and parser of .Net framework would provide accurate results.

SQL structure
Static SQL
Dynamic SQL
Conditional SQL
Batch queries
Nested SQL

Table 1. Effectiveness of the developed conversion tool.

Nevertheless this project showed the power of regular expressions in doing static analysis to prevent SQLIV. If an organization needs to quickly convert all its code to parameterized queries a tool like this with regular expression patterns defined by comparing the organizations standard legacy code structure would be very useful.


In this paper we described the SQL injection vulnerability and ways to mitigate the issue practically by exploiting the security benefits of parameterized queries. The paper described a tool which could assist the developer in successfully converting the SQL statements to parameterized queries with minimal manual intervention. This study though done specifically focusing .Net framework is virtually applicable to all kinds of programming environment. Further the study showed that such a tool carefully designed would be very helpful for the developer and can mitigate the SQLIV to a large extent. 

A* search

I took this class called Introduction to AI, and the professor gave an assignment to write an A* hueristic program to find the shortest path between the cities. The shortest path heuristic was based on the euclidean distance or alternatively total number of links. The program needed to be shown on a graphical interface. I wrote this taking some time, going through all sorts of implementation issues. I assume my solution to be not the optimal, but it works in reasonable time. Below is the video describing one of the sample run.

Sample locations input file

Sample Connections input  file

Main function

Probabilistic Road-map method (PRM)

Probabilistic road-map approach is used in robotics to find a path from the present configuration to the robots desired configuration. This is done by finding a path in the configuration space.

As shown above the path from the source to destination is plotted on the configuration space. I used the classical Dijkstra method to find the path. The sample input file and the main program are below.


1 8
-7 20.5
-11 30.5
-18 16
-21.5 28
-7 25.5
-11 35.5
-18 21
-7 33
1 2 3 4
5 6 7 8
0 0
30 55
300 150 
76 91

Configuration Space

I took a class called Algorithms for Robotics in the school. The class was interesting, except that I messed up the finals. Anyway, the assignments included finding the configuration space of a 2D robotic arm, Simultaneous Localization & Mapping etc. I thought of writing something about it here.

The configuration space of a robot allows it to calculate the possible motions it can take, given the locations of the obstacles.  The above picture represents the two angles of a 2D arm robot. Each goes from 0 to 360 degrees. The red shade indicates obstacles and the white space indicates the free space. The real world co-ordinates are translated by the algorithm in to the above representation to make things much easier. I wrote the program in Java. It was interesting solution, since I did not use any open source libraries.

Sample input

3 11
0 0
0 1
0 2
0 3
1 0
1 1
1 2
1 3
3 2
3 3
4 3
1 2 6 5
3 4 8 7
9 10 11
3 0
2 2

Program link

Saturday, June 9, 2012

Theme park

Roller coasters are so much fun! It seems like everybody who visits the theme park wants to ride the roller coaster. Some people go alone; other people go in groups, and don't want to board the roller coaster unless they can all go together. And everyone who rides the roller coaster wants to ride again. A ride costs 1 Euro per person; your job is to figure out how much money the roller coaster will make today.

The roller coaster can hold k people at once. People queue for it in groups. Groups board the roller coaster, one at a time, until there are no more groups left or there is no room for the next group; then the roller coaster goes, whether it's full or not. Once the ride is over, all of its passengers re-queue in the same order. The roller coaster will run R times in a day.

For example, suppose R=4, k=6, and there are four groups of people with sizes: 1, 4, 2, 1. The first time the roller coaster goes, the first two groups [1, 4] will ride, leaving an empty seat (the group of 2 won't fit, and the group of 1 can't go ahead of them). Then they'll go to the back of the queue, which now looks like 2, 1, 1, 4. The second time, the coaster will hold 4 people: [2, 1, 1]. Now the queue looks like 4, 2, 1, 1. The third time, it will hold 6 people: [4, 2]. Now the queue looks like [1, 1, 4, 2]. Finally, it will hold 6 people: [1, 1, 4]. The roller coaster has made a total of 21 Euros!


The first line of the input gives the number of test cases, T. T test cases follow, with each test case consisting of two lines. The first line contains three space-separated integers:R, k and N. The second line contains N space-separated integers gi, each of which is the size of a group that wants to ride. g0 is the size of the first group, g1 is the size of the second group, etc.


For each test case, output one line containing "Case #x: y", where x is the case number (starting from 1) and y is the number of Euros made by the roller coaster.


1 ≤ T ≤ 50.
gi ≤ k.
Small dataset

1 ≤ R ≤ 1000.
1 ≤ k ≤ 100.
1 ≤ N ≤ 10.
1 ≤ gi ≤ 10.
Large dataset

1 ≤ R ≤ 108.
1 ≤ k ≤ 109.
1 ≤ N ≤ 1000.
1 ≤ gi ≤ 107.


4 6 4
1 4 2 1
100 10 1
5 5 10
2 4 2 3 4 2 1 2 1 3
Case #1: 21
Case #2: 100
Case #3: 20

My Solution:

Snapper Chain

The Snapper is a clever little device that, on one side, plugs its input plug into an output socket, and, on the other side, exposes an output socket for plugging in a light or other device.
When a Snapper is in the ON state and is receiving power from its input plug, then the device connected to its output socket is receiving power as well. When you snap your fingers -- making a clicking sound -- any Snapper receiving power at the time of the snap toggles between the ON and OFF states.

In hopes of destroying the universe by means of a singularity, I have purchased N Snapperdevices and chained them together by plugging the first one into a power socket, the second one into the first one, and so on. The light is plugged into the Nth Snapper.

Initially, all the Snappers are in the OFF state, so only the first one is receiving power from the socket, and the light is off. I snap my fingers once, which toggles the first Snapperinto the ON state and gives power to the second one. I snap my fingers again, which toggles both Snappers and then promptly cuts power off from the second one, leaving it in the ON state, but with no power. I snap my fingers the third time, which toggles the firstSnapper again and gives power to the second one. Now both Snappers are in the ON state, and if my light is plugged into the second Snapper it will be on.

I keep doing this for hours. Will the light be on or off after I have snapped my fingers Ktimes? The light is on if and only if it's receiving power from the Snapper it's plugged into.


The first line of the input gives the number of test cases, T. T lines follow. Each one contains two integers, N and K.
1 ≤ T ≤ 10,000.
Small dataset

For each test case, output one line containing "Case #x: y", where x is the case number (starting from 1) and y is either "ON" or "OFF", indicating the state of the light bulb.
1 ≤ N ≤ 10;
0 ≤ K ≤ 100;
1 ≤ N ≤ 30;
0 ≤ K ≤ 108;


1 0
1 1
4 0
4 47
Case #1: OFF
Case #2: ON
Case #3: OFF
Case #4: ON

My Solution