Improving on Code Quality
Loosely typed language
In FoxPro, the content of a variable does have a type, unlike some other languages that treat everything as a string. However, you don't have to declare the type of a variable, or even the variable itself, which makes VFP a loosely typed language. When you use a variable without declaring it first, Visual FoxPro creates it as a PRIVATE variable. Once you have created a variable, you can assign it data of any type.
This flexibility has a number of advantages as it allows for more flexible application design in many cases. In fact, right now there's an increasing interest in languages that behave in the same way as FoxPro does such as Python, Perl and JavaScript. With regard to variables, Visual FoxPro is the most flexible one, allowing statements like:
LOCAL (SYS(2015))
which creates a local variable with a unique, system-generated name. While I still haven't found the need for code like this in a real-world application, this nicely demonstrates the power of interpreted, loosely typed languages.
Loosely typed languages make it easy to write buggy code
Nonetheless, loosely typed languages have one major drawback, which is responsible for their bad reputation in the development community at large: they make it extremely easy to write buggy code. Visual FoxPro offers an option that partially simulates a strictly typed language:
_VFP.LanguageOptions = 1
With this option, Visual FoxPro produces a log entry every time it encounters a variable that is created without being declared in a LOCAL, PUBLIC, LPARAMETERS or PARAMETERS statement. However, for this to work, you need to execute your application: LanguageOptions only finds undeclared variables in code that actually runs. In a real-world application it's virtually impossible to get 100% coverage through regular execution, since much of the code deals with error conditions and the like.
Code Analysis
If you want to detect undeclared variables without running the application and be sure that you checked all code, you need additional tools that analyze the code. Code analysis is possible in two ways. All tools that I know for Visual FoxPro analyze the source code. For Visual FoxPro developers that's a very intuitive approach. If you write a tool that analyzes your own code, this tool can be pretty small.
Once you need to analyze code from an entire team or even other developers, code complexity keeps increasing. There are a couple of things you have to be aware of when analyzing source code:
- Developers can indent code with tabs, blanks and any combination. Both characters can also be used in between clauses of a single line. You can separate the keyword LOCAL from a variable by either a blank or a tab.
- Lines can be broken into multiple lines by ending a line with a semicolon. However, after the semicolon you can add comments using &&. In other words, when you merge multiple lines, you first have to remove any comments at the end of the line, and then look for a semicolon to merge lines. When VFP merges two lines it replaces all blanks and tabs between the two lines with a single blank.
- Keywords can be written in any case and with a reduced number of characters. LPARAMETERS can be written as lpar, LParam, LParameter, lParameter, LPARAmet, and so forth. In earlier versions the first four characters were unique. That changed over the years, though. You need tables of commands and clauses to expand the reduced version. These tables must be in the right order so that LOCA is expanded as LOCATE instead of LOCAL.
- Visual FoxPro supports syntax variations. For example, the following three statements are all equivalent:
_Screen.Caption = 'Hello world'
_Screen->Caption = "Hello world"
_Screen .;
Caption = [Hello world]
- You have three string delimiters in FoxPro. To separate objects from properties, you can use a period or an arrow. Between the object, the delimiter and the property, there can be any number of tabs, blanks, even line breaks and comments.
- Looking at a single line isn't sufficient. Most parsing algorithms for Visual FoxPro fail on the TEXT…ENDTEXT command. Not only do they treat anything between the two statements as code, they usually completely fail on code like this:
Text to lcString
Procedure Test
Return ;
Endtext
? lcString
If the parser attempts to normalize lines before analyzing their content, it treats the third and fourth lines as a single line and misses the ENDTEXT statement. VFP, however, properly detects ENDTEXT and prints the two lines between TEXT and ENDTEXT.
- Parsing in general is not easy. To obtain a list of variables in a LOCAL statement, you might be tempted to use the following code after normalizing the line:
ALINES( laVariables, m.lcLine, 0, ",")
FOR EACH lcVar IN laVariables
? GETWORDNUM(m.lcVar,1)
ENDFOR
This works fine until you meet a developer that used code like this:
LOCAL laArray[ALEN(laAnotherArray,1)]
Suddenly the analyzing routine reports two variables, "LAARRAY" and "1". To parse code properly you have to implement a minimal expression parser. While it doesn't have to actually evaluate the expression, it needs to detect and keep track of all characters that nest expressions. This includes brackets and parentheses ("[ ]", "( )") as well as string delimiters (" ", ' ', [ ]).
- In some cases the meaning of FoxPro code can change in a subtle way, as the following sample demonstrates:
a.test && always a field named "test"
m.test && always a memory variable named "test"
q.test && can be a memory variable or a field
The structure looks the same in all cases, a single character followed by a period followed by a name. However, "A" to "J" are reserved keywords for the first ten work areas, "M" is a reserved keyword for memory variables whereas "Q" is just a variable or alias name.
- Finally, the preprocessor can get in your way. Visual FoxPro doesn't limit the preprocessor to constants that are used instead of variables or values. Rather, you can use it to redefine function names, commands, and so on. To parse source code the way Visual FoxPro does, you also need to evaluate all include files, including nested header files.
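Two of the pitfalls listed above, semicolon continuation with trailing && comments and the TEXT…ENDTEXT block, can be sketched together. This is a minimal illustration in Python rather than FoxPro, so that it can be run outside VFP. The function name merge_continuation_lines is hypothetical, and the comment stripping is deliberately naive: it assumes && never appears inside a string literal, which a real parser cannot assume.

```python
def merge_continuation_lines(source):
    """Join lines ending in ';' but leave TEXT...ENDTEXT bodies untouched."""
    logical, pending, in_text = [], [], False
    for raw in source.splitlines():
        words = raw.split()
        first = words[0].upper() if words else ""
        if in_text:
            # Inside TEXT every line is literal until ENDTEXT (or ENDT, ENDTE...).
            if len(first) >= 4 and "ENDTEXT".startswith(first):
                in_text = False
            logical.append(raw)
            continue
        # Naive comment removal: ignores && inside string literals.
        code = raw.split("&&", 1)[0].strip()
        if code.endswith(";"):
            # Drop the semicolon and wait for the continuation line.
            pending.append(code[:-1].strip())
            continue
        pending.append(code)
        line = " ".join(pending)   # a single blank between merged parts
        pending = []
        logical.append(line)
        if line.upper().split()[:1] == ["TEXT"]:
            in_text = True
    if pending:                    # dangling continuation at end of file
        logical.append(" ".join(pending))
    return logical
```

Checking for TEXT only after a logical line has been merged, and for ENDTEXT before any merging, is what keeps the "Return ;" line in the sample from swallowing the Endtext line.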
With all these issues why do we still analyze source code so frequently? The biggest reason is that the format of the source code is immediately obvious. You see the target and can figure out ways to tackle the issue. The problems I outlined above raise their ugly head much later in the development cycle when you throw different source files at your code.
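One of those late-surfacing problems is the nested LOCAL declaration shown earlier. It needs a split that only honors commas at nesting depth zero. The following Python sketch shows the idea under simplifying assumptions: the names split_declaration and declared_names are hypothetical, the input is the already normalized text after the LOCAL keyword, and square brackets are always treated as array brackets, never as string delimiters.

```python
def split_declaration(text):
    """Split a declaration list on top-level commas only."""
    parts, current = [], []
    depth = 0        # nesting level of ( ) and [ ]
    quote = None     # active string delimiter, if any
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


def declared_names(text):
    """Return the upper-cased variable name of every declaration."""
    names = []
    for part in split_declaration(text):
        # Cut off array dimensions and any trailing AS clause.
        head = part.replace("(", "[").split("[")[0].split()
        if head:
            names.append(head[0].upper())
    return names
```

For the problematic line, declared_names("laArray[ALEN(laAnotherArray,1)]") yields a single entry, LAARRAY, because the comma inside ALEN() sits at depth two.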
Parse compiled code
There is an alternative, though. Instead of parsing the source, you can just as well parse the compiled code, that is, the FXP file. Using the FXP file for code analysis has got a number of advantages:
- Compiled code is completely verified and hence valid.
- Compiled code is smaller than source code. Parsing is therefore faster.
- It's easy to parse an FXP file (once you know the structure)
The FXP format is a binary file format. Visual FoxPro isn't particularly suited for parsing binary files, nor is binary parsing part of the day-to-day work of Visual FoxPro developers. Aside from this, the most likely reason why FXP files are seldom used for parsing is that the FXP format is undocumented.
I can't do anything about Visual FoxPro's feature set for dealing with binary files, nor can I change your desire to deal with binary data. However, I can do something about the lack of documentation. The FXP file consists of a number of blocks shown in the following table.
| Position | Content |
|---|---|
| 0x00 – 0x02 | File signature |
| 0x03 – 0x1B | File header |
| 0x1C – 0x4D | Code header |
| | Code blocks. Starts with the code for the main program in the PRG file followed by all procedures and methods. |
| | Procedure headers |
| | Class construction code |
| | Class definition table |
| | Line length information block |
| | File name block |
| | File footer |
Since there's no official documentation of the FXP format, I had to figure out the format myself. This means that not everything presented in this article is necessarily correct.
For analyzing procedures there are two main parts that are of interest. The procedure header contains details about a procedure:
| Type | Content |
|---|---|
| WORD | Length of the following procedure name. |
| variable | Name of the procedure. This name is not terminated by a CHR(0). |
| LONG | Position of the compiled procedure code. To get the position within the file, you need to add 0x29. |
| WORD | unknown |
| WORD | Class number to which this procedure belongs. Class numbers start at 0. Procedures that do not belong to a class have a class number of 0xFFFF or -1. |
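To make the record layout concrete, here is a minimal Python sketch that decodes one procedure header from a byte buffer. It is an illustration, not the article's FoxPro reader: the function name read_procedure_header is hypothetical, and the sketch assumes little-endian byte order and a single-byte character set for names.

```python
import struct


def read_procedure_header(data, offset):
    """Decode one procedure header; returns (fields, next_offset)."""
    # WORD: length of the procedure name (assumed little-endian).
    (name_len,) = struct.unpack_from("<H", data, offset)
    offset += 2
    # The name itself, not terminated by CHR(0).
    name = data[offset:offset + name_len].decode("latin-1")
    offset += name_len
    # LONG code position, WORD unknown, WORD class number.
    code_pos, _unknown, class_no = struct.unpack_from("<iHH", data, offset)
    offset += 8
    fields = {
        "name": name,
        "code_pos": code_pos + 0x29,   # file position, per the table
        "class_no": None if class_no == 0xFFFF else class_no,
    }
    return fields, offset
```

Returning the next offset lets a caller read consecutive headers in a loop, one call per procedure.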
Studying the FXP format reveals a couple of oddities. For instance, Visual FoxPro doesn't store all methods of a class with the class. Rather, all methods in all classes, all remaining procedures and the main program itself are stored in one huge procedure block. The class number identifies to which class a procedure belongs. As the class number is a 16-bit value and 0xFFFF is reserved for procedures outside a class, this gives you a maximum of 65,535 classes per PRG file.
The length of a procedure name is another 16 bit value. While Visual FoxPro limits us to 127 characters for a procedure name, it could technically deal with much longer procedure names up to over 65,000 characters. The most important piece of information in the procedure header, though, is the position of the code block.
Procedure headers are buried somewhere in the FXP file. To locate them, you use two values that are stored in the code header:
| Offset (from the start of the code header at 0x1C) | Type | Content |
|---|---|---|
| 0x0D | WORD | Number of procedures in the file. |
| 0x15 | LONG | Position of the procedure headers. To obtain the file position add 0x29. |
| 0x2A | LONG | File time stamp of the PRG file using the FAT format. |
If you plan ahead a very long time, one part of the code header might concern you. The time stamp for PRG files uses the same format as the file time stamp in the FAT16 file system. Aside from only being precise to a two-second interval, it also only supports dates up to 12/31/2107. In just a little over a hundred years, Visual FoxPro won't be able to detect changes in a PRG file automatically!
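Both limits follow directly from the bit layout of a FAT time stamp: seven bits for the year counted from 1980, and five bits for seconds stored in units of two. The Python sketch below decodes such a value. The function name decode_fat_timestamp is hypothetical, and the placement of the date in the high word and the time in the low word of the LONG is an assumption that has not been verified against real FXP files.

```python
from datetime import datetime


def decode_fat_timestamp(value):
    """Decode a 32-bit FAT-style time stamp (date assumed in the high word)."""
    date, time = value >> 16, value & 0xFFFF
    return datetime(
        1980 + (date >> 9),      # 7 bits: years since 1980
        (date >> 5) & 0x0F,      # 4 bits: month
        date & 0x1F,             # 5 bits: day
        time >> 11,              # 5 bits: hours
        (time >> 5) & 0x3F,      # 6 bits: minutes
        (time & 0x1F) * 2,       # 5 bits: seconds in two-second steps
    )
```

With all seven year bits set, the year comes out as 1980 + 127 = 2107, which is where the date limit comes from.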
The code header shows another of these oddities in Visual FoxPro. The number of procedures is stored as a 16 bit value. Hence, you can only store 65,535 procedures in a single PRG file. This doesn't only include traditional procedures, but also methods defined in a class. To reach the maximum number of classes of 65,535, each class can only have a single method.
Once you have obtained the position of the procedure headers and the number of procedures, you can start reading the procedure headers and add them to a collection. The following code performs this task:
*=======================================================
* Reads positions of all procedures into a collection
*=======================================================
Procedure LoadFXP
  Fseek( This.nHandle, 0x1C+0x0D ) && code header: number of procedures
  This.nProcCount = This.ReadInt16()
  This.GetProcedureHeaders()
EndProc
*=======================================================
* Read all procedure headers
*=======================================================
Procedure GetProcedureHeaders()
  Dimension This.aProcList[This.nProcCount+1]
  *------------------------------------------------------
  * Create an entry for the main procedure list.
  *------------------------------------------------------
  Local loItem
  loItem = CreateObject("ProcListEntry")
  This.aProcList[1] = m.loItem
  loItem.nClassID = 0xFFFF && does not belong to a class
  loItem.nPosition = 0x4E
  loItem.cName = ""
  *------------------------------------------------------
  * Create a list of procedure names
  *------------------------------------------------------
  Local lnPosition, lnItem, lnLength
  Fseek( This.nHandle, 0x1C+0x15 ) && code header: position of procedure headers
  lnPosition = This.ReadInt32()
  If m.lnPosition > 0
    Fseek( This.nHandle, m.lnPosition+0x29 )
    For lnItem = 1 to This.nProcCount
      lnLength = This.ReadInt16()
      loItem = CreateObject("ProcListEntry")
      This.aProcList[m.lnItem+1] = m.loItem
      loItem.cName = Upper(Fread(This.nHandle,m.lnLength))
      loItem.nPosition = This.ReadInt32() + 0x29
      Fseek( This.nHandle, 2, 1 ) && skip the unknown WORD
      loItem.nClassID = This.ReadInt16()
    EndFor
  EndIf
EndProc
A procedure header points to the compiled code. Compiled code is organized in code blocks. Each code block contains the code of one compiled procedure:
| Type | Content |
|---|---|
| WORD | Length of the following code line area. The following two entries are repeated for every line. |
| WORD | Length of the following line including these two bytes. Every source code line with code is compiled into one line of tokenized code. Source code lines that do not contain executable code are left out of the code line area completely. |
| BYTE | FoxPro command. Every command is uniquely identified by a byte token. |
| WORD | Number of entries in the following name list. The following two entries are repeated for every name table list entry. |
| WORD | Length of the following name. |
| variable | Name of a variable, function, etc. The name is not terminated with CHR(0). |
A code block consists of two parts. The first part is the compiled procedure in tokenized code. There's one entry for every line of executable code. The second part is the name table list. To save memory and increase execution speed, Visual FoxPro doesn't store the name of a variable in the tokenized code. Rather it creates a list of all variable, method, property, function, etc. names. Each of them is assigned a 16 bit index value.
If you have more than 65,535 names in a single procedure, Visual FoxPro behaves erratically without warning at compile time. Hence, if you ever come close to that number of names for a single procedure, you should consider splitting the code into multiple procedures.
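The two-part layout of a code block can be walked with a few lines of code. The Python sketch below splits a block into its tokenized lines and its name list. The function name parse_code_block is hypothetical, and two details are assumptions rather than documented facts: values are little-endian, and the leading area length does not count its own two bytes.

```python
import struct


def parse_code_block(data, offset=0):
    """Return ([(token, payload), ...], [names]) for one code block."""
    # WORD: length of the code line area (assumed to exclude this WORD).
    (area_len,) = struct.unpack_from("<H", data, offset)
    offset += 2
    end = offset + area_len
    lines = []
    while offset < end:
        # WORD: line length, including these two bytes.
        (line_len,) = struct.unpack_from("<H", data, offset)
        if line_len < 3:
            raise ValueError("line too short to hold a command token")
        token = data[offset + 2]                    # BYTE: command token
        payload = data[offset + 3:offset + line_len]
        lines.append((token, payload))
        offset += line_len
    offset = end
    # WORD: number of names, then (WORD length, name) pairs.
    (name_count,) = struct.unpack_from("<H", data, offset)
    offset += 2
    names = []
    for _ in range(name_count):
        (name_len,) = struct.unpack_from("<H", data, offset)
        offset += 2
        names.append(data[offset:offset + name_len].decode("latin-1"))
        offset += name_len
    return lines, names
```

The payload of each line is returned undecoded; turning it into expressions is the job of an expression reader and is not attempted here.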
To parse a program for variables that are used before they are declared, you need one more piece of information. Each line starts with a single byte token that identifies the command. A variable should occur in a LOCAL, LPARAMETERS, PARAMETERS, PRIVATE or PUBLIC statement before it is used in a variable assignment.
| Token | Command |
|---|---|
| 0x34 | PARAMETERS |
| 0x35 | PRIVATE |
| 0x37 | PUBLIC |
| 0x54 | variable assignment |
| 0xAE | LOCAL |
| 0xAF | LPARAMETERS |
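The declare-before-assign rule itself is simple once the lines are decoded. The Python sketch below applies it to a deliberately simplified model: each line is a pair of command token and the name-list indexes it declares or assigns to. That pre-decoded form and the function name undeclared_assignments are hypothetical; in a real FXP file the indexes have to be extracted from the tokenized expressions first.

```python
# Command tokens from the table above.
DECLARING = {
    0x34: "PARAMETERS",
    0x35: "PRIVATE",
    0x37: "PUBLIC",
    0xAE: "LOCAL",
    0xAF: "LPARAMETERS",
}
ASSIGNMENT = 0x54


def undeclared_assignments(lines, names):
    """Return names that are assigned a value before being declared."""
    declared, flagged = set(), []
    for token, indexes in lines:
        if token in DECLARING:
            declared.update(indexes)
        elif token == ASSIGNMENT:
            for idx in indexes:
                if idx not in declared and names[idx] not in flagged:
                    flagged.append(names[idx])
    return flagged
```

A variable stays flagged even when a declaration follows later in the procedure, since the first assignment has already created it without one.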
The parser
Parsing an entire program can be broken up into two steps. First of all you loop through all procedures:
*-------------------------------------------------------
* Scan through all procedures
*------------------------------------------------------
Local lnProc, lnSize, lcName
For lnProc = 1 to loFXP.nProcCount+1
  lnSize = loFXP.GotoProcedure( m.lnProc )
  If lnSize > 0
    lcName = loFXP.GetName(m.lnProc)
    If Empty(m.lcName)
      lcName = JustFname(m.tcProgram)
    EndIf
    LogLine( "Analyzing " + m.lcName )
    AnalyzeProc( m.loFXP, m.lnSize, m.lcName )
  EndIf
EndFor
In the second step you loop through all lines within a procedure. A huge CASE statement processes each different command token. For space reasons the following code only deals with LOCAL and variable assignments.
*=======================================================
* Reads a procedure looking for the following issues:
*
* - Variables that are assigned values to without being
* declared
*
*=======================================================
Procedure AnalyzeProc( toReader, tnSize, tcName )
  *------------------------------------------------------
  * In various arrays we keep track of the status
  *------------------------------------------------------
  Local laDeclared[65000], laUsedBeforeDeclaration[65000]
  laDeclared = ""
  laUsedBeforeDeclaration = .F.
  *------------------------------------------------------
  * Parse the procedure
  *------------------------------------------------------
  Local lnReadSoFar, lnLength, lnCmd
  lnReadSoFar = 0
  Do while lnReadSoFar < m.tnSize
    lnLength = toReader.ReadInt16() - 2
    m.lnReadSoFar = m.lnReadSoFar + m.lnLength + 2
    toReader.SetReaderLimit( m.lnLength )
    lnCmd = toReader.ReadInt8()
    Do case
      *-----------------------------------------------------
      * LET: A variable is assigned a value. This variable
      * should have been declared before.
      *-----------------------------------------------------
      Case m.lnCmd == 0x54
        Local loVar, loValue
        loVar = toReader.GetExpression()
        toReader.ReadInt8()
        loValue = toReader.GetExpression()
        Local lnCode, lnIndex, lnWorkArea
        Do case
          Case loVar.Count == 1
            If loVar[1].cType == "variable"
              If Empty(laDeclared[loVar[1].uValue])
                laUsedBeforeDeclaration[m.loVar[1].uValue] = .T.
              EndIf
            EndIf
          Case loVar.Count == 2
            If loVar[1].cType == "workarea" ;
                and loVar[1].uValue == 0x0D ;
                and loVar[2].cType == "variable"
              If Empty(laDeclared[loVar[2].uValue])
                laUsedBeforeDeclaration[m.loVar[2].uValue] = .T.
              EndIf
            EndIf
        EndCase
      *-----------------------------------------------------
      * LOCAL
      *-----------------------------------------------------
      Case m.lnCmd == 0xAE
        Local lnNextToken, loExp as Collection, lnSubToken
        lnNextToken = 0x07 && Comma
        Do while m.lnNextToken != 0xFE && CmdEnd
          Do case
            Case m.lnNextToken == 0x07 && Comma
              loExp = toReader.GetExpression()
              Do case
                Case loExp.Count == 1
                  Do case
                    Case loExp[1].cType == "variable"
                      laDeclared[loExp[1].uValue] = "L"
                    Case loExp[1].cType == "array"
                      laDeclared[loExp[1].uValue] = "L"
                      lnSubToken = 0x07 && Comma
                      Do While m.lnSubToken == 0x07
                        toReader.GetExpression()
                        lnSubToken = toReader.ReadInt8()
                      EndDo
                  EndCase
                Case loExp.Count == 2
                  Do case
                    Case loExp[1].cType == "workarea" ;
                        and loExp[1].uValue == 0x0D ;
                        and loExp[2].cType == "variable"
                      laDeclared[loExp[2].uValue] = "L"
                    Case loExp[1].cType == "workarea" ;
                        and loExp[1].uValue == 0x0D ;
                        and loExp[2].cType == "array"
                      laDeclared[loExp[2].uValue] = "L"
                      lnSubToken = 0x07 && Comma
                      Do While m.lnSubToken == 0x07
                        toReader.GetExpression()
                        lnSubToken = toReader.ReadInt8()
                      EndDo
                  EndCase
              EndCase
            Case m.lnNextToken == 0x51 && AS
              toReader.GetExpression()
            Otherwise
              Exit
          EndCase
          lnNextToken = toReader.ReadInt8()
        EndDo
    EndCase
    toReader.ClearReaderLimit()
  EndDo
  *------------------------------------------------------
  * After parsing the entire procedure, we can read the
  * name list.
  *------------------------------------------------------
  Local loNames as Collection, lnNames, lnCurName
  Local lnLength
  loNames = CreateObject("Collection")
  lnNames = toReader.ReadInt16()
  For lnCurName=1 to m.lnNames
    lnLength = toReader.ReadInt16()
    loNames.Add( toReader.ReadBytes(m.lnLength) )
  EndFor
  *------------------------------------------------------
  * Print the results
  *------------------------------------------------------
  Local lnVar
  For lnVar=1 to Alen(laUsedBeforeDeclaration)
    If laUsedBeforeDeclaration[m.lnVar]
      LogLine( " "+loNames[m.lnVar]+ ;
        " used before it's declared." )
    EndIf
  EndFor
EndProc
In each line the token is followed by tokenized expressions. LOCAL, for instance, is followed by the name of a variable. There are various possibilities for specifying a variable name:
LOCAL m.lcName
LOCAL lcName
LOCAL laName[10]
LOCAL m.laName[10]
The code above deals with every one of these possibilities. The variable name can be followed by the "AS" token which is followed by a class name. Alternatively, a comma token (0x07) starts the next variable name. The code above omits the GetExpression() method which returns a collection with details about an expression. To analyze a program, call CodeQuality.prg passing the name of the FXP file like this:
DO CodeQuality WITH "myProg.FXP"
The result is written into a text file. Currently, this utility only supports the detection of undeclared variables. However, it's a foundation that you can use to write your own validation code based on the FXP format.