12 May, 2009

Access Violation in detail

An exception of class EAccessViolation is the most common error in Delphi applications. Today I want to discuss it: what causes it and how to deal with it. This article is mostly for beginners, so some statements are deliberately simplified and not perfectly precise.

What is an Access Violation

Every computer program uses memory to run (*). Every variable in your program consumes memory, whether it is a form, a component, an object, an array, a record, a string or a simple integer. For some types of variables (such as integers or static arrays) memory is allocated automatically; other types (for example, dynamic arrays) require you to manage memory explicitly. Essentially, from the operating system's point of view, each variable is characterized by its address (i.e. its location) and its size.

Roughly speaking, a program uses three "types" of memory: the area for global variables, the stack and the heap.

Memory for global variables is allocated by the OS loader when the executable module is loaded, and it is freed when the module is unloaded. Global variables are those declared outside of any class or routine. The stack holds local variables (those declared inside a function or procedure) and auxiliary data (such as return addresses and exception handler frames). The heap is used for storing dynamic data.

Note that for variables of dynamic types (such as dynamic arrays, strings, objects or components), the variable itself is stored in the global area or on the stack, but its data is always allocated on the heap, and it often requires manual management.

Regardless of who allocates memory for a variable (you, manually, or the compiler, automatically), the memory must be allocated before the variable is used, and it should be freed later, when the variable is no longer needed.
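To make the three kinds of memory concrete, here is a minimal sketch of a console program. The names GlobalCounter and CountLines are made up for this illustration; the comments mark where each variable lives.

```pascal
program MemoryKinds;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

var
  GlobalCounter: Integer;   // global area: exists for the whole life of the module

function CountLines: Integer;
var
  Local: Integer;           // stack: exists only while CountLines is running
  Lines: TStringList;       // the reference is on the stack, the object itself is on the heap
begin
  Lines := TStringList.Create;   // manual allocation on the heap
  try
    Lines.Add('first');
    Lines.Add('second');
    Local := Lines.Count;
  finally
    Lines.Free;                  // manual release of the heap memory
  end;
  Result := Local;
end;

begin
  GlobalCounter := CountLines;
  WriteLn(GlobalCounter);
end.
```

The try..finally block is the usual way to guarantee that manually allocated memory is released even if an exception occurs in between.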

Sometimes, because of a bug in your code, your application tries to access a memory location that was never allocated or has already been released. When this happens, the CPU signals a fault, and your Delphi application receives it as an exception of class EAccessViolation. The usual text for this error is: "Access violation at address XXX in module 'YYY'. Write/read of address ZZZ". Although this kind of error has one simple cause, the real-world situations that lead to it can be very different.

Finding the source line of an Access Violation

So, what should you do about an access violation? First, you should try to identify the line in your source code where it occurs.

If you get an EAccessViolation while running under the debugger, just click "Break" (it is called "OK" in older Delphi versions) and the debugger will take you to the source line immediately. Additionally, you can look at the call stack by choosing View/Debug Windows/Call Stack from Delphi's main menu.


This window shows the call stack: the chain of calls that led to the current point in the code. Read it from top to bottom. The current location is marked with a little blue arrow. You can also double-click a line to go to that location.

If you are using an exception diagnostic tool, such as EurekaLog, you will get a bug report instead of the usual error message. The report also contains a call stack (it may look different from the debugger's, because it is built by a different algorithm).


Whether you caught the error in the debugger or with EurekaLog, it is best to prepare for this situation in advance by setting the proper project options. Typically, these are the "Use Debug DCUs" and "Stack frames" options.

Finding the location of the error is only half of the job. The other half is determining why the error occurs on that line.

Finding the cause of an Access Violation by analyzing the code

If you got the error under the debugger, it is quite simple: place a breakpoint on the problem line and, when the breakpoint is hit, check every variable and expression on that line. That will show you the reason for the access violation. I won't discuss how to use the debugger here; instead, I want to discuss other approaches.

If all you have is a bug report, you will have to use your telepathic abilities to find out the truth. Those psychic powers come with experience, and I want to help you a little by giving you a list of the most common mistakes that lead to EAccessViolation exceptions.

1. First, there are all kinds of errors where an array element is accessed outside of the array's bounds. For example, a typical beginner's mistake looks like this:
var
  X: Integer;
...
  for X := 1 to Length(List) do // wrong for a dynamic array! Should be: for X := 0 to Length(List) - 1 do
  begin
    // ... do something with List[X]
  end;
So, if your problem line contains [], that is a good reason to verify the expression inside the [].

Usually, you should catch errors of this sort during development and testing by enabling the "Range Check Errors" option (the {$R+} compiler directive). Such errors are very dangerous because they may go unnoticed; even worse, they can corrupt the stack, so that you cannot determine the location of the error. But more on this later.
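As a rough illustration of what range checking buys you, the following sketch turns a bad index into a clean ERangeError instead of a silent memory overwrite. The function name TryRead and the array contents are invented for this example.

```pascal
program RangeCheckDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$R+}  // enable range checking for the code below
uses
  SysUtils;

// Returns True and the element value if Index is valid,
// False if range checking rejected the index.
function TryRead(Index: Integer; out Value: Integer): Boolean;
var
  List: array of Integer;
begin
  SetLength(List, 3);       // valid indexes are 0..2
  List[2] := 42;
  try
    Value := List[Index];   // with {$R+} a bad index raises ERangeError here
    Result := True;
  except
    on ERangeError do
    begin
      Value := 0;
      Result := False;
    end;
  end;
end;

var
  V: Integer;
begin
  WriteLn(TryRead(2, V));   // index inside the array
  WriteLn(TryRead(3, V));   // index one past the end
end.
```

Without {$R+} the second call would read whatever happens to lie after the array, which is exactly the kind of error that goes unnoticed.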

2. All kinds of mix-ups with arguments. I mean cases involving untyped parameters and buffer-overflow errors:
var
  S1: array of Integer;
  S2: String;
...
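  // Wrong:
  Stream.ReadBuffer(S1, 256);     // this overwrites the S1 pointer itself (and the memory after it)
  // Correct (assuming S1 already holds at least 256 bytes, i.e. 64 Integers):
  Stream.ReadBuffer(S1[0], 256);  // this reads 256 bytes into the S1 array's data

  // Wrong:
  FillChar(S2, Length(S2), 0);            // this damages the S2 pointer
  // Correct (the count is in bytes; Char is 2 bytes since Delphi 2009):
  FillChar(Pointer(S2)^, Length(S2) * SizeOf(Char), 0);  // this clears the S2 string by filling it with zeroes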
Usually these errors show up immediately when the function is called. Just read the function's documentation to figure out what you did wrong: check what the function expects to receive and what you actually pass to it.

3. Passing data between modules. Beginners like to pass data (especially String) between an exe and a DLL without taking into account that the two modules have two different memory managers. I won't cover this issue here, as it would take too long.

These errors are usually detected at development time.

4. Wrong declarations of functions imported from a DLL. The most common mistake is a wrong calling convention. If you get an EAccessViolation just by calling a function from a DLL, carefully verify its declaration. Make sure that its signature is correct and that you didn't forget stdcall or cdecl.

Although these errors are usually detected during development, there are cases where a wrong declaration makes it into production code. Here is a good story about such a case by Raymond Chen.
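For item 4, here is a small sketch of a correct import declaration. It uses the real WinAPI function MulDiv from kernel32.dll, which, like almost all WinAPI functions, uses the stdcall convention. The Delphi-side name KernelMulDiv is made up so that it does not clash with the declaration in the Windows unit. This example is Windows-only.

```pascal
program ImportDeclDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// C prototype: int MulDiv(int nNumber, int nNumerator, int nDenominator);
// The parameter types, the result type and the calling convention
// must all match the DLL. Omitting "stdcall" here would make Delphi use
// its default "register" convention, and on 32-bit Windows the function
// would receive garbage arguments.
function KernelMulDiv(Number, Numerator, Denominator: Integer): Integer;
  stdcall; external 'kernel32.dll' name 'MulDiv';

begin
  WriteLn(KernelMulDiv(10, 3, 2));  // 10 * 3 / 2 = 15
end.
```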

5. Missing synchronization when working with threads. If you use more than one thread in your application, you can run into trouble. For example, you cannot access VCL objects from a secondary thread, because the VCL is not thread-safe; you should use Synchronize for this. In general, the problem arises when one thread changes data that is being used by another thread, and the change comes as a complete surprise to the second thread.

Unfortunately, threading problems are the most complex ones. They are very hard to diagnose. The best you can do is to guarantee that such things cannot happen. If in doubt, when working with shared variables, put your code in Synchronize or guard it with a critical section. Sometimes a programmer uses CreateThread instead of BeginThread or TThread and forgets to set IsMultiThread.
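Here is a minimal sketch of guarding a shared variable with a critical section. The names TWorker, Counter, Lock and RunWorkers are invented for this illustration; TCriticalSection comes from the standard SyncObjs unit. Two threads increment the same counter, and the lock guarantees that no increment is lost.

```pascal
program SharedCounterDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  {$IFDEF FPC}{$IFDEF UNIX}cthreads,{$ENDIF}{$ENDIF}
  SysUtils, Classes, SyncObjs;

const
  IncrementsPerThread = 100000;

var
  Lock: TCriticalSection;   // protects Counter
  Counter: Integer;         // shared between both threads

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  I: Integer;
begin
  for I := 1 to IncrementsPerThread do
  begin
    Lock.Acquire;           // only one thread at a time may pass
    try
      Inc(Counter);
    finally
      Lock.Release;
    end;
  end;
end;

function RunWorkers: Integer;
var
  A, B: TWorker;
begin
  Counter := 0;
  Lock := TCriticalSection.Create;
  try
    A := TWorker.Create(False);   // False = start running immediately
    B := TWorker.Create(False);
    A.WaitFor;
    B.WaitFor;
    A.Free;
    B.Free;
  finally
    Lock.Free;
  end;
  Result := Counter;
end;

begin
  WriteLn(RunWorkers);            // 2 * IncrementsPerThread
end.
```

Because the threads are created through TThread, the RTL sets IsMultiThread for you; with a raw CreateThread call you would have to set it yourself.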

6. Calling a function via invalid procedural variable. For example:
var
  Lib1, Lib2: HMODULE;
  Proc: procedure;
...
  Lib1 := LoadLibrary('MyDll.dll');         // one piece of code loads the DLL (possibly in a different thread)
...
  Lib2 := GetModuleHandle('MyDll.dll');
  Proc := GetProcAddress(Lib2, 'MyProc');   // no checks here! The DLL may not export a function named 'MyProc'
  Proc;                                     // Proc may be nil -> Access Violation
...
  FreeLibrary(Lib1);                        // some other code unloads the library
...
  Proc;                                     // although Proc <> nil, its code is no longer loaded,
                                            // so this call raises an AV too
This case is very similar to the next one.
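A safer variant of the lookup from item 6 might look like the sketch below. It checks both the module handle and the result of GetProcAddress before calling anything. The helper FindKernelProc and the name 'NoSuchProcedure123' are hypothetical; kernel32.dll and GetTickCount are real and are used here only so that the example runs on any Windows machine.

```pascal
program SafeProcLookup;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Windows;

type
  TTickProc = function: DWORD; stdcall;

// Returns True only if the module is loaded AND exports ProcName.
function FindKernelProc(const ProcName: AnsiString; out Proc: TTickProc): Boolean;
var
  Lib: HMODULE;
begin
  Proc := nil;
  Lib := GetModuleHandle('kernel32.dll');
  if Lib <> 0 then
    @Proc := GetProcAddress(Lib, PAnsiChar(ProcName));
  Result := Assigned(Proc);
end;

var
  Tick: TTickProc;
begin
  if FindKernelProc('GetTickCount', Tick) then
    WriteLn('GetTickCount found, value: ', Tick())
  else
    WriteLn('GetTickCount not found');

  // A name that does not exist: the call is skipped instead of crashing.
  if not FindKernelProc('NoSuchProcedure123', Tick) then
    WriteLn('NoSuchProcedure123 not found, call skipped');
end.
```

The check does not protect against the second problem, a library that was unloaded after the lookup. For that, the code that calls FreeLibrary also has to reset every procedural variable that points into the library.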

7. Calling methods of, or otherwise accessing, objects/components that have not been created yet or have already been released. Consider this cause if there are object variables in your problem line of code, especially if you create or free objects manually somewhere in your program.

One part of the problem is that when you destroy an object, the variable that referred to it is not cleared automatically: it continues to point to a now-invalid memory location. Another part is that local variables are not initialized to zero and contain garbage when the function starts. The last part: there can be multiple references to one object/component through different variables. Here are a few examples:
var
  Str: TStringList;
...
  Str.Add('S'); // Mistake! We forgot to create the object by calling Str := TStringList.Create;
...
  Str := TStringList.Create;
  Str.Add('S');
...
  Str.Free; // We destroyed the object, but Str still points to the old location
...
  if Str.Count > 0 then // Mistake! Access to an already released object
All such memory access errors are dangerous because they may go unnoticed. For example, we can access a destroyed object, but if the memory manager has not yet returned its memory to the system, the access may succeed. We have already talked about such situations.
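One common way to reduce the risk of a dangling reference is FreeAndNil from SysUtils, which destroys the object and clears the variable. The sketch below (the function name CountAfterRelease is made up) shows that after FreeAndNil a check with Assigned becomes meaningful.

```pascal
program FreeAndNilDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

// Returns the item count if the list still exists, or -1 if it was released.
function CountAfterRelease: Integer;
var
  Str: TStringList;
begin
  Str := TStringList.Create;
  Str.Add('S');
  FreeAndNil(Str);          // destroys the object AND sets Str to nil
  if Assigned(Str) then     // a reliable test now; after a plain Str.Free it would not be
    Result := Str.Count
  else
    Result := -1;
end;

begin
  WriteLn(CountAfterRelease);
end.
```

Note that this only clears the one variable you pass in. If other variables refer to the same object, they still point to released memory.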

The situation with local arrays is even worse: local arrays are allocated on the stack, so there are large areas of accessible memory around them. To make things worse, this memory is heavily used by the application (as opposed to memory that was released when an object was destroyed).

For example:
procedure TForm13.Button1Click(Sender: TObject);
var
  S: array [0..1] of Integer;
  I: Integer;
begin
  I := 2;            // suppose that I is somehow calculated in your application
                     // and, because of a bug, gets a wrong value
  S[I] := 0;         // this line can overwrite the return address of Button1Click on the stack
end;                 // EAccessViolation is then raised here, because the caller's address is lost

procedure TForm13.Button2Click(Sender: TObject);
var
  S: array [0..1] of Integer;
  I: Integer;
begin
  I := -6;          // suppose this is another wrong value
  try
    S[I]     := 1;  // instead of changing the array, we damage the exception handler frame set up by try
    S[I + 1] := 2;
    S[I + 2] := 3;
    Abort;          // a full crash, without any message:
                    // the exception manager detects the damaged stack and terminates the application immediately
  except
    ShowMessage('Aborted');
  end;
end;

procedure TForm13.Button3Click(Sender: TObject);
var
  S: array [0..1] of Integer;
  I: Integer;
begin
  I := -1;          // yes, another invalid value for I
  S[I] := 1;        // we damage the stack again, but there is no EAccessViolation or visible side effect!
end;
It is a very treacherous situation, isn't it? Depending on how we messed up the array index, we can get (**):
a). An application that produces correct results.
b). An application that produces wrong results.
c). An application that raises an exception.
d). An application that crashes.
To make things worse, the very same application can show any of these behaviors, depending on external conditions such as the OS and Delphi version, the user's actions before the error, and so on.

That is why it is extremely important to enable the "Range Check Errors" option while you develop and test your application.

You can also enable it for production code if you are not sure that your testing was thorough enough.

So what exactly should we do about an access violation? Well, we have a source line, so we should go through the cases above and try to apply them to our line of code:
  • Is there a [] in our line? If so: can the index be invalid?
  • Does the line work with objects? If so: check the logic. Is an object released too early?
  • Do we use a DLL? If so: is the function declaration correct? Is all dynamic data passed between modules properly?
  • and so on.

Some hints from the data can also be a great help.

Finding the cause of an Access Violation by analyzing the data

First, we can extract some useful information from the error message itself. Let's recall it:

Access violation at address XXX in module 'YYY'. Write/read of address ZZZ.

The address XXX is the exact location of the code where the exception was raised. This is the same address that Delphi's debugger and EurekaLog use to point you to your line of code. The executable module containing this address is also shown in the error message, as YYY. Usually it is your exe, your DLL or some system/third-party DLL. Sometimes, however, XXX does not hold any meaningful value. For example, if there is no YYY in the message or if XXX looks suspicious (less than $400000 or greater than $7FFFFFFF on x86-32), then you definitely have a problem with either stack corruption (for example, case "c" from the list above) or a call to an invalid function (item 6 or, sometimes, item 4 above).

The next useful piece of information is the word "write" or "read". "Write" means that the exception occurred while writing to memory; "read" means that it occurred while reading. So we only need to check the parts of the problem line that write or read, respectively. For example, if the problem line is "P := W", we should check P if the message says "write" and W if it says "read".

The last hint comes from ZZZ. We do not care about the exact value, only about whether it is small or large. "Small" values are something like $00000000, $0000000A or $00000010. "Large" values are, for example, $00563F6A, $705D7800 and so on. If ZZZ is small, your code tried to access an object via a nil reference (a small non-zero value is the offset of the accessed field, counted from nil). If ZZZ is large, your code tried to access an object via a non-nil but invalid pointer. In the first case, find out why you are using a nil pointer (or who the bad guy is that set the pointer to nil). In the second case, look for the bad guy who released the object but did not clear the variable.
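The link between a "small" ZZZ and a nil reference can be sketched as follows. The record TSample and both functions are invented for this illustration. The first function computes where a field would be located if the record pointer were nil; the second actually performs the read and catches the resulting exception.

```pascal
program NilOffsetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  SysUtils;

type
  PSample = ^TSample;
  TSample = packed record
    First: Integer;    // offset 0
    Second: Integer;   // offset 4
    Third: Integer;    // offset 8
  end;

// Address of the Third field when the record pointer is nil.
// Nothing is dereferenced here, this is only address arithmetic.
function OffsetOfThird: Integer;
var
  P: PSample;
begin
  P := nil;
  Result := Integer(PAnsiChar(@P^.Third) - PAnsiChar(P));
end;

// Really reads through the pointer; returns True if that raised an AV.
function ReadThirdFails(P: PSample): Boolean;
begin
  try
    WriteLn(P^.Third);
    Result := False;
  except
    on E: EAccessViolation do
    begin
      WriteLn(E.Message);   // the reported address is expected to be the small offset above
      Result := True;
    end;
  end;
end;

begin
  WriteLn('Offset of Third: ', OffsetOfThird);
  if ReadThirdFails(nil) then
    WriteLn('Reading through nil raised an access violation');
end.
```

So a message ending in something like "Read of address 00000008" usually means "a field at offset 8 was read through a nil pointer", not "someone computed the address 8".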

Apart from the error message, more information can come from the Assembler and CPU tabs in EurekaLog's bug report.



The first tab shows the assembly listing of your program. It is provided only for convenience, so that you do not have to look for it elsewhere; it contains no new information. The second tab, however, shows the state of the CPU registers, (part of) the stack and (part of) the memory at the moment the exception was raised. In this case, we can look at the assembly listing and see that the problem involves the eax and edx registers. On the CPU tab we can check that eax is 0, which means that we are trying to assign a value via a nil pointer. Then we look at the line of source code, which we know from the call stack, and find the name of the variable. And there is your reason: the variable used in the assignment was nil.

Of course, working with this information requires a minimal knowledge of assembler, but it is quite a powerful tool.

Next time, we'll talk about cases where there IS a bug in your code, but there is no access violation! We have already touched on such situations (like silent stack corruption), but next time we'll focus on them specifically and consider what we can do to catch such errors.

Remarks:
(*) There is a very good explanation of how applications use memory by Mark Russinovich.
(**) Here is another example of how the very same code can exhibit a very wide range of behavior. Unfortunately, that example is not for Delphi, but here is the same example adapted for Delphi (sorry, this is an automatic translation; the original post is in Russian).