Source code analysis to build an architecture repository

Did you know that by analyzing a program's source code, you can create and visualize a relationship diagram of the program? It can show which applications are connected to each other, as well as the internal call graph of the software.

To make use of this, you first need to learn how to analyze real program code properly.

The goal of the analysis is a general, reusable process. For that, it is essential to have a common data format that can be produced from almost every popular programming language.

Experience shows that we should find an existing code analysis tool that meets these conditions. This approach lets us build on the code interpretation work others have already invested, instead of starting from scratch.

In this case we will use the Doxygen code analysis tool. It generates documentation from annotated C++ sources, but it also supports other popular programming languages such as C, Objective-C, C#, PHP, Java, Python, IDL (Corba, Microsoft, and UNO/OpenOffice flavors), Fortran, VHDL and, to some extent, D. Doxygen can produce documentation in several formats: its output can consist of HTML, XML and RTF files, and the RTF output can be converted to PDF. This makes it well suited to our original goal. If needed, Doxygen can also produce a complete manual of the project. If you want to learn more about Doxygen, you can find the tool here together with a complete manual.

Now, as an example, we will work with a simple C# project. To analyze C# code, we have to select this option in the settings. Doxywizard offers many settings, which we will discuss later. Besides the C# setting, we have to select the XML output option, because we will use it in the following steps. We also need diagrams that visualize the connections between classes. After running Doxygen from Doxywizard, check all the output files.

 

Figure: Original program hierarchy

Figure: index.html result

Figure: Generated XML files

First, Doxygen produces an HTML file named index.html. Open it to check which details of your program code were recognized: you will see all the classes and some of the objects as well.

Once Doxygen has run successfully, we can start the basic analysis. To find the call patterns in the generated XML, open the original C# program as well and pick a specific interfacing call in any class. Then find that class among the generated XML files: its file name is derived from the name of the original class. Next, locate the line in the class's source file where the call is made, and find the same call in the generated XML file by looking for a keyword that indicates a call. Once we have identified that keyword, we can simply search for the remaining calls.
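Finding the right XML file by hand gets tedious in larger projects. The sketch below automates that lookup, assuming the index.xml file that Doxygen normally writes into its XML output directory: every `<compound>` entry carries a `refid` that is also the base name of that compound's XML file. The function name `find_class_refids` is illustrative, and the namespace handling (splitting on "::" or ".") is an assumption meant to cover both separators.

```python
import xml.etree.ElementTree as ET


def find_class_refids(index_xml: str, class_name: str) -> list[str]:
    """Return the refids of class-like compounds whose simple name is class_name.

    Each returned refid corresponds to a file named "<refid>.xml" next to index.xml.
    """
    root = ET.fromstring(index_xml)
    matches = []
    for compound in root.iter("compound"):
        if compound.get("kind") not in ("class", "struct", "interface"):
            continue
        # <name> is a direct child of <compound>; member names are nested deeper.
        full_name = compound.findtext("name", default="")
        # Assumption: namespaces may be separated by "::" or "." in the index.
        simple_name = full_name.replace("::", ".").split(".")[-1]
        if simple_name == class_name:
            matches.append(compound.get("refid"))
    return matches
```

For example, reading `xml/index.xml` and calling `find_class_refids(text, "OrderService")` (a hypothetical class name) gives the file names to open next to the C# source.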

To improve precision and minimize the chance of errors, explore index.html and all the remaining XML files in parallel, paying particular attention to the call graphs in the HTML output. I suggest reading the XML files in a text editor such as Notepad++.

 

The specific settings of the current example are listed below:

All settings not mentioned here were left at their default values.

Topics:

  • Project: select your source code directory and enable “Scan recursively”.
  • Mode: in my case, I chose the “Optimize for Java or C# output” option.
  • Output: select the XML output format.
  • Diagrams: it is recommended to generate all diagrams.

Expert:

  • Build: in this case I selected the following options: EXTRACT_PRIVATE; EXTRACT_VIRTUA; EXTRACT_STATIC; INTERNAL_DOCS.
  • XML: GENERATE_XML must be enabled (important!).

With these settings, Doxygen is ready to run.
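If you prefer to run Doxygen without Doxywizard, the same settings can be written into a Doxyfile. The sketch below maps the options listed above to standard Doxyfile tags; it is one possible mapping, not the exact file Doxywizard would save. EXTRACT_VIRTUA is left out because it does not match a tag name we can confirm, and the diagram tags assume Graphviz `dot` is installed.

```python
import subprocess
from pathlib import Path

# Doxyfile tags corresponding to the settings listed above. Everything not
# listed keeps Doxygen's default, as in the article.
SETTINGS = {
    "RECURSIVE": "YES",             # Project: scan recursively
    "OPTIMIZE_OUTPUT_JAVA": "YES",  # Mode: optimize for Java or C# output
    "GENERATE_HTML": "YES",         # produces html/index.html
    "GENERATE_XML": "YES",          # XML output (important!)
    "HAVE_DOT": "YES",              # Diagrams: requires Graphviz "dot"
    "CALL_GRAPH": "YES",
    "CALLER_GRAPH": "YES",
    "EXTRACT_PRIVATE": "YES",       # Expert > Build options
    "EXTRACT_STATIC": "YES",
    "INTERNAL_DOCS": "YES",
}


def build_doxyfile(source_dir: str, output_dir: str) -> str:
    """Return the text of a minimal Doxyfile for the given directories."""
    lines = [f'INPUT = "{source_dir}"', f'OUTPUT_DIRECTORY = "{output_dir}"']
    lines += [f"{tag} = {value}" for tag, value in SETTINGS.items()]
    return "\n".join(lines) + "\n"


def run_doxygen(source_dir: str, output_dir: str, doxyfile: str = "Doxyfile") -> None:
    """Write the Doxyfile and run doxygen on it (doxygen must be on PATH)."""
    Path(doxyfile).write_text(build_doxyfile(source_dir, output_dir), encoding="utf-8")
    subprocess.run(["doxygen", doxyfile], check=True)
```

With the default HTML_OUTPUT and XML_OUTPUT values, `run_doxygen("MyProject", "doxygen_out")` (hypothetical paths) leaves index.html under `doxygen_out/html` and the XML files under `doxygen_out/xml`.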

With the XML files properly generated, we can look for the keyword that represents a method call in the XML. In our case, the keyword is “ref”: whenever a call happens, it shows up in the generated XML files with this keyword. So just search for “ref” across the whole set of XML files generated for the classes. Note that not every match is a call: Doxygen also uses the same keyword for other cross-references, such as references to types.
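Instead of a plain text search for “ref”, the XML can be parsed so that each match is attributed to the member it belongs to. The sketch below assumes that the calls made by a member appear as `<references>` children of its `<memberdef>` element, and that these are only filled when Doxygen parses the method bodies (for example with call graphs enabled); verify both on your own output. The file name prefixes used for filtering are also an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

Call = tuple[str, str, str, str]  # (class, member, target name, target refid)


def collect_calls(compound_xml: str) -> list[Call]:
    """Collect the <references> entries of every <memberdef> in one compound file."""
    root = ET.fromstring(compound_xml)
    calls: list[Call] = []
    for compound in root.iter("compounddef"):
        class_name = compound.findtext("compoundname", default="")
        for member in compound.iter("memberdef"):
            member_name = member.findtext("name", default="")
            # Each <references> child names a member that this member refers to.
            for target in member.findall("references"):
                calls.append((class_name, member_name, target.text or "", target.get("refid", "")))
    return calls


def collect_calls_in_dir(xml_dir: str) -> list[Call]:
    """Scan the per-class XML files (class*, interface*, struct*) in xml_dir."""
    calls: list[Call] = []
    for path in sorted(Path(xml_dir).glob("*.xml")):
        if path.name.startswith(("class", "interface", "struct")):
            calls += collect_calls(path.read_text(encoding="utf-8"))
    return calls
```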

Having explored the details, we can move on to our next goal: visualizing the call graph between the classes and objects.

First, we need a tool in which we can represent all the classes together with their objects and methods, as well as all the relationship lines between them. Here we use SAMU, which can handle multiple data structures and types; if you would like to know more about SAMU, you can read this article too. Below, the terms object type, relationship, selection and diagram follow SAMU's notation.

First, we create three object types. One of them is the “class” object type, which is used to load all the classes of the project.

At this point, importing the data from Excel speeds things up considerably.
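To get the extracted calls into a spreadsheet for that import, they can be written to a CSV file, which Excel opens directly. This is a minimal sketch: the input is assumed to be a list of (class, member, target, target id) tuples however you extracted them, and the column layout is hypothetical, so adapt it to whatever the SAMU import expects.

```python
import csv
import io

# Hypothetical column layout; not a documented SAMU import format.
COLUMNS = ["source_class", "source_member", "target", "target_refid"]


def calls_to_csv(calls: list[tuple[str, str, str, str]]) -> str:
    """Render the extracted calls as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(calls)
    return buffer.getvalue()
```

To save it, write the returned text to a file opened with `newline=""` and `encoding="utf-8"`, then open that file in Excel.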

After loading all the data that Doxygen extracted from the code, we can move on to generating a diagram.

Diagrams, one of the report types in SAMU, can visualize the data stored in the database. Here we can represent all the classes, objects, methods and relationships.

In my opinion, we should visualize all the data types faithfully to the original program code. The only problem with this visualization is that we cannot tell whether a connection is between two classes or between two objects or methods. So I think we should create one more connection type, giving us two types of relationship. The first type represents only the relations between classes: when an object is called in a class, we can see which two classes are involved in the call. As you might expect, the second type is the relation between two objects. This distinction matters because we use many virtual methods and objects in almost every class, and the physical implementation of these methods and objects is often not in the same class where they were declared as virtual.
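The class level relationship can be derived mechanically from the object/method level one: every call between members of two different classes implies a relation between those classes. The sketch below shows that derivation; the `MemberLink` record layout is hypothetical, and dropping calls within the same class is a design choice, not something SAMU requires.

```python
from typing import NamedTuple


class MemberLink(NamedTuple):
    """Object/method level relationship (hypothetical record layout)."""
    source_class: str
    source_member: str
    target_class: str
    target_member: str


def class_links(member_links: list[MemberLink]) -> set[tuple[str, str]]:
    """Derive the class level relationship type from the member level one.

    Calls inside the same class are dropped; repeated calls collapse into one edge.
    """
    return {
        (link.source_class, link.target_class)
        for link in member_links
        if link.source_class != link.target_class
    }
```

Keeping both sets side by side is what allows the diagram to show the two relationship types separately.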

We also need to talk about a third kind of connection that can arise between objects and classes: the “connection that points to nothing”. From the analysis perspective this is still an unsolved problem, but fortunately SAMU can handle it. We need one special connection endpoint for both relationship types: the “temporary connection endpoint”. It is useful when, while analyzing the code, we find a connection or a call without knowing its endpoint: we can simply connect the newly found object or class to this “temporary connection endpoint”.
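The same idea can be applied before the data is loaded: any link whose target is not among the known objects or classes is redirected to a single placeholder, and the redirected links form a to-do list for later analysis. In this sketch, `TEMPORARY_ENDPOINT` is a hypothetical id standing in for SAMU's temporary connection endpoint object.

```python
# Hypothetical id for the "temporary connection endpoint" object.
TEMPORARY_ENDPOINT = "TEMPORARY_CONNECTION_ENDPOINT"


def resolve_targets(links: list[tuple[str, str]], known_ids: set[str]) -> list[tuple[str, str]]:
    """Redirect every (source, target) link whose target is not a known id."""
    return [
        (source, target if target in known_ids else TEMPORARY_ENDPOINT)
        for source, target in links
    ]


def pending_sources(resolved: list[tuple[str, str]]) -> list[str]:
    """Sources still pointing to the temporary endpoint, i.e. work left to do."""
    return [source for source, target in resolved if target == TEMPORARY_ENDPOINT]
```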

Thanks to these two connection types, we can tell in which class an object or method is created and where it comes from. We can also see its ancestor object.

At this point we have only four data types in total: one for objects and methods, one for classes, and two for the relationship types. We can define many attributes on every data type, which will make our life easier later, when we introduce more functions on top of the information we have.

At this point we have a nearly perfect database. To make things easier, I think one more attribute is needed for objects and methods: it tells whether the object or method has a physical form (an actual implementation) in the project. This way we can handle physical and virtual methods separately!
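The four data types and the extra attribute can be summarized as a small data model. This is a sketch with hypothetical class and field names, not SAMU's schema. The `is_physical` helper rests on an assumption to verify against your own XML: that Doxygen marks members without a body (abstract or interface members) with `virt="pure-virtual"` on their `<memberdef>`, while members with a body get `"virtual"` or `"non-virtual"`.

```python
from dataclasses import dataclass


@dataclass
class ClassItem:
    """Data type 1: a class (an interface is a class without members)."""
    name: str


@dataclass
class MemberItem:
    """Data type 2: an object or method, with the extra 'physical' attribute."""
    class_name: str
    name: str
    physical: bool


@dataclass
class ClassRelationship:
    """Data type 3: relationship between two classes."""
    source_class: str
    target_class: str


@dataclass
class MemberRelationship:
    """Data type 4: relationship between two objects or methods."""
    source_member: str
    target_member: str


def is_physical(virt: str) -> bool:
    """Assumption: only members marked "pure-virtual" lack an implementation."""
    return virt != "pure-virtual"
```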

Thanks to Doxygen and SAMU, we were able to properly analyze program code largely independently of the programming language. Let me show you our result with these tools. The green boxes are the classes, containing their objects and methods. Some of these have a flywheel icon above them, which means that the object or method is physical. The blue dotted lines show the connections between objects and methods, and the black lines represent the relationships between classes. The yellow box is a class without any object or method: it is actually an interface.
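If SAMU is not at hand, a rough approximation of this picture can be produced with Graphviz, the tool Doxygen itself uses for its diagrams. This is a swapped-in technique, not how the SAMU diagram is made: the sketch emits DOT text with one green cluster per class, a yellow box for classes without members, solid lines between classes and blue dotted lines between members. The input shapes and the "Class.member" id convention are assumptions.

```python
def to_dot(classes: dict[str, list[str]],
           class_links: set[tuple[str, str]],
           member_links: set[tuple[str, str]]) -> str:
    """classes maps a class name to its member names; member ids are "Class.member"."""
    lines = ["digraph architecture {"]
    for index, (cls, members) in enumerate(sorted(classes.items())):
        lines.append(f'  subgraph cluster_{index} {{ color=green; label="{cls}";')
        # A class without members (e.g. an interface) is drawn as a yellow box.
        fill = ", style=filled, fillcolor=yellow" if not members else ""
        lines.append(f'    "{cls}" [shape=box{fill}];')
        for member in members:
            lines.append(f'    "{cls}.{member}" [label="{member}"];')
        lines.append("  }")
    for source, target in sorted(class_links):
        lines.append(f'  "{source}" -> "{target}";')  # solid black: class relationship
    for source, target in sorted(member_links):
        lines.append(f'  "{source}" -> "{target}" [style=dotted, color=blue];')
    lines.append("}")
    return "\n".join(lines)
```

Saving the result as `architecture.dot` and running `dot -Tsvg architecture.dot -o architecture.svg` renders it.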

With this method, you can now analyze your own programs independently of the programming language, as long as Doxygen supports it. You only need to download Doxygen and run it with the appropriate setup for your programming language. Then you can analyze the XML files and search for the keyword that indicates when a method or an object is called. Once you have found it, you are almost done, and you can explore the remaining relationships in the program. I hope this process helps you analyze some of your greatest programs.

© 2017 Architect Archers