Exploring how to write a code analyzer with Roslyn

Posted by Filip Ekberg on October 23 2011

In the previous post we looked at the documentation that came with Roslyn and how to create your first code analyzer. Now let’s take this a step further: refactor the code and look for more errors. Start off by creating a new solution. Don’t worry, we’re going to re-use bits of the code from the previous post, but in a refactored manner!

I’ll call my project FECodeAnalyzer

Then I am going to rename the CodeIssueProvider to LocalDeclarationInspection and remove the body of GetIssues and replace it with return null;

Now we are ready to start thinking about how the analyzer should work. I want my GetIssues method to ask other members of the class whether certain criteria are met. In this post I’ll look at the following two:

  • Can the variable be a constant value instead?
  • Is the variable used somewhere in the context?

The first one is what we looked at in the previous post, but I want to make some changes and break it out of the GetIssues method. First of all, there’s one thing that I mentioned in the previous post: how we determine that the node actually is a LocalDeclarationStatementSyntax. I added an if statement to check that the type was what I wanted, but it is much faster to declare it in the class attribute ExportSyntaxNodeCodeIssueProvider instead, like this:

[ExportSyntaxNodeCodeIssueProvider("FECodeAnalyzer",
                                    LanguageNames.CSharp,
                                    typeof(LocalDeclarationStatementSyntax))]

Since I want to delegate the work from GetIssues and check whether my current node meets all the criteria that I require, I want to initialize a few variables up front that I will pass to the methods. So this is what I want to limit GetIssues to:

  • Have a List<CodeIssue> that we want to fill with errors
  • Load the semantic model so we don’t need to pass the document around
  • Load the containing block and return the empty list if there is none
  • Measure the analysis bounds
  • Create a data flow analysis, we don’t want to do this over and over again.

So this will look somewhat like this:

List<CodeIssue> issues = new List<CodeIssue>();
var localDeclaration = (LocalDeclarationStatementSyntax)node;
var semanticModel = document.GetSemanticModel();
var containingBlock = localDeclaration.FirstAncestorOrSelf<BlockSyntax>();

if (containingBlock == null) return issues;

var analysisBounds = TextSpan.FromBounds(
                            start: localDeclaration.Span.End,
                            end: containingBlock.Span.End);
var dataFlowAnalysis = semanticModel.AnalyzeRegionDataFlow(analysisBounds);

A side note here is that Roslyn doesn’t support yield return yet, which is why I declare and initialize a List of issues to return. The next thing I want to create is a method that checks whether the variable can be made constant. This is my method signature for it:

private bool CanBeConst(LocalDeclarationStatementSyntax localDeclaration,
                        ISemanticModel semanticModel,
                        IRegionDataFlowAnalysis dataFlowAnalysis)

I can take the code from the previous sample almost as it is, with a small number of modifications: instead of returning null I return false, and the last return is true instead of a code issue object. Like this:

private bool CanBeConst(LocalDeclarationStatementSyntax localDeclaration,
                        ISemanticModel semanticModel,
                        IRegionDataFlowAnalysis dataFlowAnalysis)
{
    if (localDeclaration.Modifiers.Any(SyntaxKind.ConstKeyword)) return false;

    if (localDeclaration.Declaration.Variables.Any(v => v.InitializerOpt == null)) return false;

    if (localDeclaration.Declaration.Variables
                        .Select(v => semanticModel.GetDeclaredSymbol(v))
                        .Any(s => dataFlowAnalysis.WrittenInside.Contains(s)))
    {
        return false;
    }
    if (localDeclaration.Declaration.Variables
                        .Select(v => v.InitializerOpt.Value)
                        .Select(i => semanticModel.GetSemanticInfo(i))
                        .Any(info => !info.IsCompileTimeConstant))
    {
        return false;
    }

    return true;
}

This means that I can write something like this in GetIssues:

if (CanBeConst(localDeclaration, semanticModel, dataFlowAnalysis))
{
    issues.Add(new CodeIssue(CodeIssue.Severity.Warning,
                            localDeclaration.Span,
                            string.Format("{0} can be made constant",
                            localDeclaration.Declaration.Variables.First().Identifier)));
}

This is pretty similar to the previous post, except that we are not cluttering GetIssues with a lot of code. Also, I can now add more issues to the list! So next up I want to check if the variable is used somewhere in the context, the context being the analysis bounds.

Consider this code, where x is unused:

int x = 10;
x = 20;

As long as x is never read anywhere, it will be considered unused, even though it is written to! So we can ask our data flow analysis for all reads inside the analysis bounds and search them for the local declaration’s variable. This can be done like this:

dataFlowAnalysis.ReadInside.Contains(
    semanticModel.GetDeclaredSymbol(localDeclaration.Declaration.Variables.First()))

So the method that I want to have for checking for unused variables can look somewhat like this:

public bool IsNeverUsed(LocalDeclarationStatementSyntax localDeclaration,
                        ISemanticModel semanticModel,
                        IRegionDataFlowAnalysis dataFlowAnalysis)
{
    if (dataFlowAnalysis.ReadInside.Contains(
            semanticModel.GetDeclaredSymbol(localDeclaration.Declaration.Variables.First()))
        )
        return false;

    return true;
}

Which means that in my GetIssues method I can add the following to check if the node is unused and then add an error message:

if (IsNeverUsed(localDeclaration, semanticModel, dataFlowAnalysis))
{
    issues.Add(new CodeIssue(CodeIssue.Severity.Warning,
                                localDeclaration.Span,
                                string.Format("Variable {0} is declared but never used",
                                localDeclaration.Declaration.Variables.First().Identifier)));
}

As you can see this is quite modular at the moment, we can add more of these checks to correspond with our rules! The last thing we do in the GetIssues method is to return the issue list. So all the methods will end up looking like this:

private bool CanBeConst(LocalDeclarationStatementSyntax localDeclaration,
                        ISemanticModel semanticModel,
                        IRegionDataFlowAnalysis dataFlowAnalysis)
{
    if (localDeclaration.Modifiers.Any(SyntaxKind.ConstKeyword)) return false;

    if (localDeclaration.Declaration.Variables.Any(v => v.InitializerOpt == null)) return false;

    if (localDeclaration.Declaration.Variables
                        .Select(v => semanticModel.GetDeclaredSymbol(v))
                        .Any(s => dataFlowAnalysis.WrittenInside.Contains(s)))
    {
        return false;
    }
    if (localDeclaration.Declaration.Variables
                        .Select(v => v.InitializerOpt.Value)
                        .Select(i => semanticModel.GetSemanticInfo(i))
                        .Any(info => !info.IsCompileTimeConstant))
    {
        return false;
    }

    return true;
}
public bool IsNeverUsed(LocalDeclarationStatementSyntax localDeclaration,
                        ISemanticModel semanticModel,
                        IRegionDataFlowAnalysis dataFlowAnalysis)
{
    if (dataFlowAnalysis.ReadInside.Contains(
            semanticModel.GetDeclaredSymbol(localDeclaration.Declaration.Variables.First()))
        )
        return false;

    return true;
}
public IEnumerable<CodeIssue> GetIssues(IDocument document,
                                        CommonSyntaxNode node,
                                        CancellationToken cancellationToken)
{
    List<CodeIssue> issues = new List<CodeIssue>();
    var localDeclaration = (LocalDeclarationStatementSyntax)node;
    var semanticModel = document.GetSemanticModel();
    var containingBlock = localDeclaration.FirstAncestorOrSelf<BlockSyntax>();

    if (containingBlock == null) return issues;

    var analysisBounds = TextSpan.FromBounds(
                                start: localDeclaration.Span.End,
                                end: containingBlock.Span.End);
    var dataFlowAnalysis = semanticModel.AnalyzeRegionDataFlow(analysisBounds);

    if (CanBeConst(localDeclaration, semanticModel, dataFlowAnalysis))
    {
        issues.Add(new CodeIssue(CodeIssue.Severity.Warning,
                                localDeclaration.Span,
                                string.Format("{0} can be made constant",
                                localDeclaration.Declaration.Variables.First().Identifier)));
    }

    if (IsNeverUsed(localDeclaration, semanticModel, dataFlowAnalysis))
    {
        issues.Add(new CodeIssue(CodeIssue.Severity.Warning,
                                    localDeclaration.Span,
                                    string.Format("Variable {0} is declared but never used",
                                    localDeclaration.Declaration.Variables.First().Identifier)));
    }

    return issues;
}
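
Since every check has the same shape (a yes/no question about the declaration plus a message), one possible next step is to keep the checks in a list and loop over them. The following is only a sketch of that idea and does not use Roslyn at all: DeclarationFacts, Rule and all of their fields are hypothetical stand-ins that I made up for illustration, not types from the CTP.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the facts the real checks pull out of Roslyn.
class DeclarationFacts
{
    public string Name;
    public bool IsConst;
    public bool HasConstantInitializer;
    public bool WrittenLater;
    public bool ReadLater;
}

// Hypothetical rule: a predicate plus the message to report when it matches.
class Rule
{
    public Func<DeclarationFacts, bool> Applies;
    public string MessageFormat;
}

class Program
{
    static readonly List<Rule> Rules = new List<Rule>
    {
        // Mirrors the conditions checked by CanBeConst.
        new Rule
        {
            Applies = d => !d.IsConst && d.HasConstantInitializer && !d.WrittenLater,
            MessageFormat = "{0} can be made constant"
        },
        // Mirrors the condition checked by IsNeverUsed.
        new Rule
        {
            Applies = d => !d.ReadLater,
            MessageFormat = "Variable {0} is declared but never used"
        }
    };

    static List<string> GetIssues(DeclarationFacts declaration)
    {
        var issues = new List<string>();
        foreach (var rule in Rules)
        {
            if (rule.Applies(declaration))
            {
                issues.Add(string.Format(rule.MessageFormat, declaration.Name));
            }
        }
        return issues;
    }

    static void Main()
    {
        // Models "int x = 10;" with nothing else touching x afterwards.
        var x = new DeclarationFacts { Name = "x", HasConstantInitializer = true };
        foreach (var issue in GetIssues(x))
        {
            Console.WriteLine(issue);
        }
    }
}
```

With this layout, adding a rule means adding one entry to the list instead of another if block in GetIssues. Whether that is worth it for two rules is a judgment call.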

When debugging the analyzer (by pressing F5) and opening a console application in the new Visual Studio instance, a variable that is both unused and can be made constant gets both warnings.

A variable that cannot be made constant but is unused only gets the “declared but never used” warning.
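
To try the different combinations yourself, a console program along these lines could be pasted into the instance that F5 opens. The variable names are my own, and the comments describe what the two checks above should report given how they are written, not output that I captured.

```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        // Constant initializer, never written or read afterwards:
        // both CanBeConst and IsNeverUsed should return true.
        int both = 10;

        // Written again later, so CanBeConst should return false,
        // but it is never read, so IsNeverUsed should return true.
        int unusedOnly = 10;
        unusedOnly = 20;

        // Read below and never reassigned:
        // only the "can be made constant" warning should appear.
        int constOnly = 10;
        Console.WriteLine(constOnly);

        // Initialized from a runtime value and read below:
        // neither check should report anything.
        int neither = args.Length;
        Console.WriteLine(neither);
    }
}
```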

I hope you found this interesting, if you have any thoughts please leave a comment below!


Creating a basic code analysis with Roslyn

Posted by Filip Ekberg on October 23 2011

If you’ve installed the Roslyn CTP, you can go to the installation folder and look inside the Documentation folder. There’s a lot of interesting information here that you can make use of. I’ve got my documentation here:

C:\Program Files (x86)\Microsoft Codename Roslyn CTP\Documentation

Now there’s one document here that is a bit extra interesting, at least for me. It talks about how we can do basic code analysis with Roslyn (How to Write a Quick Fix (CSharp).docx). The basic idea is to identify whenever a variable can be made const. So for those of you that haven’t had the time to download and install Roslyn yet, I’ll show you how to do exactly that with the help of their sample. It’s essentially the same outcome and code as they use in their documentation, but I will try to explain a little bit more about each piece and add some extra things as well. But be sure to check out the documentation that comes with Roslyn as well!

However, the sample in the document has an error in it, so it doesn’t run out of the box!

First thing is to open up an instance of Visual Studio and create a new Code Issue project, I’ll call it MyFirstCodeIssueFix

This project comes with some code already so that you can get started, but we’re going to start looking at this from the beginning so let’s remove everything in

public IEnumerable<CodeIssue> GetIssues(IDocument document,
                                        CommonSyntaxNode node,
                                        CancellationToken cancellationToken)

Don’t confuse it with the other overload, where the second argument is a CommonSyntaxToken and not a CommonSyntaxNode!

The first thing that we have to do is check that the node is what we expect it to be:

if (node.GetType() != typeof(LocalDeclarationStatementSyntax)) return null;

There is another way to do this though which is what they use in the documentation, in their example they restrict the entire class to only work with LocalDeclarationStatementSyntax like this:

[ExportSyntaxNodeCodeIssueProvider("MyFirstCodeIssueFix",
                                    LanguageNames.CSharp,
                                    typeof(LocalDeclarationStatementSyntax))]

Using this will make it a bit faster, but for clarity I will not use it now. However, when you have a lot of analyses going on, you might want to break it all out and have one statement syntax per file. For instance, don’t analyze using blocks and variables in the same file.

The reason we check for LocalDeclarationStatementSyntax is that we want to see if it is a local variable. This method will be invoked for each node in the source that we are analyzing, so we will see UsingDirectiveSyntax among a lot of others.

The next two things that we are going to do is to cast the node parameter to its correct type and then check if it is already a constant type, if it is we don’t need to do anything at all with it

var localDeclaration = (LocalDeclarationStatementSyntax)node;
if (localDeclaration.Modifiers.Any(SyntaxKind.ConstKeyword)) return null;

Next up we will check if we can actually retrieve the code block surrounding the variable, so that we can actually analyze this block later on. Then we check if the variable actually has an initializer

var containingBlock = localDeclaration.FirstAncestorOrSelf<BlockSyntax>();
if (containingBlock == null) return null;

if (localDeclaration.Declaration.Variables.Any(v => v.InitializerOpt == null)) return null;

Now we want to get the semantic model and this is fetched from the document argument that is passed to the method:

var semanticModel = document.GetSemanticModel();

When we have the semantic model, we can get a little bit of information from it. In this case we want to see if the variable is initialized with a constant expression. This is done by selecting the actual value of the variable and seeing whether the value is a compile time constant:

if (localDeclaration.Declaration.Variables
                    .Select(v => v.InitializerOpt.Value)
                    .Select(i => semanticModel.GetSemanticInfo(i))
                    .Any(info => !info.IsCompileTimeConstant))
{
    return null;
}

The next thing, which is almost the last thing, is to check whether the variable is set to another value later in the code. To do this we need to analyze the part of the code block that comes after the current declaration and see if the variable is written to there.

So we define the bounds for where we want to analyze:

var analysisBounds = TextSpan.FromBounds(
    start: localDeclaration.Span.End,
    end: containingBlock.Span.End);

Note that if you were to use localDeclaration.Span.Start instead, the region would include the declaration itself, so its initialization would count as a write and our next test would always be true! So now we can run a data flow analysis over this region and search it for any new writes to the variable like this:

var dataFlowAnalysis = semanticModel.AnalyzeRegionDataFlow(analysisBounds);

if (localDeclaration.Declaration.Variables
                    .Select(v => semanticModel.GetDeclaredSymbol(v))
                    .Any(s => dataFlowAnalysis.WrittenInside.Contains(s)))
{
    return null;
}

So by now we’ve completed the checks, and if we’ve come this far there is an issue: the variable can be made constant. What we do now is return a warning saying what is wrong:

    return new[]
    {
        new CodeIssue(CodeIssue.Severity.Warning, localDeclaration.Span,
            string.Format("{0} can be made constant",
                          localDeclaration.Declaration.Variables.First().Identifier))
    };

How do we test this bad boy? If you press F5 you’ll get a new instance of Visual Studio 2010, which is exactly what we want. Now create a console application in this new instance and write the following in the Main method:

int x = 10;

The declaration should now be flagged with the warning “x can be made constant”.

Here’s the entire GetIssues method:

public IEnumerable<CodeIssue> GetIssues(IDocument document,
                                        CommonSyntaxNode node,
                                        CancellationToken cancellationToken)
{
    if (node.GetType() != typeof(LocalDeclarationStatementSyntax)) return null;

    var localDeclaration = (LocalDeclarationStatementSyntax)node;
    if (localDeclaration.Modifiers.Any(SyntaxKind.ConstKeyword)) return null;

    var containingBlock = localDeclaration.FirstAncestorOrSelf<BlockSyntax>();
    if (containingBlock == null) return null;

    if (localDeclaration.Declaration.Variables.Any(v => v.InitializerOpt == null)) return null;

    var semanticModel = document.GetSemanticModel();

    if (localDeclaration.Declaration.Variables
                        .Select(v => v.InitializerOpt.Value)
                        .Select(i => semanticModel.GetSemanticInfo(i))
                        .Any(info => !info.IsCompileTimeConstant))
    {
        return null;
    }

    var analysisBounds = TextSpan.FromBounds(
        start: localDeclaration.Span.End,
        end: containingBlock.Span.End);

    var dataFlowAnalysis = semanticModel.AnalyzeRegionDataFlow(analysisBounds);

    if (localDeclaration.Declaration.Variables
                        .Select(v => semanticModel.GetDeclaredSymbol(v))
                        .Any(s => dataFlowAnalysis.WrittenInside.Contains(s)))
    {
        return null;
    }

    return new[]
    {
        new CodeIssue(CodeIssue.Severity.Warning, localDeclaration.Span,
            string.Format("{0} can be made constant",
                          localDeclaration.Declaration.Variables.First().Identifier))
    };
}
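
To sanity check each early return, it can help to have one declaration per case in the test console application. The program below is only a suggestion with names of my own choosing, and the comments state what the checks above should do as written, not results I recorded.

```csharp
using System;

class Program
{
    static void Main()
    {
        // Already const: filtered out by the ConstKeyword check.
        const int alreadyConst = 1;

        // No initializer: filtered out by the InitializerOpt == null check.
        int noInitializer;

        // Initializer is not a compile time constant: filtered out by the
        // IsCompileTimeConstant check.
        int fromRuntime = Environment.TickCount;

        // Written again after the declaration: filtered out by the
        // WrittenInside check.
        int reassigned = 1;
        reassigned = 2;

        // Passes every check, so this is the one that should be reported
        // as "flagged can be made constant".
        int flagged = 10;

        Console.WriteLine(fromRuntime + reassigned + flagged + alreadyConst);
    }
}
```

The C# compiler itself will warn about noInitializer being unused, which is unrelated to our analyzer and harmless here.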

I hope you found this interesting, if you have any thoughts please leave a comment below!


Getting all methods from a code file with Roslyn

Posted by Filip Ekberg on October 21 2011

In the previous post we started looking at Roslyn. Let’s continue on this topic and see what else we can get out of it! I want to take a look at how we can retrieve all methods and get some information about them. I’ve added another method to the Person class so it looks like this now:

public class Person
{
    public string Name { get; private set; }
    public Person(string name)
    {
        Name = name;
    }
    public void Evaporate()
    {
           
    }
    public string Speak()
    {
        string str = "test";
        return string.Format("Hello! My name is{0}",
            Name);
    }
}

We’ve already got the tree structure and the root node, so let’s just use that. Everything is represented as a SyntaxNode, so we need to get all the descendant nodes that are methods. Methods are declared as MethodDeclarationSyntax, so all methods are retrieved like this:

IEnumerable<MethodDeclarationSyntax> methods = tree.Root
                .DescendentNodes()
                .OfType<MethodDeclarationSyntax>().ToList();

Now we can just iterate over this:

foreach(var method in methods)
{
}

However, you might be a bit confused as to how you print the method name, because there’s not a Name-property on the object! Instead there is something called an Identifier that we can use:

foreach(var method in methods)
{
    Console.WriteLine(method.Identifier);
}

This will print all methods, not including the constructors. If we want the constructors, we ask for ConstructorDeclarationSyntax instead of MethodDeclarationSyntax. We can get a lot of interesting things from the method object in the loop: we can ask about the parameters, the return type and a lot of other nice things.
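
As a sketch of asking for the return type and parameters: the example below is written against the later, released Roslyn API (the Microsoft.CodeAnalysis.CSharp NuGet package), not the 2011 CTP used in this post, so names such as CSharpSyntaxTree.ParseText, GetRoot and DescendantNodes differ from the CTP calls above. FormatParameters is a helper of my own, and the Person source is inlined so the program is self-contained.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class Program
{
    static void Main()
    {
        // Inlined version of the Person class so no file is needed.
        const string code = @"
public class Person
{
    public string Name { get; private set; }
    public Person(string name) { Name = name; }
    public void Evaporate() { }
    public string Speak()
    {
        return string.Format(""Hello! My name is {0}"", Name);
    }
}";

        var root = CSharpSyntaxTree.ParseText(code).GetRoot();

        // Ordinary methods: return type, name and parameter list.
        foreach (var method in root.DescendantNodes().OfType<MethodDeclarationSyntax>())
        {
            Console.WriteLine("{0} {1}({2})",
                method.ReturnType,
                method.Identifier,
                FormatParameters(method.ParameterList));
        }

        // Constructors are a separate node type and have no return type.
        foreach (var ctor in root.DescendantNodes().OfType<ConstructorDeclarationSyntax>())
        {
            Console.WriteLine("{0}({1})",
                ctor.Identifier,
                FormatParameters(ctor.ParameterList));
        }
    }

    // Hypothetical helper: renders each parameter as "type name".
    static string FormatParameters(ParameterListSyntax list)
    {
        return string.Join(", ",
            list.Parameters.Select(p => string.Format("{0} {1}", p.Type, p.Identifier)));
    }
}
```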

I hope you found this interesting, if you have any thoughts please leave a comment below!


Using Roslyn to parse C# code files

Posted by Filip Ekberg on October 20 2011

A couple of days ago Microsoft released something called the Roslyn Project and it is now in its CTP state, just as Async! But what is Roslyn and what can it be used for? In the previous post I talked about how I wrote assembler that was generated from an application that parsed some programming language, but what I didn’t do was the actual parsing of code. Actually parsing code is not only relevant when you want to write a compiler, it is also useful when you want to evaluate how good a certain chunk of code is.

There is a lot of really good software out there that will help you analyze your code. Some of it analyzes the code after it has been compiled, such as NDepend, which is a really good tool. Another program that is commonly used is ReSharper. From what I know, ReSharper analyzes the code structure without actually compiling it all the way down to IL.

You can call this parsing + evaluating; when you add the extra step that actually generates new code, I would call it a compiler. However, let’s get back to Roslyn. The project positions itself by saying that before Roslyn the C# and VB.NET compilers were just a black box with no integration capabilities. What Roslyn does is open up the black box and provide an interface between your code and the compiler.

What this means is that you can parse a code file that hasn’t been compiled yet and get a nice structure out of it that you can do whatever you like with. To get started with Roslyn, this is what you need to do:

When all this is installed, open up Visual Studio and create a new Roslyn C# Console Application

Now create a new folder called ToParse and add a class to it with some fields and methods

So now we have this Person class that we want to parse. Here’s the code so you can just copy/paste it:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace HelloWorldAnalyzer.ToParse
{
    public class Person
    {
        public string Name { get; private set; }
        public Person(string name)
        {
            Name = name;
        }
        public string Speak()
        {
            return string.Format("Hello! My name is{0}",
                Name);
        }
    }
}

Now go back to the Main method and let’s get started with the parsing!

The first thing that I want to do is to just read the text inside the file, for the purpose of this example I’ll do it like this:

var code = new StreamReader("..\\..\\ToParse\\Person.cs").ReadToEnd();

We go back two folders (..\..\) because the application will run from the bin\Debug\ folder!
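
The StreamReader above is never disposed, which is fine for a throwaway example. If you want something a little tidier, one option is File.ReadAllText, which opens, reads and closes the file in one call. This is just a sketch: the File.Exists check and the messages are my own additions, and the relative path is still resolved against the current working directory.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Same relative location as above: two folders up from bin\Debug.
        var path = Path.Combine("..", "..", "ToParse", "Person.cs");

        if (!File.Exists(path))
        {
            Console.WriteLine("Could not find {0}", Path.GetFullPath(path));
            return;
        }

        // Reads the whole file and closes it again.
        string code = File.ReadAllText(path);
        Console.WriteLine("Read {0} characters", code.Length);
    }
}
```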

Next up we want to get something that is called a SyntaxTree from our code, a tree representation of the source in which every construct is a node.

So we get this tree by doing this:

SyntaxTree tree = SyntaxTree.ParseCompilationUnit(code);

Now we want to retrieve the root of the tree:

var root = (CompilationUnitSyntax)tree.Root;

Next up I want to print all the using blocks in the class file that we just parsed. On the CompilationUnitSyntax instance we have a property called Usings, which we can use to get a list of UsingDirectiveSyntax:

foreach(var usingBlock in root.Usings)
{
    Console.WriteLine("Using block: {0}", usingBlock.Name);
}

This will print all the using blocks and will result in something like this:

Using block: System
Using block: System.Collections.Generic
Using block: System.Linq
Using block: System.Text
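
The calls in this post are from the 2011 CTP, and several of them were renamed before Roslyn shipped. For anyone following along with the released packages, a rough equivalent of the steps so far might look like the sketch below. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced, and it inlines a cut-down Person.cs as a string so that it does not depend on the file on disk.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

class Program
{
    static void Main()
    {
        // Cut-down stand-in for Person.cs with the same using directives.
        const string code = @"
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace HelloWorldAnalyzer.ToParse
{
    public class Person { }
}";

        // ParseText replaces the CTP's SyntaxTree.ParseCompilationUnit,
        // and GetRoot() replaces the Root property.
        var tree = CSharpSyntaxTree.ParseText(code);
        var root = (CompilationUnitSyntax)tree.GetRoot();

        foreach (var usingDirective in root.Usings)
        {
            Console.WriteLine("Using block: {0}", usingDirective.Name);
        }
    }
}
```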

Now let’s do something a bit more fun, let’s retrieve all the nodes in the syntax tree and look for a LiteralExpressionSyntax, which actually will be the string inside the Speak() method!

We do this by first getting all the descendent nodes from the root and just get the first literal expression syntax that we find:

var personSpoke =   root.DescendentNodes()
                        .OfType<LiteralExpressionSyntax>()
                        .FirstOrDefault();

If we write this to the console as well, we should see the string literal from the Speak() method.

This is just scratching the surface of what you can do with Roslyn, and there are some very interesting resources to look through. Here’s an MSDN page with a lot of documents on how you do certain things in Roslyn. Be sure to check that out! So far we’ve just done the parsing step, but you can also do compilation with it since it exposes all the different steps of the C# and VB.NET compilers.

I hope you found this interesting because I had a lot of fun writing it and if you have any thoughts please leave a comment below!
