Fix a gap in GitHub API v4

In this blog post, I want to show you how I solved the problem of getting commit files content with GitHub GraphQL API (v4) and to avoid using the GitHub REST API (v3) with its crazy amount of API calls. The solution is available in this GitHub Repository.

If you want the solution straight away , click here to skip the explanation of the problem.

GitHub Octocat (source: https://github.com/logos)

GitHub is the most famous code-hosting platform for version control and collaboration. It is essential for all IT people. One of the best features of GitHub is the offered APIs to manipulate the content and integrate GitHub in a workflow such as CI pipeline. It offers two versions of API: GitHub API v3, which is a REST API and GitHub GraphQL API v4. These are the current two stables API versions. (If you are wondering about the previous versions, you can get some information in GitHub documentation.

The GitHub REST API v3 covers “all” (or almost, I am not sure) areas. However, tons of API calls need to be made in order to satisfy a workflow job. Therefore, GitHub decided to replace REST API with GraphQL in the version 4 and this is why:

GitHub chose GraphQL for our API v4 because it offers significantly more flexibility for our integrators. The ability to define precisely the data you want—and only the data you want—is a powerful advantage over the REST API v3 endpoints. GraphQL lets you replace multiple REST requests with a single call to fetch the data you specify.

GitHub documentation, https://developer.github.com/v4

However, the new API (GraphQL API v4) did not solve the problem 100%. It still has some gaps and it does not cover all areas. You may need to go back to the version 3 to satisfy a given job.

Disclaimer: I was novice to GitHub API and GraphQL when I started working in this project.

I was working in a project that integrates GitHub in one of its processes. I chose the GraphQL API v4. Everything was going smooth until I had to get the content of the last commit.  Google helped a bit sometimes and github.community forums too (not well referenced in Google). I looked for a way to get the content of a given commit with GraphQL, I thought that it was an obvious thing; I managed to create this graph with the provided documentation.

If you are novice to GraphQL, you can start your learning journey here graphql.org/learn and I recommend the query editor GraphiQL : the GitHub repository github.com/skevy/graphiql-app / download page electronjs.org/apps/graphiql

{
   rateLimit{
    cost
    remaining
  }
  repository(name: "GitHubAPIDemo", owner: "MohamedSahbi") {
    ref(qualifiedName: "master") {
      name
      id
      target {
        ... on Commit {
          id
          history(first: 1) {
            pageInfo {
              hasNextPage
            }
            totalCount
            edges {
              node {
                author {
                  name
                  date
                }
                changedFiles
                commitResourcePath
                oid
                abbreviatedOid
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

fragment GetAllFiles on Tree {
  ... on Tree {
    entries {
      name
      type
      oid
      object {
        ... on Tree {
          entries {
            name
            type
            oid
            object {
              ... on Blob {
                text
              }
            }
          }
        }
      }
    }
  }
}

This graph query returns by order:

  • The cost of my request in the rate limit section. This is important because every user has a limited credit of 5000 request per hour, but a single GraphQL call can cost 1 credit, 100 credits or more than 5000 credits. For more details, see the explanation provided by GitHub.
  • The most recent commit details since we choose “history (first:1)”
  • The repository content that we chose to get in the section:
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }

What I found it that the commit history does not include the changed files URLs that are provided using the REST API v3. I kept looking for a way to fix my query. I was still believing that it is provided out-of-the-box, but since I am not experienced with GraphQL, I thought I made a mistake in my query.

Then, I lost hope for a while. I decided to query the GitHub repository with REST API v3. To go the content of each file I have to:

"files": [
    {
      "sha": "9907549076a9271ee4948e909eb0669d3ba4875b",
      "filename": "LICENSE",
      "status": "added",
      "additions": 21,
      "deletions": 0,
      "changes": 21,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/LICENSE?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,21 @@\n+MIT License\n+\n+Copyright (c) 2019 Mohamed Sahbi\n+\n+Permission is hereby granted, free of charge, to any person obtaining a copy\n+of...."
    },
    {
      "sha": "7b9e8fe3adf9f784749834da35fecda8a5392bd3",
      "filename": "README.md",
      "status": "added",
      "additions": 2,
      "deletions": 0,
      "changes": 2,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/README.md?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,2 @@\n+# GitHubAPIDemo\n+This repository contains a sample demo for my blog post"
    }
  ]
  • Third, we loop over the files array and each time we have to:
    • Call the endpoint that is giving in the attribute contents_url
    • The previous call returns a JSON object that contains the attribute download_url, which is the last API endpoint that we have to call to get the file content (Finally!!!).

Imagine you want to get the updated files in the last commit without knowing the commit hash, that is:

2 API calls + (2 * number of updated files) API calls >= 4 API calls

The duplicate data, the useless extra information in the JSON responses, and the huge number of endpoint calls to get files content pushed me to keep looking further for a better solution. I kept looking until I found this post about the same problem that I am facing. It convinced me that I have to fix it myself since there is no out-of-the-box solution.

Suddenly, I found this great post about the GraphQL aliases. I should have paid more attention when learning GraphQL or spend some extra hours learning. And yeah I do not how I end up finding about aliases when I was solving the problem.

I guess you know where I am going here. Alias is the best way to avoid the 2*number of updated files API calls that I had to do using the REST API v3.  The solution is to profit from the best features of both GitHub API v3 and v4.

Solution

The final process looks like this:

  1. Get the commits using REST API v3
  2. Get the commit content also using REST API v3
  3. Generate a single GraphQL query with aliases to get the files content.

I created a sample project in GitHub with a GitHub Service that you can reuse easily. The are 2 main methods offered by this service:

  • GetLastCommitFilesContent(string directory, DateTime? startingFrom = null) : it gets the last commit, look for its content and then generate a graphQL query and retrieve the files by calling GraphQL API. It returns an object of type GitHubServiceOutput.
  • GetCommitFilesContent(string commitHash, string directory) : a similar method to the previous one, but it takes commit hash as parameter. The output is like the previous method’s output.

You will find the details about the code sample and the used libraries in the GitHub repository.

Custom Object Comparison in C#

A while ago, I needed to compare complex objects in C#. I forget to mention the real reason when I wrote the article and thanks to the feedback that I got, here is the main reason: Not only a Boolean result is needed from the comparison but also I need to have as output the properties with different values and the possibility to exclude properties from the comparison, . I looked for such function that would provide the same functionality of string.Compare() but for complex objects of the same type. Object.Equals() Method alone did not satisfy my need and it needed an implementation and overriding of the method which was not convenient. My next stop was stackoverflow and I found quite interesting approach in this discussion. That did not satisfy my need to the fullest so I took it as a start point and I had to implement my own comparison class (the implementation is available on GitHub with a running sample).

I created a static class Comparer with a constraint on the generic parameter to a class which satisfies my need. If you do not know what do constraints on type parameters mean, go to Microsoft docs (I recommend reading it and understanding what it is and why it is used because you will need it for sure). Then, reflection was the choice to get things done. PropertyInfo class which belongs to System.Reflection namespace was enough to do the work. This class allows us to get the different attributes of a property of an object and its metadata which I use to compare the properties of the 2 objects.

The created class offers different methods that you may find helpful :

GenerateAuditLog() method literally generates log. It returns an object of type ComparisonResult which can be inserted into logHistory table in your database. This method is overloaded so that you can exclude some properties from the comparison.

GenerateAuditLogMessages() method returns a list of messages that contains only changes. There is no overload for this method.

HasChanged() method simply returns Boolean result. You can eventually exclude some properties from the comparison. I found this method useful for updating records in the database.

That is all!! I hope you find it useful. Feel free to use the code or improve it.

Note: The code is not optimized (no DRY approach) because I take in consideration people who wants to use one method so they can copy the code simply (I personally recopy my code, improve it and adopt it to the case that I have).

Raw queries with Entity Framework Core

tl;dr

In this blog post, I showcase how to migrate raw SQL query from Entity Framework 6 to EF Core 2.1 and EF Core 3.1. You can find the whole sample in GitHub.

— — — —

I have been working lately on project migration form ASP.NET MVC 5 that is using Entity Framework 6 to ASP.NET Core 2.1 with Entity Framework Core 2.1. During the work, I found a raw query implemented in Entity Framework 6 as following (not really): 

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var studentIdParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Database
        .SqlQuery<int>(query, studentIdParam).ToListAsync();

    if (gradeList.Count == 0)
    {
        return 0;
    }

    return gradeList.Average();
}

Meanwhile, it is not possible to do so in Entity Framework core. I have to look for solutions and I found 2 of them:

The first solution is a simple implementation with ADO.NET, you can find it in my github account: method GetScoreAdoNet(int studentId). However, I try to avoid ADO.NET because of internal rules in our team and mainly for maintenance reasons.

So, I kept looking for another solution using Entity Framework Core. Thanks to the great community in stackoverflow, I found this answer for my problem. Here is the second and better solution : 

Solution for EF Core 2.1

I will be using Query types proposed by Entity Framework Core. 

First, we have to create a data model that will be used as return type of the executed SQL query. Although in my sample (here), I just return a number (int), I have to create a model that has one property. The name of the property should be the same name of the column selected in the SELECT Statement.

 public class AverageGrade
    {
        public int Grade { get; set; }
    }

Then, we need to configure it in the dbcontext in the method OnModelCreating 

modelBuilder.Query<AverageGrade>();

And finally we can call the raw SQL query:

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var idParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Query<AverageGrade>()
        .FromSql(query, idParam).ToListAsync();

    return gradeList.Select(x => x.Grade).ToList().Average();
}

Solution for EF Core 3.1

Starting from EF Core 3.0, the proposed solution for EF Core 2.1 is obsolete. It is part of many changes in EF Core 3.0 that you can find here.

A data model is needed for the output of the executed SQL query, same like in EF Core 2.1 solution.

public class AverageGrade
    {
        public int Grade { get; set; }
    }

The next step is adding the data model to the ModelBuilder in the method OnModelCreating :

modelBuilder.Entity<AverageGrade>().HasNoKey();

The last step is to use DbContext.Set<>() instead of DbContext.Query<>() in the method GetScore(int studentId). In other words, the line number 12 (in the last code block in the Solution for EF Core 2.1) is replaced with this line of code:

var gradeList = await _universityContext.Set<AverageGrade>().FromSqlRaw(query, idParam).ToListAsync();

— — — —

That’s it, migration was done successfully.

The Query Types have some limitations. In case you will be using them, please read the official documentation.

Finally, I prepared 3 projects, one with Entity Framework 6, the second with Entity Framework Core 2.1 (the first 2 projects use ASP.NET Core 2.1), and the third one with EF Core3.1. You can find the code source in GitHub.