LUIS Migration – False Error Message: Module {..} already exists

Here I am back with another story from my daily work problems. We have been using Language Understanding (LUIS) for one of our solutions for a while. Recently, Microsoft made some upgrades and created a new platform that benefits from resource authoring based on Role-Based Access Control (RBAC), Azure Active Directory (Azure AD) and custom domains. So, we had to migrate our LUIS app. And that did not go well.

For those who are not familiar with LUIS, Azure offers Azure cognitive services offer a set of REST APIs that helps you build intelligent applications without the need to develop your own models with Machine Learning or Deep Network, etc. One of these services is Language Understanding (LUIS). It is a cloud-based API service that applies custom machine-learning intelligence to natural language text to predict overall meaning, and pull out relevant, detailed information. For more details, check the official site.

The Luis portal is being changed because it cannot keep up with Azure services. So, Microsoft decided to connect the new upgrades with creation of a new portal. The bad part is that every user, who has been using the old portal, has to migrate to the new one by himself/herself, so that it decommissions the old portal by June 2020. See the unofficial announcement here https://github.com/azure-deprecation/dashboard/issues/26.

To help with the Migration Microsoft offered some documentation such as this one https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-migration-authoring.

We decided to export the application from the old portal and add it to the new portal. In other words, we are not following Microsoft’s migration workflow because it is not the smartest for a professional environment. We chose the best approach that is possible to keep the solution in production running with the LUIS app in the old portal. Meanwhile, we create another one in the new portal and hoping that everything goes smooth.

(For the demo, I used Travel Agent Sample from the samples provided by Microsoft)

So, we started the migration process and it did not go as planned (like always). When importing the application, we had this error message:

BadArgument: The models: { BookFlight } already exist in the specified application version.

Figure 1 : Import Application

The error message does not say a lot. “The models” literally does not make any sense in this context or in the context of LUIS. BookFlight is the name of an intent and an entity in the imported application. That is what caused the error. The new platform does not accept same name for an intent and an entity, so we had to rename it.

A good error message can be something like this:

You cannot assign the same name for an entity and an intent. Please use a unique name for the intent {BookFlight}

(I have no idea why we have it like that, the LUIS application is there for more than 2 years, literally before I joined my team, but this is not an excuse.)

Such a small issue took us some hours and an effort of 3 persons with troubleshooting it.

There are 2 solutions for this problem:

  1. Make sure that each intent has a unique name. The name must not be used in entities or any other part of the application. (This is the best solution)
  2. Rename the entity, but also the pattern if you are using one

To retrace the error, upload the application that is provided in this GitHub repository. An other option to retrace the error is by adding an entity with the name of an existing intent or the other way around.

Figure 2: Create new entity

— — — —

To recap, we saw how a wrong error message looks like and how to fix the error (in case you had the same error). Check out also my blog post Fix a gap in GitHub API v4. Well if you read to the end, thank you very much for your support. Wishing you a Happy New Year!

Fix a gap in GitHub API v4

In this blog post, I want to show you how I solved the problem of getting commit files content with GitHub GraphQL API (v4) and to avoid using the GitHub REST API (v3) with its crazy amount of API calls. The solution is available in this GitHub Repository.

If you want the solution straight away , click here to skip the explanation of the problem.

GitHub Octocat (source: https://github.com/logos)

GitHub is the most famous code-hosting platform for version control and collaboration. It is essential for all IT people. One of the best features of GitHub is the offered APIs to manipulate the content and integrate GitHub in a workflow such as CI pipeline. It offers two versions of API: GitHub API v3, which is a REST API and GitHub GraphQL API v4. These are the current two stables API versions. (If you are wondering about the previous versions, you can get some information in GitHub documentation.

The GitHub REST API v3 covers “all” (or almost, I am not sure) areas. However, tons of API calls need to be made in order to satisfy a workflow job. Therefore, GitHub decided to replace REST API with GraphQL in the version 4 and this is why:

GitHub chose GraphQL for our API v4 because it offers significantly more flexibility for our integrators. The ability to define precisely the data you want—and only the data you want—is a powerful advantage over the REST API v3 endpoints. GraphQL lets you replace multiple REST requests with a single call to fetch the data you specify.

GitHub documentation, https://developer.github.com/v4

However, the new API (GraphQL API v4) did not solve the problem 100%. It still has some gaps and it does not cover all areas. You may need to go back to the version 3 to satisfy a given job.

Disclaimer: I was novice to GitHub API and GraphQL when I started working in this project.

I was working in a project that integrates GitHub in one of its processes. I chose the GraphQL API v4. Everything was going smooth until I had to get the content of the last commit.  Google helped a bit sometimes and github.community forums too (not well referenced in Google). I looked for a way to get the content of a given commit with GraphQL, I thought that it was an obvious thing; I managed to create this graph with the provided documentation.

If you are novice to GraphQL, you can start your learning journey here graphql.org/learn and I recommend the query editor GraphiQL : the GitHub repository github.com/skevy/graphiql-app / download page electronjs.org/apps/graphiql

{
   rateLimit{
    cost
    remaining
  }
  repository(name: "GitHubAPIDemo", owner: "MohamedSahbi") {
    ref(qualifiedName: "master") {
      name
      id
      target {
        ... on Commit {
          id
          history(first: 1) {
            pageInfo {
              hasNextPage
            }
            totalCount
            edges {
              node {
                author {
                  name
                  date
                }
                changedFiles
                commitResourcePath
                oid
                abbreviatedOid
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

fragment GetAllFiles on Tree {
  ... on Tree {
    entries {
      name
      type
      oid
      object {
        ... on Tree {
          entries {
            name
            type
            oid
            object {
              ... on Blob {
                text
              }
            }
          }
        }
      }
    }
  }
}

This graph query returns by order:

  • The cost of my request in the rate limit section. This is important because every user has a limited credit of 5000 request per hour, but a single GraphQL call can cost 1 credit, 100 credits or more than 5000 credits. For more details, see the explanation provided by GitHub.
  • The most recent commit details since we choose “history (first:1)”
  • The repository content that we chose to get in the section:
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }

What I found it that the commit history does not include the changed files URLs that are provided using the REST API v3. I kept looking for a way to fix my query. I was still believing that it is provided out-of-the-box, but since I am not experienced with GraphQL, I thought I made a mistake in my query.

Then, I lost hope for a while. I decided to query the GitHub repository with REST API v3. To go the content of each file I have to:

"files": [
    {
      "sha": "9907549076a9271ee4948e909eb0669d3ba4875b",
      "filename": "LICENSE",
      "status": "added",
      "additions": 21,
      "deletions": 0,
      "changes": 21,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/LICENSE?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,21 @@\n+MIT License\n+\n+Copyright (c) 2019 Mohamed Sahbi\n+\n+Permission is hereby granted, free of charge, to any person obtaining a copy\n+of...."
    },
    {
      "sha": "7b9e8fe3adf9f784749834da35fecda8a5392bd3",
      "filename": "README.md",
      "status": "added",
      "additions": 2,
      "deletions": 0,
      "changes": 2,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/README.md?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,2 @@\n+# GitHubAPIDemo\n+This repository contains a sample demo for my blog post"
    }
  ]
  • Third, we loop over the files array and each time we have to:
    • Call the endpoint that is giving in the attribute contents_url
    • The previous call returns a JSON object that contains the attribute download_url, which is the last API endpoint that we have to call to get the file content (Finally!!!).

Imagine you want to get the updated files in the last commit without knowing the commit hash, that is:

2 API calls + (2 * number of updated files) API calls >= 4 API calls

The duplicate data, the useless extra information in the JSON responses, and the huge number of endpoint calls to get files content pushed me to keep looking further for a better solution. I kept looking until I found this post about the same problem that I am facing. It convinced me that I have to fix it myself since there is no out-of-the-box solution.

Suddenly, I found this great post about the GraphQL aliases. I should have paid more attention when learning GraphQL or spend some extra hours learning. And yeah I do not how I end up finding about aliases when I was solving the problem.

I guess you know where I am going here. Alias is the best way to avoid the 2*number of updated files API calls that I had to do using the REST API v3.  The solution is to profit from the best features of both GitHub API v3 and v4.

Solution

The final process looks like this:

  1. Get the commits using REST API v3
  2. Get the commit content also using REST API v3
  3. Generate a single GraphQL query with aliases to get the files content.

I created a sample project in GitHub with a GitHub Service that you can reuse easily. The are 2 main methods offered by this service:

  • GetLastCommitFilesContent(string directory, DateTime? startingFrom = null) : it gets the last commit, look for its content and then generate a graphQL query and retrieve the files by calling GraphQL API. It returns an object of type GitHubServiceOutput.
  • GetCommitFilesContent(string commitHash, string directory) : a similar method to the previous one, but it takes commit hash as parameter. The output is like the previous method’s output.

You will find the details about the code sample and the used libraries in the GitHub repository.