LUIS Migration – False Error Message: Module {..} already exists

Here I am back with another story from my day-to-day work. We have been using Language Understanding (LUIS) for one of our solutions for a while. Recently, Microsoft made some upgrades and created a new platform that benefits from resource authoring based on Role-Based Access Control (RBAC), Azure Active Directory (Azure AD) and custom domains. So, we had to migrate our LUIS app. And that did not go well.

For those who are not familiar with LUIS: Azure Cognitive Services offer a set of REST APIs that help you build intelligent applications without having to develop your own machine-learning or deep-learning models. One of these services is Language Understanding (LUIS), a cloud-based API service that applies custom machine-learning intelligence to natural language text to predict the overall meaning and pull out relevant, detailed information. For more details, check the official site.

The LUIS portal is being replaced because it cannot keep up with the rest of Azure, so Microsoft decided to tie the new capabilities to a new portal. The bad part is that every user of the old portal has to migrate to the new one on their own, so that the old portal can be decommissioned by June 2020. See the unofficial announcement here: https://github.com/azure-deprecation/dashboard/issues/26.

To help with the migration, Microsoft offers some documentation, such as https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-migration-authoring.

We decided to export the application from the old portal and import it into the new one. In other words, we did not follow Microsoft’s migration workflow, because it is not the smartest choice for a professional environment. Our approach keeps the production solution running against the LUIS app in the old portal while we create a new app in the new portal, hoping that everything goes smoothly.

(For the demo, I used the Travel Agent Sample from the samples provided by Microsoft.)

So, we started the migration process and, as always, it did not go as planned. When importing the application, we got this error message:

BadArgument: The models: { BookFlight } already exist in the specified application version.

Figure 1 : Import Application

The error message does not say a lot. “The models” literally does not make sense in this context, or in the context of LUIS in general. BookFlight is the name of both an intent and an entity in the imported application, and that is what caused the error: the new platform does not accept the same name for an intent and an entity, so we had to rename one of them.

A good error message can be something like this:

You cannot assign the same name for an entity and an intent. Please use a unique name for the intent {BookFlight}

(I have no idea why we had it like that; the LUIS application has been there for more than two years, literally since before I joined my team, but that is not an excuse.)

Such a small issue took us several hours and three people to troubleshoot.

There are 2 solutions for this problem:

  1. Make sure that each intent has a unique name that is not used by any entity or any other part of the application. (This is the best solution.)
  2. Rename the entity, and also update the pattern if you are using one.

To reproduce the error, import the application that is provided in this GitHub repository. Another way to reproduce it is to add an entity with the name of an existing intent, or the other way around.

Figure 2: Create new entity

— — — —

To recap, we saw what a misleading error message looks like and how to fix the underlying problem (in case you hit the same one). Check out also my blog post Fix a gap in GitHub API v4. If you read to the end, thank you very much for your support. Wishing you a Happy New Year!

Fix a gap in GitHub API v4

In this blog post, I want to show you how I solved the problem of getting the content of commit files with the GitHub GraphQL API (v4), avoiding the GitHub REST API (v3) and its crazy amount of API calls. The solution is available in this GitHub repository.

If you want the solution straight away, click here to skip the explanation of the problem.

GitHub Octocat (source: https://github.com/logos)

GitHub is the most famous code-hosting platform for version control and collaboration, and it is essential for IT people. One of its best features is the set of APIs it offers to manipulate content and integrate GitHub into a workflow such as a CI pipeline. There are two API versions: GitHub API v3, which is a REST API, and the GitHub GraphQL API v4. These are the two current stable API versions. (If you are wondering about the previous versions, you can find some information in the GitHub documentation.)

The GitHub REST API v3 covers “all” (or almost all, I am not sure) areas. However, tons of API calls need to be made to satisfy a single workflow job. Therefore, GitHub decided to replace the REST API with GraphQL in version 4, and this is why:

GitHub chose GraphQL for our API v4 because it offers significantly more flexibility for our integrators. The ability to define precisely the data you want—and only the data you want—is a powerful advantage over the REST API v3 endpoints. GraphQL lets you replace multiple REST requests with a single call to fetch the data you specify.

GitHub documentation, https://developer.github.com/v4

However, the new GraphQL API v4 did not solve the problem 100%. It still has some gaps and does not cover all areas, so you may need to go back to version 3 to get a given job done.

Disclaimer: I was new to the GitHub API and GraphQL when I started working on this project.

I was working on a project that integrates GitHub into one of its processes, and I chose the GraphQL API v4. Everything went smoothly until I had to get the content of the last commit. Google helped a bit, and so did the github.community forums (not well referenced in Google). I looked for a way to get the content of a given commit with GraphQL, assuming it would be an obvious thing, and I managed to create the following query with the help of the documentation.

If you are new to GraphQL, you can start your learning journey at graphql.org/learn, and I recommend the query editor GraphiQL: GitHub repository github.com/skevy/graphiql-app / download page electronjs.org/apps/graphiql

{
  rateLimit {
    cost
    remaining
  }
  repository(name: "GitHubAPIDemo", owner: "MohamedSahbi") {
    ref(qualifiedName: "master") {
      name
      id
      target {
        ... on Commit {
          id
          history(first: 1) {
            pageInfo {
              hasNextPage
            }
            totalCount
            edges {
              node {
                author {
                  name
                  date
                }
                changedFiles
                commitResourcePath
                oid
                abbreviatedOid
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

fragment GetAllFiles on Tree {
  ... on Tree {
    entries {
      name
      type
      oid
      object {
        ... on Tree {
          entries {
            name
            type
            oid
            object {
              ... on Blob {
                text
              }
            }
          }
        }
      }
    }
  }
}

This GraphQL query returns, in order:

  • The cost of my request, in the rateLimit section. This is important because every user has a limited credit of 5000 requests per hour, and a single GraphQL call can cost 1 credit, 100 credits, or more than 5000 credits. For more details, see the explanation provided by GitHub.
  • The details of the most recent commit, since we requested history(first: 1).
  • The repository content that we chose to fetch in this section:
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }

What I found is that the commit history does not include the changed-file URLs that the REST API v3 provides. I kept trying to fix my query, still believing the information was available out of the box; since I was not experienced with GraphQL, I assumed I had made a mistake in my query.

Then, I lost hope for a while and decided to query the GitHub repository with the REST API v3. To get the content of each file, I had to:

  • First, call the commits endpoint to get the last commit.
  • Second, get that commit’s details, whose response contains a files array like this one:

"files": [
    {
      "sha": "9907549076a9271ee4948e909eb0669d3ba4875b",
      "filename": "LICENSE",
      "status": "added",
      "additions": 21,
      "deletions": 0,
      "changes": 21,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/LICENSE?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,21 @@\n+MIT License\n+\n+Copyright (c) 2019 Mohamed Sahbi\n+\n+Permission is hereby granted, free of charge, to any person obtaining a copy\n+of...."
    },
    {
      "sha": "7b9e8fe3adf9f784749834da35fecda8a5392bd3",
      "filename": "README.md",
      "status": "added",
      "additions": 2,
      "deletions": 0,
      "changes": 2,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/README.md?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,2 @@\n+# GitHubAPIDemo\n+This repository contains a sample demo for my blog post"
    }
  ]
  • Third, we loop over the files array, and for each file we have to:
    • Call the endpoint given in the contents_url attribute.
    • The previous call returns a JSON object containing the download_url attribute, which is the last endpoint we have to call to finally get the file content. (A sketch of this loop follows below.)
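
To make the cost concrete, a naive REST-only version of this loop looks roughly like the sketch below. The RestFileFetcher helper is hypothetical, the JSON handling with Newtonsoft.Json is an assumption, and the HttpClient is assumed to already carry the User-Agent and authentication headers that the GitHub API requires.

using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public static class RestFileFetcher
{
    // For every changed file, two REST calls are needed: one to its contents_url
    // (file metadata, including download_url) and one to the download_url itself.
    public static async Task<Dictionary<string, string>> GetFilesContentAsync(
        HttpClient client, IEnumerable<string> contentsUrls)
    {
        var filesContent = new Dictionary<string, string>();

        foreach (var contentsUrl in contentsUrls)
        {
            // Call 1 per file: fetch the file metadata.
            var metadata = JObject.Parse(await client.GetStringAsync(contentsUrl));

            // Call 2 per file: download the raw content itself.
            var downloadUrl = (string)metadata["download_url"];
            filesContent[(string)metadata["path"]] = await client.GetStringAsync(downloadUrl);
        }

        return filesContent;
    }
}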

Imagine you want to get the updated files of the last commit without knowing the commit hash. That is:

2 API calls + (2 * number of updated files) API calls >= 4 API calls

For a commit with 10 updated files, that is already 22 REST calls.

The duplicated data, the useless extra information in the JSON responses, and the huge number of endpoint calls needed to get the file contents pushed me to keep looking for a better solution. I kept looking until I found this post about the very problem I was facing. It convinced me that I had to fix it myself, since there is no out-of-the-box solution.

Then I stumbled upon this great post about GraphQL aliases. I should have paid more attention when learning GraphQL, or spent some extra hours on it. And yes, I do not know how I ended up discovering aliases right when I was trying to solve this problem.

I guess you know where I am going with this. Aliases are the best way to avoid the 2 * (number of updated files) API calls that I had to make with the REST API v3. The solution is to combine the best features of both GitHub API v3 and v4.

Solution

The final process looks like this:

  1. Get the commits using REST API v3
  2. Get the commit content also using REST API v3
  3. Generate a single GraphQL query with aliases to get the files content (see the sketch below).
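
To illustrate step 3, here is a minimal sketch of how such an aliased query can be built. The GraphQlQueryBuilder helper and its name are assumptions of this sketch, and the hard-coded owner/repository values are only there for illustration; the actual implementation lives in the sample repository mentioned below.

using System.Collections.Generic;
using System.Text;

public static class GraphQlQueryBuilder
{
    // Builds one GraphQL query that fetches the content of every changed file in a
    // single call, using one alias (file0, file1, ...) per file path.
    public static string BuildFilesQuery(IEnumerable<string> filePaths, string commitHash)
    {
        var builder = new StringBuilder();
        builder.AppendLine("{");
        builder.AppendLine("  repository(name: \"GitHubAPIDemo\", owner: \"MohamedSahbi\") {");

        var index = 0;
        foreach (var path in filePaths)
        {
            // Each alias targets a "<commit-hash>:<file-path>" Git object expression.
            builder.AppendLine($"    file{index}: object(expression: \"{commitHash}:{path}\") {{");
            builder.AppendLine("      ... on Blob {");
            builder.AppendLine("        text");
            builder.AppendLine("      }");
            builder.AppendLine("    }");
            index++;
        }

        builder.AppendLine("  }");
        builder.AppendLine("}");
        return builder.ToString();
    }
}

However many files the commit touches, the generated query is a single GraphQL call, which is exactly what makes aliases attractive here.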

I created a sample project on GitHub with a GitHub service that you can easily reuse. There are two main methods offered by this service:

  • GetLastCommitFilesContent(string directory, DateTime? startingFrom = null): gets the last commit, looks up its content, then generates a GraphQL query and retrieves the files by calling the GraphQL API. It returns an object of type GitHubServiceOutput.
  • GetCommitFilesContent(string commitHash, string directory): similar to the previous method, but it takes a commit hash as parameter. The output has the same shape as the previous method’s output.

You will find the details about the code sample and the libraries used in the GitHub repository.

Custom Object Comparison in C#

A while ago, I needed to compare complex objects in C#. I forgot to mention the real reason when I first wrote the article, so thanks to the feedback I received, here it is: I did not only need a Boolean result from the comparison, I also needed the list of properties whose values differ, plus the possibility to exclude properties from the comparison. I looked for a function that would provide the same kind of functionality as string.Compare(), but for complex objects of the same type. The Object.Equals() method alone did not satisfy my need, and overriding it everywhere was not convenient. My next stop was Stack Overflow, where I found a quite interesting approach in this discussion. It did not fully cover my need either, so I took it as a starting point and implemented my own comparison class (the implementation is available on GitHub with a running sample).

I created a static class Comparer with a constraint on the generic parameter to a class, which satisfies my need. If you do not know what constraints on type parameters are, go to the Microsoft docs (I recommend reading and understanding what they are and why they are used, because you will need them for sure). Then, reflection was the way to get things done. The PropertyInfo class, which belongs to the System.Reflection namespace, was enough to do the work: it gives access to the attributes and metadata of each property of an object, which I use to compare the properties of the two objects.

The class offers different methods that you may find helpful:

GenerateAuditLog() literally generates a log. It returns an object of type ComparisonResult, which can be inserted into a logHistory table in your database. This method is overloaded so that you can exclude some properties from the comparison.

GenerateAuditLogMessages() returns a list of messages containing only the changes. There is no overload for this method.

HasChanged() simply returns a Boolean result. You can optionally exclude some properties from the comparison. I found this method useful for updating records in the database.
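
To give an idea of the reflection-based approach, here is a minimal HasChanged-style sketch. The real Comparer class in the repository is richer (audit log generation, overloads) and its exact signatures may differ.

using System.Linq;
using System.Reflection;

public static class ObjectComparer
{
    // Returns true if at least one public property value differs between the two
    // objects. Properties listed in excludedProperties are skipped.
    public static bool HasChanged<T>(T original, T updated, params string[] excludedProperties)
        where T : class
    {
        PropertyInfo[] properties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (PropertyInfo property in properties)
        {
            if (excludedProperties.Contains(property.Name))
            {
                continue;
            }

            object originalValue = property.GetValue(original);
            object updatedValue = property.GetValue(updated);

            // object.Equals handles nulls and value types; reference-type properties
            // fall back to their own Equals implementation.
            if (!Equals(originalValue, updatedValue))
            {
                return true;
            }
        }

        return false;
    }
}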

That is all!! I hope you find it useful. Feel free to use the code or improve it.

Note: The code is not optimized (no DRY approach) because I take into consideration people who want to use just one method and simply copy it (I personally copy my code, improve it, and adapt it to the case at hand).

Raw queries with Entity Framework Core

I have been working lately on migrating a project from ASP.NET MVC 5 with Entity Framework 6 to ASP.NET Core 2.1 with Entity Framework Core 2.1. During the work, I found a raw query implemented in Entity Framework 6 similar to the following (simplified):

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var studentIdParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Database
        .SqlQuery<int>(query, studentIdParam).ToListAsync();

    if (gradeList.Count == 0)
    {
        return 0;
    }

    return gradeList.Average();
}

However, this is not possible as-is in Entity Framework Core. I had to look for alternatives, and I found two:

The first solution is a simple implementation with ADO.NET; you can find it in my GitHub account as the method GetScoreAdoNet(int studentId). However, I try to avoid ADO.NET because of internal rules in our team, mainly for maintenance reasons.
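
For reference, a minimal ADO.NET version can look like the sketch below. The StudentScoreRepository class and the injected connection string are assumptions of this sketch; the real GetScoreAdoNet method in the repository may differ in the details.

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

public class StudentScoreRepository
{
    private readonly string _connectionString;

    public StudentScoreRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public async Task<double> GetScoreAdoNet(int studentId)
    {
        const string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                               from Enrollment e
                               inner join Course c on e.CourseId = c.CourseId
                               where studentId = @studentId
                               group by e.Grade, c.Credits";

        var grades = new List<int>();

        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(query, connection))
        {
            command.Parameters.AddWithValue("@studentId", studentId);
            await connection.OpenAsync();

            // Read every Grade value returned by the query.
            using (var reader = await command.ExecuteReaderAsync())
            {
                while (await reader.ReadAsync())
                {
                    grades.Add(reader.GetInt32(0));
                }
            }
        }

        return grades.Count == 0 ? 0 : grades.Average();
    }
}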

So I kept looking for another solution using Entity Framework Core. Thanks to the great community on Stack Overflow, I found this answer to my problem. Here is the second, better solution:

I will be using query types, introduced in Entity Framework Core 2.1.

First, we have to create a data model that will be used as the return type of the executed SQL query. Since in my sample (here) I just return a number (int), I create a model with a single property. The property name should match the column name selected in the SELECT statement.

public class AverageGrade
{
    public int Grade { get; set; }
}

Then, we need to configure it in the DbContext, in the OnModelCreating method:

modelBuilder.Query<AverageGrade>();
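
For context, this call sits in the OnModelCreating override of the DbContext. Here is a minimal sketch; the UniversityContext name is inferred from the _universityContext field used in the code below, the rest is standard EF Core 2.1.

using Microsoft.EntityFrameworkCore;

public class UniversityContext : DbContext
{
    public UniversityContext(DbContextOptions<UniversityContext> options)
        : base(options)
    {
    }

    // The regular entities (DbSet<...> properties) would be declared here as well.

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Registers AverageGrade as a query type (EF Core 2.1): it has no key,
        // is never tracked, and only serves as the shape of raw SQL query results.
        modelBuilder.Query<AverageGrade>();
    }
}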

And finally we can call the raw SQL query:

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var idParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Query<AverageGrade>()
        .FromSql(query, idParam).ToListAsync();

    if (gradeList.Count == 0)
    {
        return 0;
    }

    return gradeList.Average(x => x.Grade);
}

That’s it; the migration was done successfully.

Query types have some limitations; if you plan to use them, please read the official documentation.

I prepared two projects, one with Entity Framework 6 and the other with Entity Framework Core (both using ASP.NET Core 2.1). You can find the source code here.

Data preparation (part 2)

In the previous post, we went through the pre-preparation phase: collecting metadata, data profiling, and defining data preparation rules. This post is mainly about dealing with missing values, aka nulls.

Before looking for methods to deal with nulls, confirm that you are really missing data. It is possible to have blanks in the data set that can simply be replaced with a value. Low-quality data may have null in place of “no”: say a column only has “yes” and null values; you should verify whether the system/application that generates the data simply does not assign any value for a negative/false response. In that case, you only replace null with “no” and do not delete the column.

In addition, metadata can help with missing data by flagging out-of-range entries with types such as unknown, unrecorded, or irrelevant, which can happen for different reasons such as:

  • Malfunctioning equipment
  • Changes in database design
  • Collation of different datasets
  • Measurement not possible

Missing data types

First, we need to understand the different types of missing data. There are three:

Missing Completely at Random (MCAR)

The title explains itself: the data are missing for purely random reasons (for example, a measurement sensor ran out of battery), unrelated to any other measured variable. It just happened randomly.

Example:

We conducted a survey on a university campus about extracurricular activities, and one of the questions was the student’s age. The survey was available online, and some volunteers also asked students on campus directly. After collecting the data and starting to prepare it, we found out that some students did not mention their age because the question was not mandatory. In this case, the missing age values are missing completely at random.

Missing at Random (MAR)

The missing data are independent of the unobserved values, but dependent on the observed values. “What?!” Let’s make it simple:

We have a dataset of car characteristics:

| Brand         | Model      | Nbr of Doors | Nbr of Seats | Airbag |
|---------------|------------|--------------|--------------|--------|
| Audi          | A6         | 5            | 5            |        |
| Mercedes Benz | E63        | 5            | 5            | Yes    |
| Audi          | A4         | 5            | 5            |        |
| BMW           | M3         | 3            | 2            | Yes    |
| Renault       | Megan      | 5            | 5            | No     |
| Skoda         | Superb     | 5            | 5            | Yes    |
| Mercedes Benz | S560       | 5            | 5            | Yes    |
| Peugeot       | 508        | 5            | 5            | No     |
| Skoda         | Octavia RS | 5            | 5            | Yes    |
| Tesla         | Model S    | 5            | 5            | Yes    |
| Audi          | A8         | 5            | 5            |        |
| Tesla         | Model 3    | 5            | 5            | Yes    |

We have missing values in the Airbag column. Notice that the missing values depend on the Brand column: if we group the data by brand, we find that all the rows with missing values have “Audi” as the brand.

Missing Not at Random (MNAR)

The missing data are not only dependent on the observed data, but also on the unobserved data.

Example:

A survey was conducted about mental disorder treatment worldwide. The results showed that respondents from low/lower-income countries are significantly less likely to report treatment than high-income country respondents.

— — — —

Dealing with missing data

How do we deal with missing data? There are three different approaches:

1 – Data Deletion

First, this method should only be used in the MCAR case. There are two deletion methods that most data analysts/scientists use:

Drop them (Listwise deletion):

Basically, you remove the entire row if it has at least one missing value. This method is recommended if your data set is large enough that the dropped rows do not affect the analysis. Most labs or companies require a minimum percentage of data, and if that threshold cannot be reached, they drop the rows with missing data. Personally, if most (more than 50%) of a variable’s values are null or missing, I “usually” drop the whole column.

Pairwise deletion:

No data are dropped in this case. If the analysis needs all the columns, you select only the rows without any missing values. However, if the analysis task only needs some of the variables, and the rows with missing values happen to have valid values for those variables, you include them in the data selected for that task.

Example:

For this example, the car data set will be used. *Let’s assume it has 50 rows and that data are missing only in rows 1 and 6.

|    | Brand         | Model      | Nbr of Doors | Nbr of Seats | Airbag |
|----|---------------|------------|--------------|--------------|--------|
| 1  | Audi          | A6         | 5            | 5            |        |
| 2  | Mercedes Benz | E63        | 5            | 5            | Yes    |
| 3  | BMW           | M3         | 3            | 2            | Yes    |
| 4  | Skoda         | Superb     | 5            | 5            | Yes    |
| 5  | Mercedes Benz | S560       | 5            | 5            | Yes    |
| 6  | Peugeot       | 508        | 5            | 5            |        |
| 7  | Skoda         | Octavia RS | 5            | 5            | Yes    |
| …  |               |            |              |              |        |
| 50 | Tesla         | Model 3    | 5            | 5            | Yes    |

1st Task: an association rules task to find an association hypothesis between the number of seats and the number of doors. The needed attributes are: Brand, Model, Nbr of Seats and Nbr of Doors. In this case, we can use the whole data set because there are no missing values for the needed attributes.


2nd Task: an association rules task to find an association hypothesis between the number of seats and the airbag attribute. The needed attributes are: Brand, Model, Nbr of Seats and Airbag. To resolve this task, we eliminate rows 1 and 6 and use the rest.
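
If you prefer code to prose, here is a small sketch of the two deletion strategies in C#. The Car class is a simplified assumption for illustration, with Airbag as the only column that can be missing, as in the example above.

using System.Collections.Generic;
using System.Linq;

public class Car
{
    public string Brand { get; set; }
    public string Model { get; set; }
    public int NbrOfDoors { get; set; }
    public int NbrOfSeats { get; set; }
    public string Airbag { get; set; }   // "Yes", "No" or null when missing
}

public static class DeletionExamples
{
    // Listwise deletion: drop every row that has at least one missing value.
    public static List<Car> Listwise(IEnumerable<Car> cars) =>
        cars.Where(c => c.Airbag != null).ToList();

    // Pairwise deletion: keep a row as long as the attributes needed by the
    // current task are present.
    // Task 1 only needs doors and seats, which are never missing here,
    // so every row is usable.
    public static List<Car> ForDoorsAndSeatsTask(IEnumerable<Car> cars) =>
        cars.ToList();

    // Task 2 needs the Airbag attribute, so rows 1 and 6 are excluded.
    public static List<Car> ForAirbagTask(IEnumerable<Car> cars) =>
        cars.Where(c => c.Airbag != null).ToList();
}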

2 – Replace missing values

The second option for dealing with missing values is to replace them. Here it gets a bit more complicated, because there are different ways to achieve it.

Mean/median substitution

Replace the missing values with the mean or the median of the observed values. We use this method when the variable is numerical and the missing values represent less than 30% of the records.

However, with missing values that are not strictly random, especially in the presence of a great inequality in the number of missing values for the different variables, the mean substitution method may lead to inconsistent bias.

Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402–406. doi:10.4097/kjae.2013.64.5.402

Common value imputation

We use the most common value to replace the missing values. For example, the Car data set we used previously has a Color column with 100 records and only 5 distinct values; the most common value (67 occurrences) is Black, so we replace the missing values with Black. However, this method may also introduce bias.
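
Here is a minimal C# sketch of these two substitution strategies, assuming a nullable numeric column and a categorical column where null marks a missing value.

using System.Collections.Generic;
using System.Linq;

public static class ImputationExamples
{
    // Mean substitution: replace missing numeric values with the mean of the
    // observed values (the median would work the same way, just computed differently).
    public static List<double> ImputeWithMean(IReadOnlyList<double?> values)
    {
        double mean = values.Where(v => v.HasValue).Average(v => v.Value);
        return values.Select(v => v ?? mean).ToList();
    }

    // Common value imputation: replace missing categorical values with the most
    // frequent observed value (for example "Black" for the Color column).
    public static List<string> ImputeWithMostCommon(IReadOnlyList<string> values)
    {
        string mostCommon = values
            .Where(v => v != null)
            .GroupBy(v => v)
            .OrderByDescending(g => g.Count())
            .First()
            .Key;

        return values.Select(v => v ?? mostCommon).ToList();
    }
}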

Regression imputation

Regression imputation lets us avoid some of these biases. The mean/median method replaces missing values with a constant derived from the observed ones; instead, regression imputation predicts the missing values from the other available variables. This way, we obtain plausible new values and retain the cases with missing data.

Multiple imputation

Multiple imputation is an “approach [that] begin[s] with a prediction of the missing data using the existing data from other variables [15]. The missing values are then replaced with the predicted values, and a full data set called the imputed data set is created. This process iterates the repeatability and makes multiple imputed data sets (hence the term “multiple imputation”). Each multiple imputed data set produced is then analyzed using the standard statistical analysis procedures for complete data, and gives multiple analysis results. Subsequently, by combining these analysis results, a single overall analysis result is produced.”

Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402–406. doi:10.4097/kjae.2013.64.5.402

The purpose of multiple imputation is to obtain a statistically valid inference, not to recover the true missing data; there is no way to predict the missing data and get it 100% right. The main advantages of this method are that it reduces bias and is easy to use. Meanwhile, to build a correct imputation model, you need to take into consideration the conditions required by the method and avoid some pitfalls.

In case you want to use the multiple imputation method, I recommend reading the following articles: Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls (BMJ 2009;338:b2393) and When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts (DOI: 10.1186/s12874-017-0442-1)

3 – Create new field / variable

Missing data has its own usefulness, mainly when it is not MCAR (missing completely at random). In that case, we create a new variable or field that records the observed behavior or pattern of the missing values. This can also be useful if you own the tool that generates the data: you can create a new engineered feature based on the missing-data pattern.

— — — —

Further reading

  1. How to Handle Missing Data
  2. The prevention and handling of the missing data

References

  1. ibm.com: Pairwise vs. Listwise deletion: What are they and when should I use them?, Accessed 27/02/2019 (https://www-01.ibm.com/support/docview.wss?uid=swg21475199)
  2. ncbi.nlm.nih.gov: The prevention and handling of the missing data, Accessed 21/04/2019 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668100)
  3. measuringu.com: 7 ways to handle missing data, Accessed 15/04/2019 (https://measuringu.com/handle-missing-data)

Data preparation (part 1)

Data preparation is the most time-consuming phase in any data-related cycle, whether you are preparing data for a machine learning model, data mining, or BI.

I will explain how to prepare the data efficiently by following different steps.

Many people who are starting their career in the data field forget an important step: they ask for the data and start preparing it straight away.

But before that, you should do some pre-preparation.

Business Understanding (pre-preparation)

First, you need to understand the business logic. Every data analysis task is related to a business task.

Ask for explanations and read the available documentation. In addition, meetings with a business analyst in the organization or with the service/product owners may be required. This step saves a lot of time: if you skip it, you may find out later that some data are missing, that the data structure does not make sense, or run into many other random problems.

Tip: when collecting the data, ask for the data governance department (if there is one). The people there have useful and priceless information.

*Don’t let them convince you that the data is self-explanatory.

Business understanding does not come with a simple method. You just need to figure out how the business works and, most importantly, how the data was generated. After finishing this step, ask for the data needed for the given task.

Now we can start the data preparation. To do so, we need the metadata.

Collect the metadata

Metadata is the data that describes the data. Having the metadata is a must; if it is not accessible, you should create it with the help of the data owner.

Metadata helps with identifying the attributes of the data set, the type of each attribute, and sometimes even the values allowed for a concrete attribute.

Data profiling

Data profiling is important to understand the data better. It “is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data”. Data profiling includes structure discovery, content discovery and relationship discovery. This step makes it easier to discover and choose the needed data. Also, if similar data are needed in later iterations, you already know how to deal with them, and the whole data preparation process becomes much easier.

Define data preparation rules (optional)

This step applies to big data. Data preparation rules are the methods of cleansing and transforming the data.

Why? Cleaning big data is not a simple task, and it is time consuming. Imagine you delete rows using the value of an attribute as a condition, then find out that the condition misses something, and the size of your data set is 5 TB. It will take you forever to figure out the right condition.

How? We take a random sample from our data set, cleanse it and transform it. The script used to prepare the random sample is then applied to the whole data set.

The random sample must be valid. I will write a blog post about generating a correct and valid random sample.

Iterative preparation

Start with the basic cleansing steps that apply to any dataset. After that, tackle the more challenging steps, such as dealing with missing data. Leave the data transformation to the end.

In part 2, we will look at how to deal with missing values and how to get better quality data.

Open Sourcing of Windows Calculator is great news for many developers

Microsoft announced on Wednesday, March 6th, 2019 that it made Windows Calculator open source software (you can find it on GitHub). Suddenly, the internet went crazy with memes and posts mocking the announcement, judging it as a small piece of code that is not worth it, and blah blah blah.

Figure 1: Windows Calculator. source: me

I understand why some people are frustrated and see it as a small project not worth the fuss when comparing its size with .NET or VS Code, but that view is narrow-minded; they did not look at it from a different angle. Let me explain the real value of the Windows Calculator in the open source world.

The calculator is a simple project that new developers and students build as one of their first projects, and they feel proud of it (at least I was proud of my calculator project). You start with simple operations, but you soon find out that parsing is needed and that some conditions are a must so that the app does not crash.

Figure 2: young boys on computers. source: pixabay

Later on, complex operations can be added and the real dev problems appear, such as getting different results with different types (float vs double vs decimal), or saving the last operations. And it keeps getting bigger and bigger. At this point, beginners do not know how to choose the right project structure, or how to improve their code and write clean code. Imagine having the source code of the most used calculator in the world, made by the biggest software company worldwide! That’s insane. Wait, Microsoft offered more than source code: it included the project architecture, unit tests, and the build system.

The application architecture is useful even for junior developers and students: seeing a solid use of the MVVM design pattern in a real application helps them plan their first applications.

Moreover, the Calculator application is written in Visual C++ (C++/CX), a set of extensions to the C++ language used for Windows apps and Windows Runtime components. C++ is a solid programming language, and most university students have C++ classes, so Microsoft basically offered them their semester project for a great grade 😀 That is just a joke; do not do it if you are a new learner. Do it yourself and then compare your work: that way you improve your skills, and it is great to learn from your own mistakes 😉

Windows Calculator is built for the Universal Windows Platform using the XAML UI framework. Developers can learn more about making their own custom controls and VisualStates, which comes in handy for creating and publishing apps in the Microsoft Store.

Finally, they won’t stop at the development phase: they will also learn Azure Pipelines for the build, deployment and release phases. This is important because it can be hard to apply CI/CD in your first projects.

Figure 3: thumbs up. source: pixabay

To conclude, the Windows Calculator is a great example for learning Microsoft’s full development lifecycle.