Rock your first Job as a Software Developer

Source: Image by Joe Alfaraby from Pixabay

You got your first job and you were happier than ever before. Then, you asked yourself: How can I make the most of it? What should I do to grow as a software developer? Well, you are in the right place, my friend 😊.

In this blog post, you will learn what to do to move forward in your career by building solid technical skills and developing the soft skills needed in IT in general and in software development in particular.

TL;DR

I explain in this post the following steps to rock your first job and move forward:

  • Focus on communication
  • Master the programming fundamentals
  • Learn something new daily
  • Avoid reinventing the wheel
  • Ask for help
  • Give back
  • Treat people well
  • Get mentorship

You can read just the key point at the end of each section, or the whole post 😁.

Communication

Communication and soft skills matter more than your coding skills!

When you work on a project, there are many phases that the team goes through. Some of these phases do not require coding, but communication with the customer: for example, the business analysis phase. Also, during the development phase, you will be communicating non-stop with your teammates.

If your team adopts an Agile methodology, you communicate continuously with the customer during the development phase too. Communicating with the customer requires understanding their requirements in plain English, so avoid technical jargon and explain things in the simplest way possible. Then, you translate the requirements into documentation that will be used during construction. This may be done by business analysts, or even by you if your team does not have one. Meanwhile, communication with your fellow developers is different; it is more technical.

Key point: Communication is needed on a daily basis. So, work on this skill to progress faster and get the job done right on the first try.

Fundamentals matter the most

Software development fundamentals are the base on which you build new skills.

It does not matter whether you learn to program in C++, Java, C#, or any other programming language. What matters is knowing data structures, algorithms, design patterns, and the other programming fundamentals.

These fundamentals are included in most Computer Science degrees, but most universities/schools “forget” to explain why you should learn the “old stuff” rather than the fancy things and buzzwords (microservices, containers, etc.) that you hear at every conference and read about in blog posts and on Twitter.

Key point: Focus on getting the fundamentals to a solid level and you will learn new frameworks and tools easily and quickly; and by quickly, I mean that in a matter of weeks you can pick up new things like Docker or GraphQL.

Learn something new daily

To keep progressing, you have to keep learning. I recommend that you make the following statement part of your daily tasks:

“learn a new technical thing no matter how difficult or easy it is. Anything that you did not know how to do before”. 

This will keep you motivated. If you had a bad day and your code did not work, you can still say: “at least I learned how to <what you learned>, so it was not a bad day after all”. Our field progresses daily; you have to keep up with that.

You will also notice how consistency pays off, even with the smallest effort. And if you are a math lover, the following calculation shows how a bit of daily effort makes a difference over one year:

Source: pinterest *
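In case the image does not load, the calculation usually shared in this context is the classic one-percent-a-day comparison (an assumption on my side, based on the image source):

1.01^365 ≈ 37.78        0.99^365 ≈ 0.03

In other words, improving by 1% every day compounds to roughly 38 times over a year, while slipping by 1% every day shrinks to almost nothing.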

Key point: learning something new on a daily basis is an achievement that keeps you motivated, keeps you growing, and makes your day 😉.

*Props to my friend Davide Bellone for recommending the calculation. Meanwhile, I could not find the right person to credit for the photo and the idea. To whomever it may concern, I apologize.

Avoid reinventing the wheel

Reinventing the wheel is a common mistake that some junior developers make at an early stage. I admit that I made this mistake, and I have seen other juniors commit it too. At some point, you feel confident enough to write code that you could simply get from a NuGet package (the .NET equivalent of npm packages) or an open source framework. For example, if you need to work with GraphQL in a .NET project, you should search for a package that already exists and look at the community feedback in blog posts to find the most recommended one. There is only a tiny, tiny chance that you will not find a ready-to-use library for your task.

Why should you not reinvent the wheel? You are wasting time that you could use to do better things in the project and to solve real problems that have no solution yet. Moreover, you cannot reinvent it with better quality at this stage; this is not a judgment of your coding skills, but existing packages are typically built by big teams or open-sourced, with community members contributing improvements. In addition, you receive (security) updates for the installed packages, which are precious later on for the reliability, maintainability, and stability of your software.

That being said, you should always stay ambitious about inventing the wheel, even at this early stage of your career. Reinventing the wheel may also become reasonable with more experience, not for the sake of reinventing, but to make an improvement and contribute to the community and the field.

Key point: Focus on solving problems that have no solution yet and gain solid experience so that you can (re)invent the wheel in the near future.

Ask for Help

We create software to solve problems. During solution architecture and the construction phase, we all encounter problems that we have not solved before. Some of them are tricky and take most of your time. That’s where you need help. Luckily, the IT community is very helpful, so don’t be afraid to seek help.

Please bear in mind that you should not ask a question every 2 minutes without searching for a solution by yourself. Also, don’t spend a week searching for a solution without asking for help.

My suggestion is to go through these steps (in order):

  1. Ask your colleagues if they have encountered the problem before. If they cannot help, move on to the next step.
  2. Ask your local community or your company’s internal forum, in case it has one. A local community is generally composed of your friends and the local network you built while attending workshops and after-work sessions.
  3. Ask the global community on Stack Overflow, Twitter, the product’s forum, etc. Make sure to sanitize the code from private content and keys before publishing it online.
  4. If you still cannot solve it, raise a ticket with the vendor (the company that owns the product/technology you are using). This step is typically not free, but your company should be willing to pay for it.

*This is my personal order. The last 2 steps can be swapped depending on how sensitive the project you are working on is. So, don’t take it as 100% true in all cases.

Key point: Don’t hesitate to ask for help when you struggle. Everyone needs help, even senior developers and architects. No question is stupid; keeping a question to yourself (because you are afraid or embarrassed) is STUPID.

Give Back

We all ask for help and go through different forums to solve our problems. Most of us also look for the accepted answer, and its absence upsets us. Now it is your turn to help others: take a moment to answer questions that are within your scope of knowledge. Moreover, you should give back to your school and community by giving workshops and assisting students and newcomers to the field.

Key point: Contribute whenever it is possible and help others the way you want them to help you.

Treat people well

The key to success and fast growth is your people network. As mentioned before, half of the work or more is communication with people whether they are your colleagues or your clients.

During your journey, you will have moments where you look at some code and say “who is the idiot that wrote this?”, or you may find missing or poorly written documentation that gives you a hard time understanding an old product that needs maintenance or an upgrade. First, let me tell you that there is no idiot or stupid person. Second, quite often it is your own code that you no longer recognize, and that is thanks to your progress and evolution as a software developer. So, don’t judge people too early (or at all, since most of the time you do not know the whole story), because they have their own journey too and they deserve respect.

In addition, we live in a small world, especially in the IT community. Make sure to build your people network by joining open discussions in your company and attending workshops and conferences locally, regionally, and why not globally. Follow people on social media platforms and stay connected and active too. The network you create will help you move fast in your career by bringing new opportunities such as jobs and customers for your future business.

Key point: Build a people network and treat everyone well, because solid relationships matter the most in business and in everyday life.

Get mentorship

As a junior developer, you have already demonstrated enough skills to get your first job, and that is awesome. Since you are reading this blog post, it means you want to evolve and get to the next level. You may have been looking around for a while and reading other articles about up-skilling and getting to the associate developer level. The key is finding a mentor! A good mentor in your case would be a senior developer from your team, a friend from the community, or a virtual friend. You will notice that I referred to your closest social circles, and you may wonder why. The answer is simple: many people think mentorship is just pair programming and technical advice from a “good, experienced coder” rather than from a senior developer, team lead, or architect (hint: not every good, experienced coder has the soft skills for positions like tech lead or senior architect). So they end up saying that it is not necessary to have a mentor, and that is so wrong!!!

Mentorship is not just about technical advice and help with the problems you face in your current task. It goes far beyond that. A mentor who knows you well, even outside working hours, can recognize your character, your soft skills, and your weaknesses. Evidently, he can help you with your career path, because he has seen a lot and can tell you what the path of an architect, a manager, or a tech lead looks like. He has been in different positions with different managers and teams, and he has gained the needed experience and wisdom. In addition, a mentor keeps you on track, keeps you focused, and challenges you from time to time. Finally, he motivates you and won’t let you abandon your plans.

If you cannot find a mentor in your circle of friends, try to find one in your local community or online. My virtual friend Davide Bellone mentioned in his latest blog post that “Twitter is a great place to start with, as well as other websites like Reddit and Dev.to.”

Key point: a mentor helps you sharpen your technical and soft skills, shape your career, and get on the fast track. Get yourself a mentor now!

— — — —

That is it! It is time to shine 😊. All the best with your career.

~*~*~*~*~*~

Credits

I thank all my friends who answered my questions while I was preparing this blog post. I crowdsourced the tips from my friends and virtual friends on Twitter and Facebook. In addition, I looked into different articles, such as Transitioning Into Your First Junior Developer Role, and validated my ideas against what they listed. Finally, I added my own point of view and personal experience to the final output.

Additional resources

Move Resources between 2 Azure Subscriptions

This blog post shows how to validate the move operation of resources between 2 Azure Subscriptions and how to move them successfully by going through all the steps needed.

Figure 1 : Move resources between 2 subscriptions

Microsoft Azure offers the possibility to move resources from one resource group to another in the same subscription, or from one subscription to another within the same Azure tenant.

The available documentation on the validation step is limited, so you have to put the puzzle together yourself by collecting pieces of information from different articles. Therefore, my main goal is to give you guidance from A to Z without wasting your time. I will add links to the different articles I used in case you are interested in reading them.

To move the resources, there are 2 major steps:

  • Validate the move operation: it is optional, but highly recommended.
  • Move operation: the main action.

— — — —

Part 1: Validate the move operation

To validate the move operation, we need to call the dedicated REST API endpoint; there is no other option for this action. To execute it successfully, we have to:

  1. Create an Azure Service Principal,
  2. Prepare the request body,
  3. Get an access token,
  4. and finally make the REST Call

1 – Create Azure Service Principal

You can skip this step if you have one already.

First, let’s create an Azure service principal (SP). If you are not familiar with Azure service principals: basically, we are registering an application in Azure Active Directory (AAD) and assigning a role to it. Check the official documentation to learn more about Azure service principals.

Create the application with just 2 clicks in the Azure Portal: go to Azure Active Directory >> App registrations >> fill in the form. You can also do it with PowerShell or the Azure CLI.

Figure 2 : Register an Azure Application

Next, we generate a client secret by going to the Certificates & Secrets tab and clicking on Add a client secret. Note down the client secret; we will need it in the next steps.

Figure 3 : Generate a client secret

Finally, we have to assign the Contributor role to the registered application on the source resource group.

Figure 4 : Assign contributor role

2 – Prepare the request body and URI Parameters

We need to collect the following items:

  • Tenant ID
  • Subscription IDs of both the source and the target subscriptions
  • Names of both the source and the target resource groups
  • Resources that we want to move

Let’s start by connecting to Azure and listing all subscriptions in PowerShell:

Connect-AzAccount

# List subscriptions
Get-AzContext -ListAvailable | Select-Object Name, Subscription, Tenant | Format-List

Then, we need to set the context to the source Subscription and get the resource group name (in case you forgot it  😄)

# Select a subscription as current context
Set-AzContext -SubscriptionId <sourceSubscriptionId>

# Get Names and Locations of resource groups in the selected Subscription:
Get-AzResourceGroup | Select-Object ResourceGroupName, Location

Finally, we get the IDs of the resources in the given resource group. I added some formatting so that you can copy-paste straight into the request body; you only need the first command to get the resources:

# Get the resources 
$resourcesList= Get-AzResource -ResourceGroupName 'rg-sdar-westeurope' | Select-Object 'ResourceId'  | foreach {$_.ResourceId}

# Format the values by adding double quotes and join them with commas
$resourcesListFormatted= '"{0}"' -f ($resourcesList -join '","')

# Copy to clipboard
Set-Clipboard -Value $resourcesListFormatted

Create a new HTTP request in Postman, go straight to the Body tab, and choose the raw type. Construct the request body as follows:

{
 "resources": [<paste the recently copied resources list>],
 "targetResourceGroup": "/subscriptions/<targetSubscriptionId>/resourceGroups/targetResourceGroupName"
}

It should look like this:

Figure 5 : Request body

3 – Get an OAuth2 token

It is mandatory for the authorization of the POST request.

We get it by making a POST call to https://login.microsoftonline.com/<tenantId>/oauth2/token with the following values in the request body (formatted as x-www-form-urlencoded in Postman):

  • grant_type : client_credentials
  • client_id : the client ID of the application registered in the first step
  • client_secret : the client secret noted in the first step
  • resource : https://management.azure.com/
Figure 6: Get an Oauth2 token
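If you prefer code over Postman, here is a minimal C# sketch of the same token request using HttpClient. The tenant ID, client ID, and client secret are placeholders that you replace with your own values:

using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class TokenClient
{
    // Requests an access token for the Azure Resource Manager API (management.azure.com)
    // with the client credentials flow, using the app registration from the first step.
    public static async Task<string> GetAccessTokenAsync(
        HttpClient http, string tenantId, string clientId, string clientSecret)
    {
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["grant_type"] = "client_credentials",
            ["client_id"] = clientId,
            ["client_secret"] = clientSecret,
            ["resource"] = "https://management.azure.com/"
        });

        var response = await http.PostAsync(
            $"https://login.microsoftonline.com/{tenantId}/oauth2/token", form);
        response.EnsureSuccessStatusCode();

        // The token endpoint returns JSON; we only need the access_token field.
        using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return json.RootElement.GetProperty("access_token").GetString();
    }
}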

4 – Validate the move operation

All the previous steps lead to this action. Go to the POST request created in the second step and set the request URL, filling in the required values: https://management.azure.com/subscriptions/<sourceSubscriptionId>/resourceGroups/<sourceResourceGroupName>/validateMoveResources?api-version=2019-05-10

The authorization type is Bearer Token. Use the access token obtained in the previous step.

Figure 7 : Add an authorization token

Send it. The response status code should be 202 Accepted with an empty response body.

Figure 8 : Validate move resources
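The same call can also be made from C#. Here is a rough sketch that reuses the token helper above; the subscription ID, resource group name, and the JSON body built in step 2 are placeholders:

using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class MoveValidator
{
    public static async Task<HttpResponseMessage> ValidateMoveAsync(
        HttpClient http, string accessToken, string sourceSubscriptionId,
        string sourceResourceGroupName, string requestBodyJson)
    {
        var url = $"https://management.azure.com/subscriptions/{sourceSubscriptionId}" +
                  $"/resourceGroups/{sourceResourceGroupName}" +
                  "/validateMoveResources?api-version=2019-05-10";

        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            // The same JSON body that was pasted into Postman in step 2.
            Content = new StringContent(requestBodyJson, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

        // A 202 Accepted with an empty body means the validation request was accepted.
        return await http.SendAsync(request);
    }
}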

In case you get 400 Bad Request with the error message ResourceNotTopLevel, you need to remove that resource from the validation request body, because it will be moved automatically with its parent resource. You can get this error with a database or a Web App slot; the solution is to move the SQL Server or the whole Web App that contains the slot, respectively.

Make sure you have the needed permissions and that you have checked the limitations of your resources and the subscription quotas. For more details, read the “checklist before moving resources” section in the official documentation.

— — — —

Part 2: Move resources

There are 4 different ways to move the resources to another subscription:

  • Post Request with the REST API (Similar to the validate move operation)
  • Azure PowerShell
  • Azure CLI
  • Or using the Portal

I am a lazy person, so I always choose the easiest way. That means Portal is the choice 😂.

Go to the source resource group, click on Move, and choose Move to another subscription. Next, check the boxes for the resources that you want to move, then select the target subscription and the target resource group. Finally, click OK.

Figure 9 : Move resources

Congrats, you moved the resources to the new subscription like a champ 😎.

~*~*~*~*~*~

References and important links

  1. Move resources to a new resource group or subscription
  2. Supported resources – Move operation
  3. Validate Move Resources
  4. Troubleshoot moving Azure resources to new resource group or subscription

LUIS Migration – False Error Message: Module {..} already exists

Here I am back with another story from my daily work problems. We have been using Language Understanding (LUIS) for one of our solutions for a while. Recently, Microsoft made some upgrades and created a new platform that benefits from resource authoring based on Role-Based Access Control (RBAC), Azure Active Directory (Azure AD) and custom domains. So, we had to migrate our LUIS app. And that did not go well.

For those who are not familiar with LUIS: Azure Cognitive Services offer a set of REST APIs that help you build intelligent applications without the need to develop your own machine learning or deep learning models. One of these services is Language Understanding (LUIS), a cloud-based API service that applies custom machine-learning intelligence to natural language text to predict overall meaning and pull out relevant, detailed information. For more details, check the official site.

The LUIS portal is being replaced because it cannot keep up with Azure services, so Microsoft decided to tie the new upgrades to the creation of a new portal. The bad part is that every user of the old portal has to migrate to the new one themselves, so that the old portal can be decommissioned by June 2020. See the unofficial announcement here: https://github.com/azure-deprecation/dashboard/issues/26.

To help with the migration, Microsoft offered some documentation, such as https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-migration-authoring.

We decided to export the application from the old portal and import it into the new portal. In other words, we did not follow Microsoft’s migration workflow, because it is not the smartest option for a professional environment. We chose the approach that keeps the production solution running with the LUIS app in the old portal while we create another one in the new portal, hoping that everything goes smoothly.

(For the demo, I used Travel Agent Sample from the samples provided by Microsoft)

So, we started the migration process, and it did not go as planned (as always). When importing the application, we got this error message:

BadArgument: The models: { BookFlight } already exist in the specified application version.

Figure 1 : Import Application

The error message does not say much. “The models” literally does not make any sense in this context, or in the context of LUIS. BookFlight is the name of both an intent and an entity in the imported application, and that is what caused the error. The new platform does not accept the same name for an intent and an entity, so we had to rename one of them.

A good error message can be something like this:

You cannot assign the same name for an entity and an intent. Please use a unique name for the intent {BookFlight}

(I have no idea why it was set up like that; the LUIS application has been there for more than 2 years, since before I joined my team, but that is not an excuse.)

Such a small issue took us several hours and the effort of 3 people to troubleshoot.

There are 2 solutions for this problem:

  1. Make sure that each intent has a unique name. The name must not be used for entities or any other part of the application. (This is the best solution.)
  2. Rename the entity, and also the pattern if you are using one.

To reproduce the error, upload the application provided in this GitHub repository. Another option to reproduce it is to add an entity with the name of an existing intent, or the other way around.

Figure 2: Create new entity

— — — —

To recap, we saw what a misleading error message looks like and how to fix the error (in case you hit the same one). Also check out my blog post Fix a gap in GitHub API v4. If you read to the end, thank you very much for your support. Wishing you a Happy New Year!

Fix a gap in GitHub API v4

In this blog post, I want to show you how I solved the problem of getting the content of a commit’s files with the GitHub GraphQL API (v4), while avoiding the GitHub REST API (v3) and its crazy number of API calls. The solution is available in this GitHub repository.

If you want the solution straight away, click here to skip the explanation of the problem.

GitHub Octocat (source: https://github.com/logos)

GitHub is the most famous code-hosting platform for version control and collaboration, and it is essential for all IT people. One of its best features is the set of APIs offered to manipulate content and integrate GitHub into a workflow such as a CI pipeline. It offers two API versions: GitHub API v3, which is a REST API, and GitHub GraphQL API v4. These are the two current stable API versions. (If you are wondering about the previous versions, you can find some information in the GitHub documentation.)

The GitHub REST API v3 covers “all” (or almost all, I am not sure) areas. However, tons of API calls need to be made in order to complete a workflow job. Therefore, GitHub decided to replace the REST API with GraphQL in version 4, and this is why:

GitHub chose GraphQL for our API v4 because it offers significantly more flexibility for our integrators. The ability to define precisely the data you want—and only the data you want—is a powerful advantage over the REST API v3 endpoints. GraphQL lets you replace multiple REST requests with a single call to fetch the data you specify.

GitHub documentation, https://developer.github.com/v4

However, the new API (GraphQL API v4) did not solve the problem 100%. It still has some gaps and does not cover all areas, so you may need to go back to version 3 for a given job.

Disclaimer: I was a novice with the GitHub API and GraphQL when I started working on this project.

I was working on a project that integrates GitHub into one of its processes, and I chose the GraphQL API v4. Everything was going smoothly until I had to get the content of the last commit. Google helped a bit, and so did the github.community forums (not well indexed by Google). I looked for a way to get the content of a given commit with GraphQL; I thought it would be an obvious thing, and I managed to create this query with the provided documentation.

If you are new to GraphQL, you can start your learning journey at graphql.org/learn. I also recommend the query editor GraphiQL: GitHub repository github.com/skevy/graphiql-app, download page electronjs.org/apps/graphiql.

{
   rateLimit{
    cost
    remaining
  }
  repository(name: "GitHubAPIDemo", owner: "MohamedSahbi") {
    ref(qualifiedName: "master") {
      name
      id
      target {
        ... on Commit {
          id
          history(first: 1) {
            pageInfo {
              hasNextPage
            }
            totalCount
            edges {
              node {
                author {
                  name
                  date
                }
                changedFiles
                commitResourcePath
                oid
                abbreviatedOid
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

fragment GetAllFiles on Tree {
  ... on Tree {
    entries {
      name
      type
      oid
      object {
        ... on Tree {
          entries {
            name
            type
            oid
            object {
              ... on Blob {
                text
              }
            }
          }
        }
      }
    }
  }
}

This GraphQL query returns, in order:

  • The cost of my request, in the rate limit section. This is important because every user has a limited credit of 5,000 requests per hour, and a single GraphQL call can cost 1 credit, 100 credits, or more than 5,000 credits. For more details, see the explanation provided by GitHub.
  • The details of the most recent commit, since we chose history(first: 1)
  • The repository content that we chose to get in the section:
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }

What I found is that the commit history does not include the changed files’ URLs that the REST API v3 provides. I kept looking for a way to fix my query; I still believed it was provided out of the box, and since I was not experienced with GraphQL, I assumed I had made a mistake in my query.

Then, I lost hope for a while and decided to query the GitHub repository with the REST API v3. To get the content of each file, I first have to get the commits, then get the details of the commit, whose response contains a files array like this one:

"files": [
    {
      "sha": "9907549076a9271ee4948e909eb0669d3ba4875b",
      "filename": "LICENSE",
      "status": "added",
      "additions": 21,
      "deletions": 0,
      "changes": 21,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/LICENSE?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,21 @@\n+MIT License\n+\n+Copyright (c) 2019 Mohamed Sahbi\n+\n+Permission is hereby granted, free of charge, to any person obtaining a copy\n+of...."
    },
    {
      "sha": "7b9e8fe3adf9f784749834da35fecda8a5392bd3",
      "filename": "README.md",
      "status": "added",
      "additions": 2,
      "deletions": 0,
      "changes": 2,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/README.md?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,2 @@\n+# GitHubAPIDemo\n+This repository contains a sample demo for my blog post"
    }
  ]
  • Third, we loop over the files array, and for each file we have to:
    • Call the endpoint given in the contents_url attribute
    • The previous call returns a JSON object that contains the download_url attribute, which is the last API endpoint we have to call to get the file content (finally!!!).

Imagine you want to get the updated files in the last commit without knowing the commit hash, that is:

2 API calls + (2 * number of updated files) API calls >= 4 API calls

The duplicated data, the useless extra information in the JSON responses, and the huge number of endpoint calls needed to get the files’ content pushed me to keep looking for a better solution. I kept looking until I found this post about the same problem I was facing. It convinced me that I had to fix it myself, since there is no out-of-the-box solution.

Then, I found this great post about GraphQL aliases. I should have paid more attention when learning GraphQL, or spent some extra hours on it. And yes, I do not remember how I ended up finding out about aliases while solving the problem.

I guess you can see where I am going here. Aliases are the best way to avoid the 2 * (number of updated files) API calls that I had to make with the REST API v3. The solution is to profit from the best features of both GitHub API v3 and v4.

Solution

The final process looks like this:

  1. Get the commits using REST API v3
  2. Get the commit content also using REST API v3
  3. Generate a single GraphQL query with aliases to get the files’ content (see the sketch below).
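To give an idea of what step 3 looks like in code, here is a minimal C# sketch that builds such a query. It is only an illustration, not the exact code of the service in the repository; the file paths and the commit SHA are the ones returned by the REST calls in steps 1 and 2:

using System.Collections.Generic;
using System.Text;

public static class GraphQlAliasQueryBuilder
{
    // Builds one GraphQL query that fetches the text of every changed file in a single call,
    // using one alias per file (file0, file1, ...). The object(expression: "<sha>:<path>")
    // field resolves to the blob of that file at that commit.
    public static string BuildFilesContentQuery(
        string owner, string repositoryName, string commitSha, IEnumerable<string> filePaths)
    {
        var builder = new StringBuilder();
        builder.AppendLine("{");
        builder.AppendLine($"  repository(owner: \"{owner}\", name: \"{repositoryName}\") {{");

        var index = 0;
        foreach (var path in filePaths)
        {
            builder.AppendLine(
                $"    file{index}: object(expression: \"{commitSha}:{path}\") {{ ... on Blob {{ text }} }}");
            index++;
        }

        builder.AppendLine("  }");
        builder.AppendLine("}");
        return builder.ToString();
    }
}

Sending the generated query once to the GraphQL endpoint returns the content of all the files, instead of 2 extra REST calls per file.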

I created a sample project on GitHub with a GitHub service that you can easily reuse. There are 2 main methods offered by this service:

  • GetLastCommitFilesContent(string directory, DateTime? startingFrom = null) : it gets the last commit, looks up its content, then generates a GraphQL query and retrieves the files by calling the GraphQL API. It returns an object of type GitHubServiceOutput.
  • GetCommitFilesContent(string commitHash, string directory) : similar to the previous method, but it takes a commit hash as parameter. The output is the same as the previous method’s.

You will find the details about the code sample and the used libraries in the GitHub repository.

Custom Object Comparison in C#

A while ago, I needed to compare complex objects in C#. I forgot to mention the real reason when I first wrote the article, so thanks to the feedback I got, here it is: I needed not only a Boolean result from the comparison, but also the list of properties with different values, plus the possibility to exclude properties from the comparison. I looked for a function that would provide the same functionality as string.Compare() but for complex objects of the same type. The Object.Equals() method alone did not satisfy my need, and it requires implementing and overriding the method, which was not convenient. My next stop was Stack Overflow, where I found a quite interesting approach in this discussion. It did not fully satisfy my need either, so I took it as a starting point and implemented my own comparison class (the implementation is available on GitHub with a running sample).

I created a static class Comparer with a class constraint on the generic type parameter, which satisfies my need. If you do not know what constraints on type parameters are, go to the Microsoft docs (I recommend reading and understanding what they are and why they are used, because you will need them for sure). Then, reflection was the way to get things done. The PropertyInfo class, which belongs to the System.Reflection namespace, was enough to do the work: it gives access to a property’s attributes and metadata, which I use to compare the properties of the 2 objects.

The created class offers different methods that you may find helpful:

The GenerateAuditLog() method literally generates a log. It returns an object of type ComparisonResult, which can be inserted into a logHistory table in your database. This method is overloaded so that you can exclude some properties from the comparison.

The GenerateAuditLogMessages() method returns a list of messages that contains only the changes. There is no overload for this method.

The HasChanged() method simply returns a Boolean result. You can optionally exclude some properties from the comparison. I found this method useful for updating records in the database.
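To give you an idea of the approach, here is a simplified sketch of the core comparison logic. It is not the exact code from the repository (which offers more, such as the ComparisonResult output), just an illustration of the reflection-based idea:

using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class SimpleComparer
{
    // Compares two objects of the same type property by property via reflection
    // and returns a human-readable message for every difference found.
    public static List<string> GetChangeMessages<T>(T oldValue, T newValue,
        params string[] excludedProperties) where T : class
    {
        var messages = new List<string>();

        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            if (excludedProperties.Contains(property.Name))
            {
                continue; // property explicitly excluded from the comparison
            }

            object before = property.GetValue(oldValue);
            object after = property.GetValue(newValue);

            // object.Equals handles nulls and value types for us.
            if (!Equals(before, after))
            {
                messages.Add($"{property.Name} changed from '{before}' to '{after}'");
            }
        }

        return messages;
    }

    // Boolean variant, handy for deciding whether a record needs to be updated.
    public static bool HasChanged<T>(T oldValue, T newValue, params string[] excludedProperties)
        where T : class
        => GetChangeMessages(oldValue, newValue, excludedProperties).Count > 0;
}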

That is all!! I hope you find it useful. Feel free to use the code or improve it.

Note: the code is not optimized (no DRY approach) because I take into consideration people who want to use just one method, so they can simply copy the code (personally, I copy my own code, improve it, and adapt it to the case at hand).

Raw queries with Entity Framework Core

tl;dr

In this blog post, I showcase how to migrate a raw SQL query from Entity Framework 6 to EF Core 2.1 and EF Core 3.1. You can find the whole sample on GitHub.

— — — —

I have lately been working on migrating a project from ASP.NET MVC 5 with Entity Framework 6 to ASP.NET Core 2.1 with Entity Framework Core 2.1. During the work, I found a raw query implemented in Entity Framework 6 as follows (not the real one):

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var studentIdParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Database
        .SqlQuery<int>(query, studentIdParam).ToListAsync();

    if (gradeList.Count == 0)
    {
        return 0;
    }

    return gradeList.Average();
}

Meanwhile, it is not possible to do the same in Entity Framework Core. I had to look for solutions, and I found 2 of them:

The first solution is a simple implementation with ADO.NET; you can find it in my GitHub account in the method GetScoreAdoNet(int studentId). However, I try to avoid ADO.NET because of our team’s internal rules and mainly for maintenance reasons.

So, I kept looking for another solution using Entity Framework Core. Thanks to the great community on Stack Overflow, I found this answer to my problem. Here is the second and better solution:

Solution for EF Core 2.1

I will be using the query types offered by Entity Framework Core.

First, we have to create a data model that will be used as the return type of the executed SQL query. Even though in my sample (here) I just return a number (int), I have to create a model with one property. The name of the property should match the name of the column in the SELECT statement.

public class AverageGrade
{
    public int Grade { get; set; }
}

Then, we need to configure it in the DbContext, in the OnModelCreating method:

modelBuilder.Query<AverageGrade>();

And finally we can call the raw SQL query:

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var idParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Query<AverageGrade>()
        .FromSql(query, idParam).ToListAsync();

    return gradeList.Select(x => x.Grade).ToList().Average();
}

Solution for EF Core 3.1

Starting from EF Core 3.0, the solution proposed for EF Core 2.1 is obsolete. It is part of the many changes in EF Core 3.0 that you can find here.

A data model is needed for the output of the executed SQL query, just like in the EF Core 2.1 solution.

public class AverageGrade
{
    public int Grade { get; set; }
}

The next step is adding the data model to the ModelBuilder in the OnModelCreating method:

modelBuilder.Entity<AverageGrade>().HasNoKey();

The last step is to use DbContext.Set<>() instead of DbContext.Query<>() in the GetScore(int studentId) method. In other words, the line that calls _universityContext.Query<AverageGrade>() (line 12 in the last code block of the EF Core 2.1 solution) is replaced with this line of code:

var gradeList = await _universityContext.Set<AverageGrade>().FromSqlRaw(query, idParam).ToListAsync();

— — — —

That’s it, the migration was done successfully.

Query types have some limitations. In case you will be using them, please read the official documentation.

Finally, I prepared 3 projects: one with Entity Framework 6, a second with Entity Framework Core 2.1 (the first 2 projects use ASP.NET Core 2.1), and a third one with EF Core 3.1. You can find the source code on GitHub.

Data preparation (part 2)

In the previous post, we went through the pre-preparation phase: collecting metadata, data profiling, and data preparation rules. This post is mainly about dealing with missing values, aka nulls.

Before looking for methods to deal with nulls, confirm that you are really missing some data. It is possible to have blanks in the data set that can be replaced with a value. Low-quality data may have null in place of “no”. Say a column only has “yes” and null values: you should verify whether the system/application that generates the data simply does not assign any value for a negative/false response. In that case, you just replace null with “no” and do not delete the column.

In addition, metadata can help with missing data by marking out-of-range entries with types such as unknown, unrecorded, or irrelevant, which can happen for different reasons such as:

  • Malfunctioning equipment
  • Changes in database design
  • Collation of different datasets
  • Measurement not possible

Missing data types

First, we need to understand the different types of missing data. There are 3 different types:

Missing completely at Random (MCAR)

The name explains itself: the data are missing for random reasons (for example, a measurement sensor ran out of battery) that are unrelated to any other measured variable. It just happened randomly.

Example:

We conducted a survey on a university campus about extracurricular activities; one of the questions was the student’s age. The survey was available online, and some volunteers also asked students on campus directly. After we collected the data and started preparing it, we found out that some students did not mention their age because the question was not mandatory. In this case, the missing age values are missing completely at random.

Missing at Random (MAR)

The missing data are independent of the unobserved values, but dependent on the observed values. “What?!” Let’s make it simple:

We have a dataset of car characteristics:

Brand         | Model      | Nbr of Doors | Nbr of Seats | Airbag
Audi          | A6         | 5            | 5            |
Mercedes Benz | E63        | 5            | 5            | Yes
Audi          | A4         | 5            | 5            |
BMW           | M3         | 3            | 2            | Yes
Renault       | Megan      | 5            | 5            | No
Skoda         | Superb     | 5            | 5            | Yes
Mercedes Benz | S560       | 5            | 5            | Yes
Peugeot       | 508        | 5            | 5            | No
Skoda         | Octavia RS | 5            | 5            | Yes
Tesla         | Model S    | 5            | 5            | Yes
Audi          | A8         | 5            | 5            |
Tesla         | Model 3    | 5            | 5            | Yes

We have missing values in the Airbag column. Notice that the missing values depend on the Brand column: if we group the data by brand, we find that all the rows with missing values have “Audi” as their brand.

Missing not at Random (MNAR)

The missing data are not only dependent on the observed data, but also dependent on the unobserved data.

Example:

A survey was conducted about mental disorder treatment worldwide. The results showed that respondents from low/lower-income countries are significantly less likely to report treatment than high-income country respondents.

— — — —

Dealing with missing data

How to deal with the missing data? There are 3 different possibilities:

1 – Data Deletion

First, this method should only be used in the MCAR case. There are 2 deletion methods that most data analysts/scientists use:

Drop them (Listwise deletion):

Basically, you remove the entire row if it has at least one missing value. This method is recommended if your data set is large enough that the dropped data does not affect the analysis. Most labs and companies have a minimum required percentage of data; if that threshold is not attainable, they remove the rows with missing data. Personally, if most (more than 50%) of the values of a variable are null or missing, I “usually” drop the column.

Pairwise deletion:

No data is dropped in this case. If the analysis needs all the columns, you select only the rows without any missing values. Meanwhile, if the analysis task needs only some of the variables and the rows with missing values happen to have the values required for this task, you include them in the data selected for that task.

Example:

For this example, the car data set will be used. *Let’s assume it has 50 rows and that data are missing only in rows 1 and 6.

#  | Brand         | Model      | Nbr of Doors | Nbr of Seats | Airbag
1  | Audi          | A6         | 5            | 5            |
2  | Mercedes Benz | E63        | 5            | 5            | Yes
3  | BMW           | M3         | 3            | 2            | Yes
4  | Skoda         | Superb     | 5            | 5            | Yes
5  | Mercedes Benz | S560       | 5            | 5            | Yes
6  | Peugeot       | 508        | 5            | 5            |
7  | Skoda         | Octavia RS | 5            | 5            | Yes
...
50 | Tesla         | Model 3    | 5            | 5            | Yes

1st task: an association rules task to find an association hypothesis between the number of seats and the number of doors. The needed attributes are: Brand, Model, Nbr of Seats, and Nbr of Doors. In this case, we can use the whole data set because there are no missing values for the needed attributes.


2nd task: an association rules task to find an association hypothesis between the number of seats and the airbags. The needed attributes are: Brand, Model, Nbr of Seats, and Airbag. To resolve the task, we eliminate rows 1 and 6 and use the rest.
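As a quick illustration in C# (the language used elsewhere on this blog), here is a rough sketch of both deletion strategies applied to the car data set above. The Car type is an assumption for the example; a nullable bool stands in for the missing Airbag values:

using System.Collections.Generic;
using System.Linq;

public record Car(string Brand, string Model, int Doors, int Seats, bool? Airbag);

public static class DeletionExamples
{
    public static void Run(List<Car> cars)
    {
        // Listwise deletion: drop every row that has at least one missing value.
        List<Car> completeRows = cars.Where(c => c.Airbag != null).ToList();

        // Pairwise deletion, task 1: only Doors and Seats are needed,
        // so rows with a missing Airbag value can still be used.
        var doorsAndSeats = cars.Select(c => (c.Doors, c.Seats)).ToList();

        // Pairwise deletion, task 2: Airbag is needed, so only the complete rows are used.
        var seatsAndAirbag = completeRows.Select(c => (c.Seats, Airbag: c.Airbag.Value)).ToList();
    }
}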

2 – Replace missing values

The second option for dealing with missing values is to replace them. Here it gets a bit complicated, because there are different ways to achieve it.

Mean/median substitution

Replace the missing values with the mean or median value. We use this method when the variable is numerical and the missing values represent less than 30% of it.

However, with missing values that are not strictly random, especially in the presence of a great inequality in the number of missing values for the different variables, the mean substitution method may lead to inconsistent bias.

Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402–406. doi:10.4097/kjae.2013.64.5.402

Common value imputation

We use the most common value to replace the missing values. For example, the car dataset we used previously has a Color column with 100 records and only 5 distinct values; the most common value (67 occurrences) is Black. So, we replace the missing values with Black. However, this method may also lead to inconsistent bias.
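To make the two substitution methods above concrete, here is a small C# sketch with made-up columns, where nulls represent the missing values; it is only an illustration, not tied to any particular data set or library:

using System.Collections.Generic;
using System.Linq;

public static class ImputationExamples
{
    // Mean substitution for a numerical column: replace nulls with the mean of the known values.
    public static List<double> FillWithMean(List<double?> column)
    {
        double mean = column.Where(v => v.HasValue).Average(v => v.Value);
        return column.Select(v => v ?? mean).ToList();
    }

    // Common value imputation for a categorical column: replace nulls with the most frequent value.
    public static List<string> FillWithMostCommon(List<string> column)
    {
        string mostCommon = column
            .Where(v => v != null)
            .GroupBy(v => v)
            .OrderByDescending(g => g.Count())
            .First().Key;

        return column.Select(v => v ?? mostCommon).ToList();
    }
}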

Regression imputation

Regression imputation lets us avoid biases in the analysis. The mean/median method replaces missing values with existing ones; instead of doing that, we predict the missing values using the available data. This way, we gain new values and retain the cases with missing values.

Multiple imputation

Multiple imputation “approach begin with a prediction of the missing data using the existing data from other variables [15]. The missing values are then replaced with the predicted values, and a full data set called the imputed data set is created. This process iterates the repeatability and makes multiple imputed data sets (hence the term “multiple imputation”). Each multiple imputed data set produced is then analyzed using the standard statistical analysis procedures for complete data, and gives multiple analysis results. Subsequently, by combining these analysis results, a single overall analysis result is produced. “

Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402–406. doi:10.4097/kjae.2013.64.5.402

The purpose of multiple imputation is to obtain a statistically valid inference, not to find the true missing data, because there is no way to predict the missing data and get it 100% right. The main advantages of this method are that it eliminates biases and is easy to use. Meanwhile, to get a correct imputation model, you need to take into consideration the conditions required for this method and avoid some pitfalls.

In case you want to use multiple imputation method, I recommend reading the following articles : Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls (BMJ 2009;338:b2393) and When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts (DOI: 10.1186/s12874-017-0442-1)

3 – Create new field / variable

Missing data has its own usefulness, mainly when it is not MCAR (missing completely at random). Therefore, we create a new variable or field that records the observed behavior or pattern of the missing values. This can also be useful if you own the tool that generates the data: you can create a new engineered feature based on the missing data pattern.

— — — —

Further reading

  1. How to Handle Missing Data
  2. The prevention and handling of the missing data

References

  1. ibm.com: Pairwise vs. Listwise deletion: What are they and when should I use them? , Accessed 27/02/2019 (https://www-01.ibm.com/support/docview.wss?uid=swg21475199)
  2. ncbi.nlm.nih.gov: The prevention and handling of the missing data, Accessed 21/04/2019 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668100)
  3. measuringu.com: 7 ways to handle missing data, Accessed 15/04/2019 (https://measuringu.com/handle-missing-data)

Data preparation (part 1)

Data preparation is the most time-consuming phase in any data-related cycle, whether you are preparing the data for a machine learning model, data mining, or BI.

I will explain how to prepare the data efficiently by following different steps.

Many people who are starting their career in the data field forget about an important step. They ask for the data and start preparing it straight away.

But before that, you should do some pre-preparation.

Business Understanding (pre-preparation)

First, you need to understand the business logic. Every data analysis task is related to a business task.

Ask for explanations and read the available documentation. In addition, meetings with a business analyst in the organization or with service/product owners may be required. You gain a lot of time with this step; if you skip it, you will find out afterwards that some data are missing, that the data structure does not make sense, and many other random problems.

Tip: when collecting the data, ask for the data governance department (in case there is one). The people there have useful and priceless information.

*Don’t let them convince you that the data is self-explanatory.

Business understanding does not have a simple method to follow. You just need to figure out how the business works and, most importantly, how the data was generated. After finishing this step, ask for the data needed for the given task.

Now, we can start the data preparation. To do so we need the metadata.

Collect the metadata

Metadata is the data that describes the data. Having it is a must; if it is not accessible, you should create it with the help of the data owner.

Metadata helps with identifying the attributes of the data-set, the type of each attribute and sometimes even the values assigned for a concrete attribute.

Data profiling

Data profiling is important to better understand the data. It “is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data”. Data profiling includes structure discovery, content discovery, and relationship discovery. This step makes it easier to discover and choose the needed data. Also, if similar data are needed for the next iterations, you already know how to deal with them and the whole data preparation process becomes much easier.

Define data preparation rules (optional)

This step applies to big data. Data preparation rules are the methods for cleansing and transforming the data.

Why? Cleansing big data is not a simple task, and it is time-consuming. Imagine you delete rows using the value of an attribute as a condition, then you find out that the condition misses something, and the size of your data set is 5 TB. It will take you forever to figure out the right condition.

How? We take a random sample from our data set, cleanse it, and transform it. The script used to prepare the random sample will then be used for the whole data set.

The random sample must be valid. I will write a blog post about generating a correct and valid random sample.

Iterative preparation

Start with the basic cleansing steps that apply to any dataset. After that, tackle the challenging steps such as dealing with missing data. Leave the data transformation to the end.

In part 2, we will see how to deal with missing values and how to get better-quality data.

Open Sourcing of Windows Calculator is great news for many developers

Microsoft announced on Wednesday, March 6th, 2019 that it had made Windows Calculator open source software (you can find it on GitHub). Suddenly, the internet went crazy with memes and posts mocking the announcement, judging it as a small piece of code, not worth it, and blah blah blah.

Figure 1: Windows Calculator. source: me

I understand why some people are frustrated and see it as a small project that is not worth it when comparing its size with .NET or VS Code, but they are being narrow-minded and did not look at it from a different angle. Well, let me explain the real value of Windows Calculator in the open source world.

The calculator is a simple project that new developers and students build as one of their first projects, and they feel proud of it (at least I was proud of my calculator project). You start with simple operations, but then you find out that parsing is needed and some checks are a must so that the app does not crash.

Figure 2: young boys on computers. source: pixabay

Later on, complex operations can be added, and the real dev problems appear, such as different results when using different types (float vs. double vs. decimal), or saving the last operations. And it keeps getting bigger and bigger. At this point, beginners don’t know how to choose the right project structure or how to improve their code and write clean code. Imagine having the source code of the most used calculator in the world, made by the biggest software company worldwide! That’s insane. Wait, Microsoft offered more than source code: it included the project architecture, unit tests, and the build system.

The application architecture is useful even for junior developers and students: solid use of the MVVM design pattern in a real application helps them plan their first applications.

Moreover, the Calculator application is written in Visual C++ (C++/CX), a set of extensions to the C++ language used for Windows apps and Windows Runtime components. C++ is a solid programming language, and most university students have C++ classes. Microsoft basically offered them their semester project for great grades 😀 That is just a joke; do not do it if you are a new learner. Do it yourself, and then you can compare your work. That way, you improve your skills, and it is great to learn from your own mistakes 😉

Windows Calculator is built for the Universal Windows Platform using the XAML UI framework. Developers can learn more about making their own custom controls and VisualStates, which comes in handy for creating and publishing apps in the Microsoft Store.

Finally, learners won’t stop at the development phase; they will also learn Azure Pipelines for the build, deployment, and release phases. This is important because it can be hard to apply CI/CD in your first projects.

Figure 3: thumbs up. source: pixabay

To conclude, Windows Calculator is the best example for learning Microsoft’s full development lifecycle.

Boost your productivity: Azure Data Studio

“I am suffering from these tools; they consume a lot of memory and need a lot of space” or “I am overwhelmed with the features of this tool; somehow I find myself lost and I can’t figure out how to do simple tasks”. Does this sound familiar? Many tools nowadays offer great features, but you need terabytes of storage to have them locally, and a powerful device too. Well, today is your lucky day if you are dealing with databases. Have you heard about Azure Data Studio? In this post, I will give you some tips to improve your work performance with it. Here is the structure of this blog post.

Table of contents

Introduction

Overview

UI

Export User Settings

Change Terminal Shell

Subscriptions Filter

Connect to multiple Azure accounts

Run script from file

Introduction:

Azure Data Studio was first introduced at PASS Summit 2017 (it was then called SQL Operations Studio). It is a cross-platform tool for database design and operations. If you are familiar with VS Code, you will love Azure Data Studio. It is a lightweight tool with the necessary features, and you won’t be overwhelmed as can be the case with SSMS (Microsoft SQL Server Management Studio).

Overview:

Azure Data Studio is a lightweight, cross-platform database management tool for Windows, macOS, and Linux. It is free (no license needed) and an open source project. It is based on VS Code and its MSSQL extension, and built on Electron. You can report bugs, request new features, and contribute to the project. Extensions are an amazing feature of VS Code, and the same goes for Azure Data Studio: you can add extensions, although there are not that many yet (for the moment 😉).

It supports Azure SQL Database, Azure SQL Data Warehouse, and SQL Server, whether running in the cloud or on-premises. T-SQL is the main supported language, with autosuggestions, formatting, and advanced coding features, but other languages and formats such as JSON, XML, Python, SQL, YAML, and Dockerfile are supported too. In addition, you can work with workspaces and folders, and source control (Git) is integrated, so managing your files is no problem. This is an amazing feature, especially for those who opt for a CI/CD pipeline using Azure DevOps. Speaking of Azure, Azure Resource Explorer is a panel in Azure Data Studio that allows you to connect to your Azure account(s) and work with your different subscriptions. If you work with PowerShell or other shells, you can do that in Azure Data Studio too, thanks to the integrated terminal, as is the case with VS Code. And it has a bunch of other features.

The Queen of Vermont and Entity Framework, Julie Lerman, a Microsoft Regional Director and MVP, wrote two blog posts in MSDN Magazine about Azure Data Studio: Data Points – Visual Studio Code: Create a Database IDE with MSSQL Extension (June 2017) and Data Points – Manage Data Across Multiple Sources with Azure Data Studio (December 2018). I advise you to read them, because I am not going to repeat what she has already written (I don’t have her level of knowledge and skills, so I would not do it as well as she does). Meanwhile, I will give you some tips that will help you.

UI :

The user interface of Azure Data Studio is similar to that of VS Code. It is simpler and not overloaded with menus. It has the classic left sidebar, as in VS Code, and you can split the window the way you want (literally limitless splitting). Figure 1 shows you how to change the theme color.

Figure 1: change theme color

Export User Settings:

Most of us have at least 2 devices: a business laptop and a personal laptop/tablet. Let’s say you tested Azure Data Studio on your personal device and customized it, then decided to install it on a second device with the same customized settings. I have got you covered! It is easier than you think. I recreate the first two steps in Figure 2. (1) Open the user settings by clicking on the settings icon in the bottom left corner and then Settings; you can also open them from the Command Palette or with the keyboard shortcut Ctrl+comma. (2) Now, move the cursor over the tab, right-click it, and choose “Reveal in Explorer”. (3) Copy the settings.json file and send it to your work device. (4) Repeat the same first steps on your work device and finish by replacing the current settings.json file with the copied one. Done.

Figure 2: export user settings

Change Terminal Shell:

The first time you install Azure Data Studio, you have to choose a default terminal shell. Later on, when you click on the add new terminal icon, you get the same terminal shell, but what if you want to use PowerShell and cmd at the same time? To do so, you need to change the default terminal shell by opening the Command Palette (Ctrl+Shift+P), typing “Select Default Shell”, and hitting the Enter key. You can then change the shell type by selecting one from the given list. There is another way to do it: open the user settings (the first step in the previous tip), then search for:

"terminal.integrated.shell.windows": “Shell path”

You have to change the path and you are ready to go.

For example, my default terminal shell is cmd, so it looks like this:

"terminal.integrated.shell.windows": "C:\\WINDOWS\\System32\\cmd.exe"

In order to change it to PowerShell, I only replace the shell path string:

    "terminal.integrated.shell.windows": "C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"

*You have to repeat these steps every time you want to change the shell.

Subscriptions Filter:

This is a simple tip, but it is worth it in case you have many subscriptions. Azure Data Studio allows multiple linked accounts, which means you can connect to different Azure accounts and use all their resources at the same time, which is really COOL. However, you may have many subscriptions and not use all of them. To hide the unnecessary ones, use the subscription filter by hovering over the account, as demonstrated in Figure 3.

Figure 3: subscriptions filter

Connect to multiple Azure accounts:

While writing the Subscriptions Filter tip, I discovered that it may be tricky to connect a second or third Azure account. You need to click on the person icon (bottom left corner of the window). Figure 4 shows you the way.

Figure 4: connect to multiple Azure accounts

Run a script from file:

Integrated terminals are an amazing feature of Azure Data Studio. They make the life of DB admins easier: you don’t need to open many apps and windows, everything is in one tool. We all know that we can open a folder or a workspace in the Explorer (the Explorer panel of Azure Data Studio, not Windows Explorer); say you have some script files there and you want to run one. There is a command provided for this task that runs the active file, or just the selected text, in the active terminal. In Figure 5, you can see how it works. This command doesn’t have a keyboard shortcut by default (you can add one by editing the keyboard shortcuts 😉). To make it work, open the Command Palette and type:

Terminal: Run Active File in Active Terminal

Or

Terminal: Run Selected Text in Active Terminal
Figure 5: Run script from file

~*~*~*~*~*~

There are other features you will enjoy in Azure Data Studio that will improve your work, such as Auto Save (you don’t have to save changes with Ctrl+S every time), the process explorer, Peek Definition, etc.

The purpose of this blog post was to show some cool features of this great tool. You may notice that I did not mention anything related to working with databases, queries, or extensions; those need another blog post. For now, you can start with the quickstart tutorials and the official documentation. If you don’t have the tool yet, you can download it here.

Contribute

Please help the community by giving your feedback and contributing on GitHub.

— — — —

References

To write this blog post, I used the official documentation of Azure Data Studio, Julie Lerman’s blog posts in MSDN Magazine, and two articles from Visual Studio Magazine (here and here).