Summary of Microsoft Ignite Spring 2021

Microsoft Ignite Spring 2021 took place last week, from the 2nd to the 4th of March. As usual, many announcements were made, covering new products and updates. So, I summarized them for you in this blog post.

Microsoft Mesh

Microsoft Mesh is the next thing in the mixed reality world: a new platform to deliver collaborative mixed reality experiences. It enables users to work together, connect, and collaborate regardless of their physical location. In other words, holograms may become a reality sooner than we (maybe just me) expected. I will let you enjoy this video (duration: 2min27s), because words cannot give it enough credit.

Source: Microsoft YouTube Channel

You can learn more by reading the technical overview Microsoft Mesh.

Microsoft Viva

Microsoft Viva is an employee experience platform. It is the latest product of the Microsoft 365 Enterprise family. It was announced a month ago, but new announcements were already made during Ignite, mainly about availability and pricing. Microsoft Viva has 4 modules:

Microsoft  Viva Connections

  • Viva Connections focuses on connecting employees across the organization
  • License and pricing: if you have an M365 and SharePoint license, you will get it for free. In other words, free for Microsoft 365 Enterprise customers

Microsoft Viva Insights

  • The module needed for the home office era, where employers need workplace analytics and want to take care of their employees' work habits
  • License and pricing: part of Workplace Analytics

Microsoft Viva Learning

  • A central hub for learning and up-skilling courses
  • License and pricing: not announced yet

Microsoft Viva Topics

  • The smart module; it transforms content to knowledge using AI

You can get more information in the following links

How IT is transforming the employee experience at Microsoft – Microsoft 365 Blog

Introducing Microsoft Viva, the Employee Experience Platform in Microsoft Teams – Microsoft Tech Community

Microsoft Viva: Empowering every employee for the new digital age – Microsoft 365 Blog

— — — —

Power Platform

Power Automate Desktop

It is an RPA tool that has been available for a while in Power Automate. It now comes with Windows 10 at no extra cost.

Integration with Teams

The Power Apps, Power Automate, and Power Virtual Agents studios are available in Teams. You can build directly in Teams without needing to open the browser anymore.

Administration

Governance Controls
  • Endpoint filtering: you can restrict access to connectors. It provides granular access control.
  • Tenant Isolation: it lets you choose which tenants are allowed to communicate with your tenant (inbound isolation).
  • Tenant-Wide Analytics: reporting capabilities across the tenant.
Data Loss Prevention
  • Connector Action Controls: you can filter connector actions. You can choose which actions can be used by your company from a given connector

Microsoft Information Protection Support (Later this year)

You will be able to label your data with sensitivity levels. The labels will be synchronized with Microsoft 365 and Azure.

Azure Networking Connectivity (Later this year)

It allows the creation of private endpoints to connect to an Azure Virtual Network. That lets you leverage ExpressRoute, gateways, and the other networking features that you have on your Azure tenant. You will be able to manage networking security from Azure without any compromise.

Customer Managed Keys (Later this year)

A first-class user interface experience for using your own keys from Azure Key Vault inside Power Platform apps.

Microsoft Power Fx

An open-source language for low-code programming. It is basically the same syntax that was already used in Power Apps (canvas apps). From now on, it is possible to write it in YAML files. More new features will be delivered in June 2021 and by the end of the year.

For more details check Introducing Microsoft Power Fx: the low-code programming language for everyone | Microsoft Power Apps

Moreover, there is something for integration with Power BI too. Direct Query support for Dataverse in Power BI is generally available. For more details check Microsoft Dataverse support for Power BI Direct Query reaches general availability | Microsoft Power Apps

— — — —

Microsoft Teams

I have to say it is the star of Microsoft Ignite 2021. Almost all other product announcements are related to Teams. In addition to that, Microsoft Teams has tons of new features and enhancements (although not all of them are available yet).

Here are some selected features that I found interesting:

Presenter mode (not available yet)

This feature allows you to share your content alongside your video stream. It has 3 modes, called Standout, Reporter, and Side-by-side.

PowerPoint Live

Microsoft Teams now makes use of the whole screen. The presenter can see the slides, the notes, the meeting chat, and the participants at the same time. Also, attendees can go through the slides at their own pace.

Live Reactions in Microsoft Teams meetings (like those of Microsoft Ignite Stream)

Webinars (not available yet)

Organizers can add a custom registration page to the webinars. Attendees will receive an email confirmation after registration.

As for attendee numbers, Microsoft Teams supports up to 1,000 attendees in interactive meetings and webinars. For view-only, Teams can accommodate 10,000 attendees in general, and up to 20,000 attendees until the end of this year. Both features are available this month.

Microsoft Teams Connect (not available yet)

The struggle with collaboration between different organizations is “gone” (not sure, I haven't tried it yet). It enables interaction and collaboration on documents as an external user. Also, you won't have to disconnect from your tenant and connect to the other one as a guest; the shared channel will appear as a team in your primary tenant.

And more..

The features list goes on and it covers Chat and Collaboration, Security and Compliance, Calling (operator connect), Device, and Management. You can find them in the announcement blog What’s New in Microsoft Teams | Microsoft Ignite 2021 – Microsoft Tech Community

— — — —

Windows Server 2022

Windows Server 2022 is now available in preview.  The new release “includes advanced multi-layer security, hybrid capabilities with Azure, and a flexible platform to modernize applications with containers.”(source: Announcing Windows Server 2022—now in preview – Windows Server Blog (microsoft.com) )

— — — —

Windows 10

Windows Hello Multi-camera support

You can choose which camera to use for authentication with Windows Hello.

Universal Print (generally available)

IT pros now have one hub to set up printers in an enterprise environment. The feature was announced back in July 2020. For more details, check Announcing Universal Print: a cloud-based print solution – Microsoft Tech Community

The rest of the announcements are in this link: Windows & Devices at Microsoft Ignite 2021: March edition – Microsoft Tech Community

— — — —

Azure

Azure Percept

The newest product in Azure. Azure Percept allows you to use Azure AI technologies on the edge through a combination of hardware and services. The hardware consists of Azure Percept Audio, Azure Percept Vision, and a Trusted Platform Module. The platform also offers Azure Percept Studio to enable customers with less technical skills to use the product. You can read more about it in the AI blog: With Azure Percept, Microsoft adds new ways for customers to bring AI to the edge – The AI Blog.

Azure Migrate

Azure Migrate was released last year as a one-stop hub to ease migration to Azure. The recent update brings new unified tools for discovering and assessing databases, .NET apps, and VMs. It allows the discovery of SQL Server instances and databases in VMware environments, and it will cover IIS too.

Passwordless (Azure AD)

Microsoft has been working for a while on changing the authentication system by eliminating passwords. Azure AD support for FIDO2 security keys in hybrid environments is now available in public preview. FIDO2 security keys are supported as an authentication method for signing in to the operating system and applications/services. You can find more details in Public preview of Azure AD support for FIDO2 security keys in hybrid environments – Microsoft Tech Community

I recommend watching the session on Azure datacenter architecture presented by Microsoft Azure CTO Mark Russinovich: Inside Azure Datacenter Architecture with Mark Russinovich | OD343 – YouTube.

— — — —

Dynamics 365

Finally, Microsoft Dynamics 365 has something to offer too. It is offering a new application called Intelligent Order Management, and it is integrating Microsoft Teams to offer better experience for users of both products. You can read more about those capabilities in the official announcement Announcing new Dynamics 365 capabilities at Microsoft Ignite – Microsoft Dynamics 365 Blog .

Lessons learned from first experience with Power Apps

Source: PowerApps official site

In the last months, I had the opportunity to create my first Power Apps application at work. It was also my first time working with Power Apps; I had not even tried a hello-world app before (well, I had seen demos). However, I work with Power Automate (Flow) and O365 quite often, so I am used to the environment and the available tools.

Power Platform is a low-code tool for citizen developers, while I am used to traditional software development. This blog post is written from the point of view of a software engineer.

Forget about DRY principle

“Don’t Repeat Yourself” (DRY) is a common principle we use in software development. This principle is hard to apply when working with Power Apps, which does not support macros or shared methods/expressions.

How to get around it? Put the shared logic in the OnSelect property of a button (or any component), then call Select(buttonName) wherever you need to invoke it. Don't forget to hide the button.

This is not my idea; I found it in a blog post while solving this problem, but I cannot find that post anymore. Credit goes to the unknown author who helped me :/ . However, here is a similar one that you can follow to apply the DRY principle: Microsoft BI Tools: Power Apps Snack: Don’t repeat yourself (microsoft-bitools.blogspot.com).

Keep in mind that you will still have to repeat yourself from time to time, because there is no other way to get it done.

Error while exporting a package

This error occurs when you delete a Power Automate flow that was used by the app. To solve the problem, you have to remove the “unused flow” from the Data section. Good luck finding the flow in case you use multiple ones: Microsoft decided to show the GUID of the flow without the name here, while flows are referenced by name only in Power Apps.

Saving the changes will not make the problem disappear; you have to publish it because the export applies only to the live version (published version).

Figure 1: Export Power App Package

Renaming/Updating Power Automate flow

The Power Platform team agreed on minimal integration and sync between Power Automate and Power Apps. When you rename a flow or add a new parameter, the change does not propagate to the app. You have to remove the current flow with the old name and then select it again with the new name. It is a painful process, so try to name it correctly the first time.

Adding a Power Automate flow makes code disappear

It is like magic! When you add a new flow to OnSelect or any other property, it deletes the current code and replaces it with flowName.Run(. I am not sure who thought this could be a good idea. Please copy your code before adding a flow; you will need it, for example when you rename the flow or add a new parameter.

Figure 2: Add new flow to a Power App

Power Automate ToolTip

Product development goes through changes, especially when working with Agile methodologies. You may have to add parameters to the flow. When using the flow in Power Apps, the tooltip does not cope well with many arguments (I did not measure it exactly, but I have to work with flows that have more than 6 and up to 13 arguments). At some point, you don't know which argument you are setting and you cannot see its name, because the tooltip shows non-clickable three dots.

Figure 3: Flow ToolTip

How to solve it? I did not find a concrete way, but the arguments follow the order in which they were created in the flow (populated with “Ask in Power Apps”). In other words, if you add a new parameter, it will be the last one whose value you set in flowName.Run(arg1, arg2, ..., lastArg).

PowerBI visual

One of the great options of Power Apps is the Power Apps visual, a canvas app embedded in a Power BI report. It comes with its limitations, of course, but some of them are not documented and I discovered them while developing the app.

Not all metrics are refreshed

Metrics with data type Number are refreshed whenever you interact with the report. I used dependent dropdown lists that change when clicking on a table or a matrix.

I wanted to use another metric with data type Text that would be displayed in my app. The idea was to refresh the text the same way as the other metrics. However, it did not work. I tried to solve it with the help of the Power BI experts in our team, but we could not make it work. The verdict: a metric with data type Text cannot refresh its value in the Power Apps visual.

Power Apps (visual) data has a size limit

I needed to use multiple hierarchies in the app alongside some metrics. At some point, the app canvas stopped loading in the report, and when editing it in Power Apps Studio, it did not load the recently bound data. When I removed the recently added hierarchies, it worked again.

What is the size limit? Well, who knows! I asked in the Power Apps forum, but I did not get an answer. So, keep binding data until you hit the limit, or try to keep it clean and bind the minimum. Also, avoid binding hierarchies and similar types, because the report will take more time to load.

Figure 4: Power App loading forever

~*~*~*~*~*~

I hope this blog post helps you a bit. One last recommendation: save your changes often, with comments.

Good luck with your Power app.

Rock your first Job as a Software Developer

Source: Image by Joe Alfaraby from Pixabay

You got your first job and you were happy like never before. Then, you asked yourself: How can I make the most of it? What should I do to grow as software developer? Well, you are in the right place my friend 😊.

In this blog post, you will learn what you should do to move forward in your career by building solid technical skills and improving (or building) the needed soft skills, in the IT field generally and in software development specifically.

TL;DR

I explain in this post the following steps to rock your first job and move forward:

  • Focus on communication
  • Master the programming fundamentals
  • Learn a new thing daily
  • Avoid reinventing the wheel
  • Ask for help
  • Give back
  • Treat people well
  • Get mentorship

You can further read the key point at the end of each paragraph or the whole post 😁.

Communication

Communication and soft skills matter more than your code skills!

When you work on a project, there are many phases that the team goes through. Some of these phases do not require coding, but communication with the customer: for example, the business analysis phase. Also, during the development phase, you will be communicating non-stop with your teammates.

If your team adopts Agile methodology, you are continuously in communication with the customer during the development phase too. The communication with the customer requires understanding their requirements in plain English, so you need to avoid using technical words and explain them in the simplest way possible. Then, you translate the requirements to documentation that will be used for the construction. This may be done by business analysts or even you if your team does not have one. Meanwhile, the communication with your fellow developers is different; it is more technical.

Key point: Communication is needed on a daily basis. So, work on this skill to progress faster and get the job right on the first try.

Fundamentals matter the most

Software development fundamentals are the base on which you build new skills.

It does not matter whether you learn programming in C++, Java, C#, or any other programming language. What matters is knowing data structures, programming fundamentals, algorithms, design patterns, etc.

These fundamentals are included in most Computer Science degrees, but most universities/schools “forget” to explain why you should learn the “old stuff” rather than the fancy things or buzzwords (microservices, containers, etc.) that you hear at every conference and read in blog posts and on Twitter.

Key point: Focus on getting the fundamentals to a solid level and you will learn new frameworks and tools easily and fast; and by fast, I mean that in a matter of weeks you can pick up new stuff like Docker or GraphQL.

Learn new thing daily

To keep progressing, you should keep learning. I recommend that you make the following statement part of your daily tasks:

“learn a new technical thing no matter how difficult or easy it is. Anything that you did not know how to do before”. 

This will keep you motivated. If you had a bad day and code did not work, you would be saying: “at least I learned how to <what you learned>. So, it is not bad after all”. Our field is progressing daily. You have to keep up with that.

You will also notice how consistency pays off, even with the smallest effort. And if you are a math lover, the following calculation shows how a bit of daily effort makes the difference over one year:

Source: pinterest *

Key point: learning a new thing on a daily basis is an achievement that keeps you motivated, keeps you growing, and makes your day 😉.

*Props to my friend Davide Bellone for recommending the mathematical calculation. Meanwhile, I could not find the right person to credit for the photo and the idea. To whom it may concern, I apologize.

Avoid (Do not) reinventing the wheel

A common mistake that some junior developers make in the early stage. I admit that I made this mistake, and I have seen other juniors commit it later on too. At some point, you feel confident enough to write code that you could get from a NuGet package (the .NET equivalent of npm packages) or an open-source framework. For example, if you need to work with GraphQL in a .NET project, you should search for a package that already exists and look at the community feedback in blog posts to find the most recommended one. There is only a tiny chance that you will not find a ready-to-use library to solve your task.

Why should you not reinvent the wheel? You are wasting time that you could use to do better things in the project and solve real problems that have no solution yet. Moreover, you cannot reinvent with better quality at this stage; this is not a judgment of your coding skills, but the existing packages are typically made by big teams or are open source, where community members contribute to improve them. In addition, you receive (security) updates for the installed packages, which are precious later on for the reliability, maintainability, and stability of your software.

With that being said, you should always stay ambitious about inventing the wheel, even at this early stage of your career. Reinventing the wheel may also be possible with more experience; not for the sake of reinventing, but to make an improvement and a contribution to the community and the field.

Key point: Focus on solving problems with no solutions and get a solid experience to be able to (re)invent the wheel in the near future.

Ask for Help

We create software to solve problems. During the architecture of a solution and the construction phase, we all encounter problems that we did not solve before. Some of them are tricky and take most of your time. That’s where you need help. Luckily the IT community is very helpful. So, don’t be afraid to seek help.

Please bear in mind that you should not ask a question every 2 minutes without searching for a solution by yourself. Also, don’t spend a week searching for a solution without asking for help.

My suggestion is to go through these steps (by order):

  1. Ask your colleagues if they encountered the problem before. If they cannot help, move on to the next step
  2. Ask your local community or in the company’s internal forum in case it has one. A local community is generally composed of your friends and your local network that you created when attending some workshops and after-work sessions
  3. Ask the global community on stackoverflow, Twitter, product forum, etc. Make sure to sanitize the code from private content and keys when publishing it online
  4. If you could not solve it by this time, you should raise a ticket to the vendor (company that owns the product/ technology you are using). This step typically is not free, but your company should be willing to pay for it.

*This is my personal order. The last 2 steps can be swapped depending on how sensitive the project you are working on is. So, don't take this order as 100% true in all cases.

Key point: Don’t hesitate to ask for help when you struggle. Everyone needs help even senior developers and architects. No question is stupid, keeping a question for yourself (because you are afraid/embarrassed) is STUPID.

Give Back

We all ask for help and go through different forums to solve our problems. Most of us also look for the confirmed answer and its absence upsets us. Now it is your turn to help others and take a moment to answer questions that are on your knowledge scope. Moreover, you should give back to your school and community by providing some workshops and assistance for the students and newbies in the field.

Key point: Contribute whenever it is possible and help others the way you want them to help you.

Treat people well

The key to success and fast growth is your people network. As mentioned before, half of the work or more is communication with people whether they are your colleagues or your clients.

During your journey, you will have moments where you look at the code and say “who is the idiot that wrote this code?”, or you may find missing or poorly written documentation that gives you a hard time understanding an old product that needs some maintenance or an upgrade. First, let me tell you that there is no idiot or stupid person. Second, most of the time it is your own code that you do not recognize, and that's thanks to your progress and evolution as a software developer. So, don't judge people too early (or ever; you don't know the whole story most of the time), because they have their own journey too and they deserve respect.

In addition to that, we live in a small world, especially the IT community. Make sure to create your people network by joining open discussions in your company and workshops and conferences locally, regionally, and why not globally. You should also follow people on social media platforms and stay connected and active. The people network that you create will help you move fast in your career by bringing new opportunities, such as jobs and customers for your future business.

Key point: Create a people network and treat everyone well, because solid relationships matter the most, in business and in everyday life.

Get mentorship

As a junior developer, you have already demonstrated enough skills to get your first job, and that is awesome. Since you are reading this blog post, it means you want to evolve and get to the next level. You may have been looking for a while and read other articles on up-skilling and getting to the associate developer level. The key player is finding a mentor! A good mentor in your case would be a senior developer from your team, or a friend from the community, or a virtual friend. You notice that I referred to the closest social circles, and you may wonder why. The answer is simple: many people think mentorship is just about pair programming and technical advice from a “good experienced coder” rather than a senior developer, team lead, or architect (hint: not every good experienced coder has the soft skills for positions like tech lead or senior architect). So, they end up saying that it is not necessary to have a mentor, and that is so wrong!

Mentorship is not just about technical advice and helping you solve the problems you are facing in your current task. It goes far beyond that. A mentor who knows you well, including outside working hours, can recognize your character, your soft skills, and your weaknesses. Evidently, he can help you with your career path, because he has seen a lot and can tell what an architect, manager, or tech lead path is like. He has been in different positions with different managers and teams, and he has the needed experience and wisdom. In addition, a mentor keeps you on track, keeps you focused, and challenges you from time to time. Finally, he motivates you and won't let you abandon your plans.

If you could not find a mentor in your circle of friends, try to find one in your local community or online. My virtual friend Davide Bellone mentioned in his latest blog post that “Twitter is a great place to start with, as well as other websites like Reddit and Dev.to.”

Key point: a mentor helps you sharpen your technical and soft skills, shape your career, and get on the fast track. Get yourself a mentor now!

— — — —

That is it! It is time to shine 😊. All the best with your career.

~*~*~*~*~*~

Credits

I thank all my friends that answered my questions when preparing for this blog post. I crowdsourced the tips from my friends and virtual friends on Twitter and Facebook. In addition, I looked into different articles and validated my ideas against what they listed like the post Transitioning Into Your First Junior Developer Role. Finally, I added my point of view and my personal experience to the final output.

Move Resources between 2 Azure Subscriptions

This blog post shows how to validate the move operation of resources between 2 Azure Subscriptions and how to move them successfully by going through all the steps needed.

Figure 1 : Move resources between 2 subscriptions

Microsoft Azure offers the possibility to move resources from one resource group to another one in the same subscription or from one Subscription to another Subscription in the same Azure tenant.

The available documentation for the validation is limited, so you have to put the puzzle together yourself by collecting each piece of information from different articles. Therefore, my main goal is to give you guidance from A to Z without you having to waste your time. I will add links to the different articles that I used, in case you are interested in reading them.

To move the resources, there are 2 major steps:

  • Validate the move operation: It is optional, but highly recommended.
  • Move operation : The main action.

— — — —

Part 1: Validate the move operation

To validate the move operation, we need to call the dedicated REST API endpoint. There is no other option for this action. To execute it successfully, we have to:

  1. Create an Azure Service Principal,
  2. Prepare the request body,
  3. Get an access token,
  4. and finally make the REST Call

1 – Create Azure Service Principal

You can skip this step if you have one already.

First, let's create an Azure service principal (SP). If you are not familiar with Azure SPs: basically, we are registering an application in Azure Active Directory (AAD) and assigning a role to it. Check the official documentation to learn more about Azure service principals.

Create the application with just 2 clicks in Azure Portal: go to Azure Active Directory >> App registrations >> fill the form. You can do it also with PowerShell or Azure CLI.

Figure 2 : Register an Azure Application

Next, we generate a client secret by going to Certificates & Secrets tab and clicking on Add a client secret. Note the client secret, we will need it in the next steps.

Figure 3 : Generate a client secret

Finally, we have to assign the Contributor role to the registered application on the source resource group.

Figure 4 : Assign contributor role

2 – Prepare the request body and URI Parameters

We need to collect the following items:

  • Tenant id
  • SubscriptionId of both the source and the target Subscriptions
  • Names of both the source and the target Resource Groups
  • Resources that we want to move

Let's start by connecting to Azure and listing all subscriptions in PowerShell:

Connect-AzAccount

# List subscriptions
Get-AzContext -ListAvailable | Select-Object Name, Subscription, Tenant | Format-List

Then, we need to set the context to the source subscription and get the resource group name (in case you forgot it 😄):

# Select a subscription as current context
Set-AzContext -SubscriptionId <sourceSubscriptionId>

# Get Names and Locations of resource groups in the selected Subscription:
Get-AzResourceGroup | Select-Object ResourceGroupName, Location

Finally, we get the IDs of the resources in the given resource group. I added some formatting so that you can copy-paste straight into the request body; you only need the first command to get the resources:

# Get the resources 
$resourcesList= Get-AzResource -ResourceGroupName 'rg-sdar-westeurope' | Select-Object 'ResourceId'  | foreach {$_.ResourceId}

# Format the values by adding double quotes and join them with commas
$resourcesListFormatted= '"{0}"' -f ($resourcesList -join '","')

# Copy to clipboard
Set-Clipboard -Value $resourcesListFormatted

Create a new HTTP request in Postman, go straight to the Body tab, and choose the raw type. Construct the request body as follows:

{
 "resources": [<paste the recently copied resources list>],
 "targetResourceGroup": "/subscriptions/<targetSubscriptionId>/resourceGroups/targetResourceGroupName"
}

It should look like this:

Figure 5 : Request body

3 – Get an OAuth2 Token

It is mandatory for the authorization of the POST request.

We get it by making a POST call to https://login.microsoftonline.com/<tenantId>/oauth2/token (the tenant ID we collected earlier goes in the URL) with the following values in the request body (formatted as x-www-form-urlencoded in Postman):

  • grant_type : client_credentials
  • client_id : the client ID of the app registered in the first step
  • client_secret : the client secret noted in the first step
  • resource : https://management.azure.com/

Figure 6: Get an OAuth2 token

4 – Validate the move operation

All the previous steps lead to this action. Go back to the POST request created in the second step and set the request URL with the required values: https://management.azure.com/subscriptions/<sourceSubscriptionId>/resourceGroups/<sourceResourceGroupName>/validateMoveResources?api-version=2019-05-10

The authorization type is a Bearer token. Use the access token that we obtained in the previous step.

Figure 7 : Add an authorization token

Send it. The response status code should be 202 Accepted with an empty response body.

Figure 8 : Validate move resources
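If you prefer code over Postman, here is a minimal C# sketch of steps 3 and 4. The placeholder values are hypothetical and must be replaced with your own IDs; the JSON body is the same one we built earlier.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class ValidateMoveSketch
{
    static async Task Main()
    {
        // Hypothetical placeholders: replace with your tenant, app registration and subscription details
        string tenantId = "<tenantId>";
        string clientId = "<clientId>";
        string clientSecret = "<clientSecret>";
        string sourceSubscriptionId = "<sourceSubscriptionId>";
        string sourceResourceGroup = "<sourceResourceGroupName>";

        using var client = new HttpClient();

        // Step 3: get an OAuth2 token with the client credentials flow
        var tokenResponse = await client.PostAsync(
            $"https://login.microsoftonline.com/{tenantId}/oauth2/token",
            new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["grant_type"] = "client_credentials",
                ["client_id"] = clientId,
                ["client_secret"] = clientSecret,
                ["resource"] = "https://management.azure.com/"
            }));
        using var tokenJson = JsonDocument.Parse(await tokenResponse.Content.ReadAsStringAsync());
        string accessToken = tokenJson.RootElement.GetProperty("access_token").GetString();

        // Step 4: call validateMoveResources with the same body we prepared in Postman
        string body = @"{
          ""resources"": [""<resourceId1>"", ""<resourceId2>""],
          ""targetResourceGroup"": ""/subscriptions/<targetSubscriptionId>/resourceGroups/<targetResourceGroupName>""
        }";

        var request = new HttpRequestMessage(
            HttpMethod.Post,
            $"https://management.azure.com/subscriptions/{sourceSubscriptionId}/resourceGroups/{sourceResourceGroup}/validateMoveResources?api-version=2019-05-10")
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

        // 202 Accepted with an empty body means the validation request was accepted;
        // the actual move (part 2) uses the sibling moveResources endpoint with the same body.
        var response = await client.SendAsync(request);
        Console.WriteLine((int)response.StatusCode);
    }
}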

In case you get 400 Bad Request with the response error message ResourceNotTopLevel, you need to remove that resource from the validation request body, because it will be moved automatically with the main resource. You can get this error with a database or a Web App slot; the solution is to move the SQL Server or the whole Web App that contains the slot, respectively.

Make sure you have the needed permissions and that you have checked the limitations of your resources and the subscription quotas. For more details, read the “checklist before moving resources” section in the official documentation.

— — — —

Part 2: Move resources

There are 4 different ways to move the resources to another subscription:

  • Post Request with the REST API (Similar to the validate move operation)
  • Azure PowerShell
  • Azure CLI
  • Or using the Portal

I am a lazy person, so I always choose the easiest way. That means Portal is the choice 😂.

Go to the source resource group, click on Move, and choose “to another subscription”. Next, check the checkboxes for the resources that you want to move, then select the target subscription and the target resource group. Finally, click OK.

Figure 9 : Move resources

Congrats, you moved the resources to the new subscription like a champ 😎.

~*~*~*~*~*~

References and important links

  1. Move resources to a new resource group or subscription
  2. Supported resources – Move operation
  3. Validate Move Resources
  4. Troubleshoot moving Azure resources to new resource group or subscription

LUIS Migration – False Error Message: Module {..} already exists

Here I am back with another story from my daily work problems. We have been using Language Understanding (LUIS) for one of our solutions for a while. Recently, Microsoft made some upgrades and created a new platform that benefits from resource authoring based on Role-Based Access Control (RBAC), Azure Active Directory (Azure AD) and custom domains. So, we had to migrate our LUIS app. And that did not go well.

For those who are not familiar with LUIS: Azure Cognitive Services offer a set of REST APIs that help you build intelligent applications without the need to develop your own models with machine learning, deep networks, etc. One of these services is Language Understanding (LUIS). It is a cloud-based API service that applies custom machine-learning intelligence to natural language text to predict overall meaning and pull out relevant, detailed information. For more details, check the official site.

The LUIS portal is being changed because it cannot keep up with Azure services. So, Microsoft decided to tie the new upgrades to the creation of a new portal. The bad part is that every user of the old portal has to migrate to the new one by themselves, so that the old portal can be decommissioned by June 2020. See the unofficial announcement here: https://github.com/azure-deprecation/dashboard/issues/26.

To help with the migration, Microsoft offered some documentation, such as this one: https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-migration-authoring.

We decided to export the application from the old portal and add it to the new portal. In other words, we are not following Microsoft's migration workflow, because it is not the smartest for a professional environment. We chose the best possible approach: keep the solution in production running with the LUIS app in the old portal while we create another one in the new portal, hoping that everything goes smoothly.

(For the demo, I used Travel Agent Sample from the samples provided by Microsoft)

So, we started the migration process and it did not go as planned (like always). When importing the application, we had this error message:

BadArgument: The models: { BookFlight } already exist in the specified application version.

Figure 1 : Import Application

The error message does not say a lot. “The models” literally does not make any sense in this context, or in the context of LUIS. BookFlight is the name of both an intent and an entity in the imported application, and that is what caused the error. The new platform does not accept the same name for an intent and an entity, so we had to rename one of them.

A good error message can be something like this:

You cannot assign the same name for an entity and an intent. Please use a unique name for the intent {BookFlight}

(I have no idea why we had it like that; the LUIS application has been there for more than 2 years, literally since before I joined my team, but this is not an excuse.)

Such a small issue took us several hours and the effort of 3 people to troubleshoot.

There are 2 solutions for this problem:

  1. Make sure that each intent has a unique name. The name must not be used in entities or any other part of the application. (This is the best solution)
  2. Rename the entity, but also the pattern if you are using one

To reproduce the error, upload the application provided in this GitHub repository. Another option to reproduce it is to add an entity with the name of an existing intent, or the other way around.

Figure 2: Create new entity

— — — —

To recap, we saw what a bad error message looks like and how to fix the error (in case you hit the same one). Also check out my blog post Fix a gap in GitHub API v4. Well, if you read to the end, thank you very much for your support. Wishing you a Happy New Year!

Fix a gap in GitHub API v4

In this blog post, I want to show you how I solved the problem of getting commit file contents with the GitHub GraphQL API (v4) while avoiding the GitHub REST API (v3) and its crazy amount of API calls. The solution is available in this GitHub repository.

If you want the solution straight away, click here to skip the explanation of the problem.

GitHub Octocat (source: https://github.com/logos)

GitHub is the most famous code-hosting platform for version control and collaboration. It is essential for all IT people. One of the best features of GitHub is the APIs it offers to manipulate content and integrate GitHub into a workflow, such as a CI pipeline. It offers two versions of the API: GitHub API v3, which is a REST API, and GitHub GraphQL API v4. These are the two current stable API versions. (If you are wondering about the previous versions, you can get some information in the GitHub documentation.)

The GitHub REST API v3 covers “all” (or almost all, I am not sure) areas. However, tons of API calls need to be made in order to satisfy a workflow job. Therefore, GitHub decided to replace the REST API with GraphQL in version 4, and this is why:

GitHub chose GraphQL for our API v4 because it offers significantly more flexibility for our integrators. The ability to define precisely the data you want—and only the data you want—is a powerful advantage over the REST API v3 endpoints. GraphQL lets you replace multiple REST requests with a single call to fetch the data you specify.

GitHub documentation, https://developer.github.com/v4

However, the new API (GraphQL API v4) did not solve the problem 100%. It still has some gaps and does not cover all areas. You may need to go back to version 3 to satisfy a given job.

Disclaimer: I was a novice with the GitHub API and GraphQL when I started working on this project.

I was working on a project that integrates GitHub in one of its processes, and I chose the GraphQL API v4. Everything was going smoothly until I had to get the content of the last commit. Google helped a bit sometimes, and the github.community forums too (not well referenced in Google). I looked for a way to get the content of a given commit with GraphQL; I thought it was an obvious thing. I managed to create this query with the provided documentation.

If you are a novice with GraphQL, you can start your learning journey at graphql.org/learn, and I recommend the query editor GraphiQL: GitHub repository github.com/skevy/graphiql-app / download page electronjs.org/apps/graphiql

{
   rateLimit{
    cost
    remaining
  }
  repository(name: "GitHubAPIDemo", owner: "MohamedSahbi") {
    ref(qualifiedName: "master") {
      name
      id
      target {
        ... on Commit {
          id
          history(first: 1) {
            pageInfo {
              hasNextPage
            }
            totalCount
            edges {
              node {
                author {
                  name
                  date
                }
                changedFiles
                commitResourcePath
                oid
                abbreviatedOid
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

fragment GetAllFiles on Tree {
  ... on Tree {
    entries {
      name
      type
      oid
      object {
        ... on Tree {
          entries {
            name
            type
            oid
            object {
              ... on Blob {
                text
              }
            }
          }
        }
      }
    }
  }
}

This GraphQL query returns, in order:

  • The cost of my request, in the rate limit section. This is important because every user has a limited credit of 5,000 requests per hour, and a single GraphQL call can cost 1 credit, 100 credits, or more than 5,000 credits. For more details, see the explanation provided by GitHub.
  • The most recent commit details, since we chose “history(first: 1)”
  • The repository content that we chose to get in this section:
                tree {
                  entries {
                    name
                    type
                    oid
                    object {
                      #This is a fragment
                      ...GetAllFiles
                    }
                  }
                }

What I found is that the commit history does not include the changed-file URLs that the REST API v3 provides. I kept looking for a way to fix my query. I still believed it was provided out of the box, but since I am not experienced with GraphQL, I thought I had made a mistake in my query.

Then, I lost hope for a while. I decided to query the GitHub repository with the REST API v3. To get the content of each file, I have to:

  • First, get the list of commits of the repository
  • Second, get the content of the chosen commit, which returns a files array like this one:

"files": [
    {
      "sha": "9907549076a9271ee4948e909eb0669d3ba4875b",
      "filename": "LICENSE",
      "status": "added",
      "additions": 21,
      "deletions": 0,
      "changes": 21,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/LICENSE",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/LICENSE?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,21 @@\n+MIT License\n+\n+Copyright (c) 2019 Mohamed Sahbi\n+\n+Permission is hereby granted, free of charge, to any person obtaining a copy\n+of...."
    },
    {
      "sha": "7b9e8fe3adf9f784749834da35fecda8a5392bd3",
      "filename": "README.md",
      "status": "added",
      "additions": 2,
      "deletions": 0,
      "changes": 2,
      "blob_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/blob/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "raw_url": "https://github.com/MohamedSahbi/GitHubAPIDemo/raw/5f4538bce768c67bcfd3e71cb05a14614657f68f/README.md",
      "contents_url": "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/README.md?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f",
      "patch": "@@ -0,0 +1,2 @@\n+# GitHubAPIDemo\n+This repository contains a sample demo for my blog post"
    }
  ]
  • Third, we loop over the files array, and for each file we have to (see the sketch after this list):
    • Call the endpoint given in the contents_url attribute
    • The previous call returns a JSON object that contains the attribute download_url, which is the last API endpoint we have to call to finally get the file content
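Concretely, that REST-only loop looks roughly like this in C# (a sketch; the contents_url comes from the commit response above, and GitHub requires a User-Agent header on API calls):

using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class RestV3FilesSketch
{
    static async Task Main()
    {
        using var client = new HttpClient();
        // GitHub rejects requests without a User-Agent header
        client.DefaultRequestHeaders.UserAgent.ParseAdd("GitHubAPIDemo");

        // In the real flow, these come from the "files" array of the commit response shown above
        string[] contentsUrls =
        {
            "https://api.github.com/repos/MohamedSahbi/GitHubAPIDemo/contents/LICENSE?ref=5f4538bce768c67bcfd3e71cb05a14614657f68f"
        };

        foreach (var contentsUrl in contentsUrls)
        {
            // One call to the contents_url of each file...
            using var contents = JsonDocument.Parse(await client.GetStringAsync(contentsUrl));

            // ...and one more call to the download_url it returns, to finally get the raw content
            string downloadUrl = contents.RootElement.GetProperty("download_url").GetString();
            string fileContent = await client.GetStringAsync(downloadUrl);

            Console.WriteLine(fileContent.Length);
        }
    }
}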

Imagine you want to get the updated files in the last commit without knowing the commit hash, that is:

2 API calls + (2 * number of updated files) API calls >= 4 API calls

The duplicate data, the useless extra information in the JSON responses, and the huge number of endpoint calls needed to get the file contents pushed me to keep looking for a better solution. I kept looking until I found this post about the same problem I was facing. It convinced me that I had to fix it myself, since there is no out-of-the-box solution.

Suddenly, I found this great post about GraphQL aliases. I should have paid more attention when learning GraphQL, or spent some extra hours learning. And yeah, I do not know how I ended up finding out about aliases while solving the problem.

I guess you know where I am going here. Aliases are the best way to avoid the 2 × (number of updated files) API calls that I had to make using the REST API v3. The solution is to combine the best features of both GitHub API v3 and v4.

Solution

The final process looks like this:

  1. Get the commits using REST API v3
  2. Get the commit content also using REST API v3
  3. Generate a single GraphQL query with aliases to get the files content.
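To give an idea of step 3, here is a minimal C# sketch that builds such an aliased query for a list of changed file paths. It uses the GraphQL object(expression: "branch:path") field with the ... on Blob { text } fragment; the reusable service in the repository does the same thing in a more complete way.

using System;
using System.Collections.Generic;
using System.Text;

class AliasQueryBuilder
{
    // Builds one GraphQL query that fetches the text of every changed file
    // through aliased object(expression: "<branch>:<path>") fields.
    static string BuildFilesQuery(string owner, string name, string branch, IEnumerable<string> filePaths)
    {
        var sb = new StringBuilder();
        sb.AppendLine("query {");
        sb.AppendLine($"  repository(owner: \"{owner}\", name: \"{name}\") {{");

        int i = 0;
        foreach (var path in filePaths)
        {
            // Each alias (file0, file1, ...) becomes a separate field in a single response
            sb.AppendLine($"    file{i++}: object(expression: \"{branch}:{path}\") {{ ... on Blob {{ text }} }}");
        }

        sb.AppendLine("  }");
        sb.AppendLine("}");
        return sb.ToString();
    }

    static void Main()
    {
        // The file paths come from the 'filename' attributes of the REST API v3 commit response
        var files = new[] { "LICENSE", "README.md" };
        Console.WriteLine(BuildFilesQuery("MohamedSahbi", "GitHubAPIDemo", "master", files));
    }
}

The generated string is then posted to the GraphQL endpoint with a bearer token, and each file comes back under its own alias in a single response.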

I created a sample project in GitHub with a GitHub service that you can reuse easily. There are 2 main methods offered by this service:

  • GetLastCommitFilesContent(string directory, DateTime? startingFrom = null): it gets the last commit, looks up its content, then generates a GraphQL query and retrieves the files by calling the GraphQL API. It returns an object of type GitHubServiceOutput.
  • GetCommitFilesContent(string commitHash, string directory): a method similar to the previous one, but it takes a commit hash as parameter. The output is the same as the previous method's output.

You will find the details about the code sample and the used libraries in the GitHub repository.

Custom Object Comparison in C#

A while ago, I needed to compare complex objects in C#. I forgot to mention the real reason when I first wrote the article, so thanks to the feedback I got, here it is: not only do I need a Boolean result from the comparison, I also need, as output, the properties with different values, plus the possibility to exclude properties from the comparison. I looked for a function that would provide the same kind of functionality as string.Compare(), but for complex objects of the same type. The Object.Equals() method alone did not satisfy my need; it requires implementing and overriding the method, which was not convenient. My next stop was Stack Overflow, and I found a quite interesting approach in this discussion. That did not satisfy my need to the fullest, so I took it as a starting point and implemented my own comparison class (the implementation is available on GitHub with a running sample).

I created a static class Comparer with a constraint on the generic type parameter (it must be a class), which satisfies my need. If you do not know what constraints on type parameters mean, go to the Microsoft docs (I recommend reading about them and understanding what they are and why they are used, because you will need them for sure). Then, reflection was the choice to get things done. The PropertyInfo class, which belongs to the System.Reflection namespace, was enough to do the work. It gives access to the different attributes and metadata of an object's properties, which I use to compare the properties of the 2 objects.

The created class offers different methods that you may find helpful:

The GenerateAuditLog() method literally generates a log. It returns an object of type ComparisonResult, which can be inserted into a logHistory table in your database. This method is overloaded so that you can exclude some properties from the comparison.

The GenerateAuditLogMessages() method returns a list of messages that contains only the changes. There is no overload for this method.

The HasChanged() method simply returns a Boolean result. You can optionally exclude some properties from the comparison. I found this method useful for updating records in the database.
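To give a rough idea of the approach (the complete implementation, with the ComparisonResult type and the overloads, is in the GitHub repository), a minimal sketch of the reflection part could look like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class ComparerSketch
{
    // Returns one message per property whose value differs between the two objects,
    // skipping any property name listed in excludedProperties.
    public static List<string> GenerateAuditLogMessages<T>(T oldObject, T newObject,
        params string[] excludedProperties) where T : class
    {
        var messages = new List<string>();

        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            if (excludedProperties.Contains(property.Name))
            {
                continue;
            }

            object oldValue = property.GetValue(oldObject);
            object newValue = property.GetValue(newObject);

            if (!Equals(oldValue, newValue))
            {
                messages.Add($"{property.Name} changed from '{oldValue}' to '{newValue}'");
            }
        }

        return messages;
    }

    // Boolean shortcut, handy before deciding whether a database record needs an update
    public static bool HasChanged<T>(T oldObject, T newObject, params string[] excludedProperties)
        where T : class
        => GenerateAuditLogMessages(oldObject, newObject, excludedProperties).Count > 0;
}

Note that Equals() compares nested reference-type properties by reference unless they override it, which is one of the reasons the full implementation is more involved.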

That is all!! I hope you find it useful. Feel free to use the code or improve it.

Note: The code is not optimized (no DRY approach) because I take into consideration people who want to use only one method, so they can simply copy the code (I personally re-copy my code, improve it, and adapt it to the case at hand).

Raw queries with Entity Framework Core

tl;dr

In this blog post, I showcase how to migrate raw SQL query from Entity Framework 6 to EF Core 2.1 and EF Core 3.1. You can find the whole sample in GitHub.

— — — —

I have been working lately on migrating a project from ASP.NET MVC 5 with Entity Framework 6 to ASP.NET Core 2.1 with Entity Framework Core 2.1. During the work, I found a raw query implemented in Entity Framework 6 roughly as follows (not the real one):

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var studentIdParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Database
        .SqlQuery<int>(query, studentIdParam).ToListAsync();

    if (gradeList.Count == 0)
    {
        return 0;
    }

    return gradeList.Average();
}

Meanwhile, it is not possible to do the same in Entity Framework Core. I had to look for solutions, and I found 2 of them:

The first solution is a simple implementation with ADO.NET; you can find it in my GitHub account: the method GetScoreAdoNet(int studentId). However, I try to avoid ADO.NET because of internal rules in our team, and mainly for maintenance reasons.

So, I kept looking for another solution using Entity Framework Core. Thanks to the great community on Stack Overflow, I found this answer to my problem. Here is the second and better solution:

Solution for EF Core 2.1

I will be using Query types proposed by Entity Framework Core. 

First, we have to create a data model that will be used as the return type of the executed SQL query. Although in my sample (here) I just return a number (int), I have to create a model with one property. The name of the property should match the name of the column selected in the SELECT statement.

public class AverageGrade
{
    public int Grade { get; set; }
}

Then, we need to configure it in the DbContext, in the OnModelCreating method:

modelBuilder.Query<AverageGrade>();

And finally we can call the raw SQL query:

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var idParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Query<AverageGrade>()
        .FromSql(query, idParam).ToListAsync();

    return gradeList.Count == 0 ? 0 : gradeList.Average(x => x.Grade);
}

Solution for EF Core 3.1

Starting from EF Core 3.0, the proposed solution for EF Core 2.1 is obsolete. It is part of many changes in EF Core 3.0 that you can find here.

A data model is needed for the output of the executed SQL query, the same as in the EF Core 2.1 solution.

public class AverageGrade
{
    public int Grade { get; set; }
}

The next step is adding the data model to the ModelBuilder in the OnModelCreating method:

modelBuilder.Entity<AverageGrade>().HasNoKey();

The last step is to use DbContext.Set<>() instead of DbContext.Query<>() in the GetScore(int studentId) method. In other words, the line that calls _universityContext.Query<AverageGrade>().FromSql(...) in the EF Core 2.1 solution is replaced with this line of code:

var gradeList = await _universityContext.Set<AverageGrade>().FromSqlRaw(query, idParam).ToListAsync();
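Putting the pieces together, the whole method in EF Core 3.1 would look roughly like this (a sketch that keeps the empty-list guard from the original EF 6 version; in EF Core 3.1, SqlParameter comes from Microsoft.Data.SqlClient):

public async Task<double> GetScore(int studentId)
{
    string query = @"select ((e.Grade * c.Credits)/sum(c.Credits)) as Grade
                                        from Enrollment e
                                        inner join Course c
                                        on e.CourseId = c.CourseId
                                        where studentId= @studentId
                                        group by e.Grade, c.Credits";

    var idParam = new SqlParameter("@studentId", studentId);

    var gradeList = await _universityContext.Set<AverageGrade>()
        .FromSqlRaw(query, idParam)
        .ToListAsync();

    // Same guard as the EF 6 version: Average() throws on an empty list
    return gradeList.Count == 0 ? 0 : gradeList.Average(x => x.Grade);
}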

— — — —

That’s it, migration was done successfully.

The Query Types have some limitations. In case you will be using them, please read the official documentation.

Finally, I prepared 3 projects: one with Entity Framework 6, the second with Entity Framework Core 2.1 (the first 2 projects use ASP.NET Core 2.1), and the third one with EF Core 3.1. You can find the source code in GitHub.

Data preparation (part 2)

In the previous post, we went through the pre-preparation phase: collecting metadata, data profiling, and data preparation rules. This post is mainly about dealing with missing values, aka nulls.

Before looking for methods to deal with nulls, confirm that you are really missing data. It is possible to have some blanks in the data set that can simply be replaced with a value. Low-quality data may have null in place of “no”. Say a column only has “yes” and null values: you should verify whether the system/application that generates the data simply doesn't assign any value for a negative/false response. In that case, you only replace null with “no” and don't delete the column.

In addition, meta-data can help with missing data by mentioning the out-of-range entries with types: unknown, unrecorded, irrelevant, and that can be for different reasons such as

  • Malfunctioning equipment
  • Changes in database design
  • Collation of different datasets
  • Measurement not possible

Missing data types

First, we need to understand the different types of missing data. There are 3 different types:

Missing completely at Random (MCAR)

The title explains itself: the data are missing for random reasons (for example, a measurement sensor ran out of battery), unrelated to any other measured variable. It just happened randomly.

Example:

We conducted a survey on a university campus about extracurricular activities; one of the questions was the student's age. The survey was available online, and we had some volunteers who asked students on campus directly. After we collected the data, we started preparing it. We found out that some students did not mention their age, because it was not mandatory. In this case, the missing age values are missing completely at random.

Missing at Random (MAR)

The missing data are independent of the unobserved values, but dependent on the observed values. “What?!” Let's make it simple:

We have a dataset of car characteristics:

Brand          | Model       | Nbr of Doors | Nbr of Seats | Airbag
Audi           | A6          | 5            | 5            |
Mercedes Benz  | E63         | 5            | 5            | Yes
Audi           | A4          | 5            | 5            |
BMW            | M3          | 3            | 2            | Yes
Renault        | Megan       | 5            | 5            | No
Skoda          | Superb      | 5            | 5            | Yes
Mercedes Benz  | S560        | 5            | 5            | Yes
Peugeot        | 508         | 5            | 5            | No
Skoda          | Octavia RS  | 5            | 5            | Yes
Tesla          | Model S     | 5            | 5            | Yes
Audi           | A8          | 5            | 5            |
Tesla          | Model 3     | 5            | 5            | Yes

We have missing values in the Airbag column. You can notice that the missing values depend on the Brand column: if we group the data by brand, we find that all the rows with missing values have “Audi” as the brand.

Missing not at Random (MNAR)

The missing data are not only dependent on the observed data, but also dependent on the unobserved data.

Example:

A survey was conducted about mental disorder treatment worldwide. The results showed that respondents from low/lower-income countries are significantly less likely to report treatment than high-income country respondents.

— — — —

Dealing with missing data

How to deal with the missing data? There are 3 different possibilities:

1 – Data Deletion

First, this method should be used only in the MCAR case. There are 2 different deletion methods that most data analysts/scientists use:

Drop them (Listwise deletion):

Basically, you remove the entire row if it has at least one missing value. This method is recommended if your data set is large enough that the dropped data does not affect the analysis. Most labs or companies have a minimum percentage of data that is required, and if that threshold is not attainable, they remove the rows with missing data. Personally, if most (more than 50%) of the values of a variable are null or missing, I “usually” drop the whole column.
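As a tiny illustration, listwise deletion over the car data set from earlier could look like this in C# (the record type and values are just for the example):

using System;
using System.Collections.Generic;
using System.Linq;

// null means the value is missing
record Car(string Brand, string Model, int? Doors, int? Seats, string Airbag);

class ListwiseDeletionSketch
{
    static void Main()
    {
        var cars = new List<Car>
        {
            new("Audi", "A6", 5, 5, null),          // missing Airbag value
            new("Mercedes Benz", "E63", 5, 5, "Yes"),
            new("Peugeot", "508", 5, 5, "No"),
        };

        // Listwise deletion: keep only the rows that have no missing value at all
        var complete = cars
            .Where(c => c.Doors.HasValue && c.Seats.HasValue && c.Airbag != null)
            .ToList();

        Console.WriteLine($"{cars.Count - complete.Count} row(s) dropped, {complete.Count} kept");
    }
}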

Pairwise deletion:

Data will not be dropped in this case. If the analysis needs all the columns, you select only the rows without any missing values. Meanwhile, if the analysis task needs some variables (not all of them) and it happens that the rows with missing values have the required values for this task, you add them to the selected data for the task resolution.

Example:

For this example, the car data set will be used. *Let's assume it has 50 rows and there are missing data only in rows 1 and 6.

#  | Brand         | Model       | Nbr of Doors | Nbr of Seats | Airbag
1  | Audi          | A6          | 5            | 5            |
2  | Mercedes Benz | E63         | 5            | 5            | Yes
3  | BMW           | M3          | 3            | 2            | Yes
4  | Skoda         | Superb      | 5            | 5            | Yes
5  | Mercedes Benz | S560        | 5            | 5            | Yes
6  | Peugeot       | 508         | 5            | 5            | No
7  | Skoda         | Octavia RS  | 5            | 5            | Yes
...
50 | Tesla         | Model 3     | 5            | 5            | Yes

1st task: an association rules task to find an association hypothesis between the number of seats and the number of doors. The needed attributes are: Brand, Model, Nbr of Seats, and Nbr of Doors. In this case, we can use the whole data set, because there are no missing values for the needed attributes.

2nd task: an association rules task to find an association hypothesis between the number of seats and the airbags. The needed attributes are: Brand, Model, Nbr of Seats, and Airbag. To resolve the task, we eliminate rows 1 and 6 and use the rest.

2 – Replace missing values

The second option for dealing with missing values is to replace them. Here it gets a bit more complicated, because there are several ways to do it.

Mean/median substitution

Replace the missing values with the mean or the median of the variable. We use this method when the variable is numerical and the missing values represent less than 30% of the records.

However, with missing values that are not strictly random, especially in the presence of a great inequality in the number of missing values for the different variables, the mean substitution method may lead to inconsistent bias.

Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402–406. doi:10.4097/kjae.2013.64.5.402
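As a minimal pandas sketch, assuming a hypothetical numerical column called `Price` in the Car data set:

```python
# Mean substitution for the hypothetical "Price" column
price_mean_filled = cars["Price"].fillna(cars["Price"].mean())

# Median substitution, the usual alternative when the distribution is skewed
price_median_filled = cars["Price"].fillna(cars["Price"].median())
```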

Common value imputation

We use the most common value (the mode) to replace the missing values. For example, suppose the Car dataset used previously has a Color column with 100 records and only 5 distinct values, the most common of which (67 occurrences) is Black. We then replace the missing values with Black. However, this method may also lead to inconsistent bias.
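A short pandas sketch, assuming the hypothetical `Color` column described above:

```python
# Most common value (mode) of the hypothetical Color column, e.g. "Black"
most_common_color = cars["Color"].mode()[0]

# Replace the missing colors with the most common value
cars["Color"] = cars["Color"].fillna(most_common_color)
```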

Regression imputation

Regression imputation helps us avoid the biases introduced by the previous substitution methods. The mean/median method replaces the missing values with values already present in the data; regression imputation instead predicts the missing values from the available data. This way, we gain new, plausible values and retain the cases with missing data.
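One way to sketch regression imputation is with scikit-learn: train a regression model on the rows where the target column is present and predict it for the rows where it is missing. The column names below (`Price`, `Nbr of Doors`, `Nbr of Seats`) are hypothetical:

```python
from sklearn.linear_model import LinearRegression

features = ["Nbr of Doors", "Nbr of Seats"]
known = cars[cars["Price"].notna()]     # rows with an observed Price
unknown = cars[cars["Price"].isna()]    # rows with a missing Price

model = LinearRegression()
model.fit(known[features], known["Price"])

# Replace the missing prices with the model's predictions
cars.loc[cars["Price"].isna(), "Price"] = model.predict(unknown[features])
```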

Multiple imputation

The multiple imputation “approach begin[s] with a prediction of the missing data using the existing data from other variables [15]. The missing values are then replaced with the predicted values, and a full data set called the imputed data set is created. This process iterates the repeatability and makes multiple imputed data sets (hence the term “multiple imputation”). Each multiple imputed data set produced is then analyzed using the standard statistical analysis procedures for complete data, and gives multiple analysis results. Subsequently, by combining these analysis results, a single overall analysis result is produced.”

Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402–406. doi:10.4097/kjae.2013.64.5.402

The purpose of multiple imputation is to obtain statistically valid inference, not to recover the true missing data; there is no way to predict the missing values and get them 100% right. The main advantages of this method are that it reduces bias and is easy to use. However, to build a correct imputation model, you need to take into consideration the conditions this method requires and avoid some pitfalls.
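The sketch below mimics the idea with scikit-learn’s IterativeImputer, re-run with different random seeds to produce several imputed data sets; a proper analysis would pool the per-data-set results with Rubin’s rules, which the simple averaging here only approximates. The column selection is hypothetical:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Numerical columns only (hypothetical selection)
numeric = cars[["Nbr of Doors", "Nbr of Seats", "Price"]]

# Create several imputed data sets by drawing from the posterior with different seeds
imputed_sets = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    imputed_sets.append(imputer.fit_transform(numeric))

# Analyze each imputed data set separately, then combine the results
mean_price_per_set = [data[:, 2].mean() for data in imputed_sets]
pooled_mean_price = np.mean(mean_price_per_set)
```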

If you want to use the multiple imputation method, I recommend reading the following articles: Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls (BMJ 2009;338:b2393) and When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts (DOI: 10.1186/s12874-017-0442-1).

3 – Create new field / variable

Missing data has its own usefulness, mainly when it is not MCAR (Missing Completely At Random). In that case, we create a new variable or field that records the observed pattern of missing values. This can also be useful when you own the tool that generates the data: you can create a new engineered feature based on the missing-data pattern.
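A simple way to record that pattern is an indicator column; a pandas sketch, reusing the hypothetical `cars` DataFrame:

```python
# New engineered feature: 1 when the Airbag value was missing, 0 otherwise
cars["Airbag_missing"] = cars["Airbag"].isna().astype(int)
```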

— — — —

Further reading

  1. How to Handle Missing Data
  2. The prevention and handling of the missing data

References

  1. ibm.com: Pairwise vs. Listwise deletion: What are they and when should I use them?, Accessed 27/02/2019 (https://www-01.ibm.com/support/docview.wss?uid=swg21475199)
  2. ncbi.nlm.nih.gov: The prevention and handling of the missing data, Accessed 21/04/2019 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668100)
  3. measuringu.com: 7 ways to handle missing data, Accessed 15/04/2019 (https://measuringu.com/handle-missing-data)

Data preparation (part 1)

Data preparation is the most time-consuming phase in any data-related cycle, whether you are preparing the data for a machine learning model, data mining, or BI.

I will explain how to prepare the data efficiently by following a series of steps.

Many people who are starting their career in the data field forget an important step: they ask for the data and start preparing it straight away.

But before that, you should do some pre-preparation.

Business Understanding (pre-preparation)

First, you need to understand the business logic. Every data analysis task is related to a business task.

Ask for explanations and read the available documentation. In addition, meetings with a business analyst in the organization or with the service/product owners may be required. This step saves you a lot of time: if you skip it, you may find out later that some data are missing, that the data structure does not make sense, or that other random problems appear.

Tip: when collecting the data, reach out to the data governance department (in case there is one). The people there have useful and priceless information.

*Don’t let them convince you that the data is self-explanatory.

Business understanding does not come with a simple, ready-made method. You just need to figure out how the business works and, most importantly, how the data was generated. After finishing this step, ask for the data needed for the given task.

Now we can start the data preparation. To do so, we need the metadata.

Collect the metadata

Metadata is the data that describes the data. Having the metadata is a must; if it is not accessible, you should create it with the help of the data owner.

Metadata helps with identifying the attributes of the data set, the type of each attribute, and sometimes even the values allowed for a concrete attribute.

Data profiling

Data profiling is important to better understand the data. It “is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data”. Data profiling includes structure discovery, content discovery and relationship discovery. This step makes it easier to discover and choose the needed data. Also, if similar data are needed in later iterations, you already know how to deal with them, and the whole data preparation process becomes much easier.
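In pandas, a first pass at structure, content and relationship discovery can be as short as this; `df` stands for any loaded data set:

```python
# Structure discovery: column names, types and non-null counts
df.info()

# Content discovery: summary statistics for numerical and categorical columns
print(df.describe(include="all"))

# Missing values per column
print(df.isna().sum())

# Relationship discovery: pairwise correlations between numerical columns
print(df.corr(numeric_only=True))
```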

Define data preparation rules (optional)

This step mainly applies to big data. Data preparation rules are the methods used to cleanse and transform the data.

Why? Cleaning big data is not a simple task, and it is time consuming. Imagine you delete rows using the value of an attribute as a condition, then find out that the condition misses something, and your data set is 5 TB; figuring out the right condition again will take forever.

How? We take a random sample from our data set, cleanse it and transform it. The script used to prepare the random sample is then applied to the whole data set.
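A minimal sketch of that workflow in pandas: the cleansing and transformation rules live in one function that is first tuned on a random sample and then applied to the whole data set. The file name and the example rules (an `id` and an `amount` column) are hypothetical:

```python
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Data preparation rules defined and validated on the sample."""
    df = df.dropna(subset=["id"])        # example rule: drop rows without an id
    df["amount"] = df["amount"].abs()    # example rule: fix negative amounts
    return df

full = pd.read_csv("big_dataset.csv")                # hypothetical file
sample = full.sample(frac=0.01, random_state=42)     # random sample

prepared_sample = prepare(sample)   # tune the rules on the sample first...
prepared_full = prepare(full)       # ...then run the same script on the full data set
```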

The random sample must be valid. I will write a blog post about generating a correct and valid random sample.

Iterative preparation

Start with the basic cleansing steps that apply to any dataset. After that, tackle the more challenging steps, such as dealing with missing data. Leave the data transformation to the end.

In part 2, we will see how to deal with missing values and how to get better-quality data.