Showing posts with label Azure Security. Show all posts

Tuesday, January 16, 2024

Azure @ Enterprise - Postman to simulate the API-to-API on-behalf-of flow

We often encounter situations where one web API needs to call another API as the incoming user. In other words, our web API needs to act on behalf of the user when calling the next downstream API. Below is the official diagram¹ from Microsoft.

Let us call Web API A the middle-tier API and Web API B the downstream API. Here we assume our client application is an Angular web app.

Coding this auth flow is a little complex. Before coding, it is always good to understand how it can be done using Postman.

This requires setting up Azure app registrations, which many developers find tricky, so we will detail them here. Below is an overview of the sample setup.

  • Application
    • App registration (aad-testapp/b34ade37-*)
      • This app registration should have Postman's redirect URL "https://oauth.pstmn.io/v1/callback".
      • It needs a secret to be used from Postman.
      • It should have permission to call aad-testapi1, with admin consent granted to avoid runtime consent prompts for individual users.
  • Middle-tier web API (Web API A)
    • App registration (aad-testapi1/b6d5852b-*)
      • It should have a secret, used when requesting the on-behalf-of token.
      • It should have permission to call aad-testapi2, with admin consent granted to avoid runtime user consent.
  • Downstream API (Web API B)
    • App registration (aad-testapi2/67d35db8-*)
      • No certificates or secrets are needed, as this API does not request any tokens.
      • In its "Expose an API" section, add aad-testapi1 as an authorized client application to avoid any consent prompt.
With this mental model, the relationships between the registrations are easy to follow. The screenshots for each registration are below.

App registration for the application (aad-testapp)




App registration for the middle-tier API (aad-webapi1)

App registration for the downstream API (aad-webapi2)


Now let us get down to the business of getting tokens.

Getting the first token A

This token is used between the client app and the middle-tier API. It can be obtained the normal way² in Postman.
We can use any URL for the request, as we are not really calling this web API. The token is only used to obtain the on-behalf-of token so that we can call the downstream API.

Set the Authorization type to OAuth 2.0.
Then, in the 'Configure New Token' section, fill in the details as follows.

The three fields below may raise questions.

Let us detail them in case the screenshot is not enough.

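  • Auth URL - The URL that ends with /authorize, typically https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/authorize.
  • Access Token URL - The URL that ends with /token, typically https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token.

Both URLs are listed under Endpoints on the app registration's Overview page in the Azure portal. They can also be seen in a Fiddler trace when Postman requests the token.

  • Scope - The scope exposed by aad-webapi1, for example api://<aad-webapi1 client id>/.default.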
Once the values are set, click the 'Get New Access Token' button. This opens a browser and asks for login. After a successful login, the browser is redirected to https://oauth.pstmn.io/v1/callback, which shows a popup asking to open Postman again. If all of that succeeds, Postman shows the new token.
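Behind the 'Get New Access Token' button, Postman runs the OAuth 2.0 authorization code flow. The minimal sketch below builds the browser URL for that flow, assuming the Entra ID v2.0 endpoints; the tenant ID, client IDs and the api://.../.default scope are placeholders, and Postman may add parameters (for example PKCE) that are omitted here.

```python
from urllib.parse import urlencode

AUTHORITY = "https://login.microsoftonline.com"


def entra_endpoints(tenant_id: str) -> dict:
    """Return the v2.0 authorize and token endpoints for a tenant."""
    base = f"{AUTHORITY}/{tenant_id}/oauth2/v2.0"
    return {"auth_url": f"{base}/authorize", "token_url": f"{base}/token"}


def authorize_url(tenant_id: str, client_id: str, redirect_uri: str,
                  scope: str, state: str) -> str:
    """Build the browser URL that starts the authorization code flow."""
    query = urlencode({
        "client_id": client_id,        # aad-testapp client ID (placeholder)
        "response_type": "code",       # ask for an authorization code
        "redirect_uri": redirect_uri,  # must be registered on aad-testapp
        "scope": scope,                # scope exposed by the middle-tier API
        "state": state,                # echoed back; guards against CSRF
    })
    return f"{entra_endpoints(tenant_id)['auth_url']}?{query}"


if __name__ == "__main__":
    print(authorize_url(
        tenant_id="<tenant-id>",
        client_id="<aad-testapp-client-id>",
        redirect_uri="https://oauth.pstmn.io/v1/callback",
        scope="api://<aad-webapi1-client-id>/.default",
        state="12345",
    ))
```

After sign-in, the code delivered to the redirect URI is exchanged for token A at the token_url endpoint; Postman does that exchange for us.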

We will use this token as the assertion in the next step.
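Before using token A as the assertion, it helps to confirm it was issued for the middle-tier API; an assertion with the wrong audience is typically rejected by the OBO request. A minimal sketch that peeks at the claims the way jwt.ms does; it does not validate the signature, and the expected audience values are assumptions (the middle-tier client ID or its Application ID URI, depending on the token version).

```python
import base64
import json


def decode_segment(segment: str) -> dict:
    """Base64url-decode one JWT segment (header or payload) into a dict."""
    padded = segment + "=" * (-len(segment) % 4)  # JWTs strip '=' padding
    return json.loads(base64.urlsafe_b64decode(padded))


def peek_claims(token: str) -> dict:
    """Return the payload claims WITHOUT validating the signature (inspection only)."""
    _header, payload, _signature = token.split(".")
    return decode_segment(payload)


def audience_ok(token: str, expected_audiences: set) -> bool:
    """True if the token's aud claim is one of the middle-tier API's identifiers."""
    return peek_claims(token).get("aud") in expected_audiences
```

For example, audience_ok(token_a, {"<aad-webapi1-client-id>", "api://<aad-webapi1-client-id>"}) should be True before the token goes into the assertion field.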

Getting the on-behalf-of token B

We have now simulated the Angular client getting the token, which it would send in the Authorization header. Next, let us simulate how the middle-tier API gets the on-behalf-of token using this token.

The URL is the normal token URL of the tenant (the Access Token URL above), called with POST and an x-www-form-urlencoded body containing the fields below.
  • grant_type - This must be urn:ietf:params:oauth:grant-type:jwt-bearer, as per the docs³. We might expect 'on_behalf_of' here, but that value goes into requested_token_use; the grant type is the standard JWT bearer grant because the incoming token is presented as an assertion.
  • requested_token_use - This is where the word 'on_behalf_of' goes.
  • client_id - The ID of the middle-tier app registration.
  • client_secret - The secret created in the middle-tier app registration.
  • scope - Points to the downstream API.
  • assertion - The incoming token sent to the middle-tier API by the client application.
When coding in .NET, we can use an X.509 certificate instead of client_secret.
Click the 'Send' button to get the new token. This token can be used to call the downstream API.
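The same request can be expressed in code, which makes the field list easy to check. A minimal sketch that only builds the POST (it does not send it), assuming the v2.0 token endpoint; all IDs, the secret and the downstream scope are placeholders.

```python
import urllib.request
from urllib.parse import urlencode

JWT_BEARER = "urn:ietf:params:oauth:grant-type:jwt-bearer"


def build_obo_request(tenant_id: str, client_id: str, client_secret: str,
                      assertion: str, downstream_scope: str) -> urllib.request.Request:
    """Build (but do not send) the on-behalf-of token request made by Web API A."""
    form = {
        "grant_type": JWT_BEARER,               # not 'on_behalf_of'
        "requested_token_use": "on_behalf_of",  # this is where OBO is named
        "client_id": client_id,                 # middle-tier app registration
        "client_secret": client_secret,         # its secret (a certificate also works)
        "scope": downstream_scope,              # e.g. api://<downstream-id>/.default
        "assertion": assertion,                 # token A, as received by Web API A
    }
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would send it; the JSON response carries the access_token for Web API B.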

Reference


Tuesday, December 26, 2023

Azure @ Enterprise - Entra - On behalf of flow - AADSTS50013: Assertion failed signature validation

Recently we were trying to implement the authentication scenario below and got stuck on the following exception. This post is about debugging that issue.

Exception MsalUiRequiredException with the message below:

"OnBehalfOfCredential authentication failed: AADSTS50013: Assertion failed signature validation. [Reason - Key was found, but use of the key to verify the signature failed., Thumbprint of key used by client: 'E41DE<cert thumbprint>D91', Found key 'Start=12/05/2023 17:16:57, End=12/05/2028 17:16:57', Please visit the Azure Portal, Graph Explorer or directly use MS Graph to see configured keys for app Id '00000000-0000-0000-0000-000000000000'. Review the documentation at https://docs.microsoft.com/en-us/graph/deployments to determine the corresponding service endpoint and https://docs.microsoft.com/en-us/graph/api/application-get?view=graph-rest-1.0&tabs=http to build a query request URL, such as 'https://graph.microsoft.com/beta/applications/00000000-0000-0000-0000-000000000000']. Trace ID: <guid> Correlation ID: <guid> Timestamp: 2023-12-22 16:15:05Z"

Below is the call stack

Azure.Identity.AuthenticationFailedException:
at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow (Azure.Identity, Version=1.10.4.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
at Azure.Identity.OnBehalfOfCredential+d__19.MoveNext (Azure.Identity, Version=1.10.4.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at MyAuthApi1.Controllers.WeatherForecastController+d__11.MoveNext (MyAuthApi1, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: /home/runner/work/aad-auth-test/aad-auth-test/MyAuthApi1/Controllers/WeatherForecastController.cs:137)

Inner exception Microsoft.Identity.Client.MsalUiRequiredException handled 

at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow:
at Microsoft.Identity.Client.Internal.Requests.RequestBase+d__31.MoveNext (Microsoft.Identity.Client, Version=4.57.0.0, Culture=neutral, PublicKeyToken=0a613f4dd989e8ae)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Identity.Client.Internal.Requests.OnBehalfOfRequest+d__3.MoveNext (Microsoft.Identity.Client, Version=4.57.0.0, Culture=neutral, PublicKeyToken=0a613f4dd989e8ae)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Identity.Client.Internal.Requests.RequestBase+d__12.MoveNext (Microsoft.Identity.Client, Version=4.57.0.0, Culture=neutral, PublicKeyToken=0a613f4dd989e8ae)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Microsoft.Identity.Client.ApiConfig.Executors.ConfidentialClientExecutor+d__4.MoveNext (Microsoft.Identity.Client, Version=4.57.0.0, Culture=neutral, PublicKeyToken=0a613f4dd989e8ae)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1+ConfiguredTaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Azure.Identity.AbstractAcquireTokenParameterBuilderExtensions+d__0`1.MoveNext (Azure.Identity, Version=1.10.4.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Azure.Identity.MsalConfidentialClient+d__27.MoveNext (Azure.Identity, Version=1.10.4.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Azure.Identity.MsalConfidentialClient+d__26.MoveNext (Azure.Identity, Version=1.10.4.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Threading.Tasks.ValueTask`1.get_Result (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Azure.Identity.OnBehalfOfCredential+d__19.MoveNext (Azure.Identity, Version=1.10.4.0, Culture=neutral, PublicKeyToken=92742159e12e44c8)

The emitted message has little relation to the actual root cause. The debugging steps were later converted into the scientific debugging¹ template² for easier understanding. They are embedded below. Please click here in case the embed does not load properly.
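Since the message itself points away from the root cause, the most the error text supports is a first sanity check: is the certificate the app loads the one whose thumbprint appears in the error and in the portal? A minimal sketch, assuming we have the public certificate as DER bytes or PEM text; the helper names are hypothetical.

```python
import base64
import hashlib
import ssl


def thumbprints(der_bytes: bytes) -> dict:
    """SHA-1 thumbprint of a DER certificate in two encodings."""
    digest = hashlib.sha1(der_bytes).digest()
    return {
        # Uppercase hex, the format the Azure portal and the AADSTS message show.
        "hex": digest.hex().upper(),
        # Base64url without padding, the format of the x5t header in a client assertion.
        "x5t": base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii"),
    }


def thumbprints_from_pem(pem_text: str) -> dict:
    """Same as thumbprints(), starting from a PEM-encoded public certificate."""
    return thumbprints(ssl.PEM_cert_to_DER_cert(pem_text))


def matches_masked(hex_thumbprint: str, visible_prefix: str, visible_suffix: str) -> bool:
    """Compare against a thumbprint that was partially masked in a log."""
    hex_thumbprint = hex_thumbprint.upper()
    return (hex_thumbprint.startswith(visible_prefix.upper())
            and hex_thumbprint.endswith(visible_suffix.upper()))
```

If the thumbprints match and the error persists, the problem lies elsewhere, which is consistent with the root cause being unrelated to the message.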

Happy debugging...

References



Tuesday, November 28, 2023

Azure @ Enterprise - Generate JWT using Service Principal + Certificate via MSAL.PS

One of the previous posts in this blog showed how to get a JWT using the service principal + certificate combination¹. It used the Az.Accounts PowerShell module, which signs in to Azure as the service principal and then generates the JWT. This post shows how to get the JWT without signing in.

MSAL.PS

The Microsoft Authentication Library (MSAL) is used to interact with Microsoft's identity platform, Azure Active Directory, recently renamed Microsoft Entra ID².

There are MSAL client libraries for different languages, and MSAL.PS³ does the same from PowerShell. Though MSAL.PS is superseded by the Azure Az PowerShell module⁴, it is still worth a try.

The below code shows how we can get the JWT using MSAL.PS.
The code is mostly straightforward, but it requires a basic understanding of the Azure security model, scopes, etc.

Making HTTP resource calls

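Once we have the JWT, it can be used to make HTTP calls or to execute a SQL query.
# Placeholder values; replace with your tenant, app registration and certificate thumbprint
$creds = @{
    TenantId          = '<YOUR TENANT ID>'
    ClientId          = '<YOUR APP REGISTRATION CLIENT ID>'
    ClientCertificate = Get-Item "Cert:\CurrentUser\My\<CERT THUMBPRINT>"
    Scopes            = '<YOUR RESOURCE APP ID URI>/.default'
}
$token = Get-MsalToken @creds

# CreateAuthorizationHeader() returns "Bearer <jwt>"
$reqHeaders = @{ 'Authorization' = $token.CreateAuthorizationHeader() }

$requestUrl = "<YOUR RESOURCE URL>"
Invoke-RestMethod -Uri $requestUrl -Headers $reqHeaders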

Interestingly, the post published two years ago uses the up-to-date, officially supported method.
Happy working with legacy codebases.

References

¹ - https://joymonscode.blogspot.com/2021/09/azure-enterprise-powershell-log-in-as.html

² - https://learn.microsoft.com/en-us/entra/fundamentals/new-name | https://devblogs.microsoft.com/identity/aad-rebrand/

³ - https://github.com/AzureAD/MSAL.PS

⁴ - https://learn.microsoft.com/en-us/powershell/azure/new-azureps-module-az

Tuesday, September 26, 2023

SharePoint Online - PnP.Core sample to check CopyJobProgress via Azure Storage Queue SDK for .Net

This is a continuation of the previous post in this blog about checking the status of copy jobs in SharePoint Online. That post had code snippets using PnP.Framework, but not a complete, compilable, working sample. This post introduces a functional example. The only difference is that this sample uses the PnP.Core SDK.

If you are wondering what PnP.Framework and PnP.Core are, please have a look at the dilemma of choosing a .NET SDK to interact with SharePoint Online.

The sample is located at https://github.com/dotnet-demos/sharepoint-createcopyjob-azure-queue-tracking.

Points of interest

Though the code has a lot of comments, below are the areas that may raise questions.

Getting the job status

There is no way to get the copy job status in the PnP.Core SDK, and there is no plan to implement that feature; feature request #1277 was closed.

So we either need to switch to the PnP.Framework SDK or make a direct HTTP call. The sample makes a direct HTTP call.

Determining the JSON request body for direct HTTP calls

Any documentation related to the SharePoint migration API refers to the SDK, which is nothing but .NET classes and methods. No documentation is available at the HTTP request-response level. The PnP.Framework SDK uses XML to craft the request payload, and the response is JSON. That request XML is the hardest thing in the world to understand. Luckily, the PnP.Core SDK uses JSON for both request and response. Though there is a tutorial on basic operations with SharePoint REST endpoints, it is difficult to craft the JSON request for a simple operation such as GetMigrationJobStatus(Guid jobId).

Raw HTTP call to GetMigrationJobStatus(Guid jobId)

The official documentation exists in two places, but both describe .NET SDK usage, not the HTTP-level request and response.

First, we have to decide the HTTP method. Common sense tells us it should be GET, as this is a read operation, with a URL like _api/site/GetMigrationJobStatus/{JobId}.
We are totally wrong. It has to be HTTP POST.

Now, how should we build the POST body to include the jobId? Just the id in the POST body, or an object like the one below?
{
"jobId":"{jobId}"
}

Nothing works except the one below.

{
"id":"{jobId}"
}

Refer to GetJobStatus() in the example.
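For readers not using .NET, the request shape described above can be sketched in Python. Only the POST verb and the {"id": ...} body come from this post; the endpoint path <site>/_api/site/GetMigrationJobStatus, the JSON headers and the bearer-token authentication are assumptions to verify against the sample's GetJobStatus().

```python
import json
import urllib.request


def build_job_status_request(site_url: str, job_id: str,
                             access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks for a migration job's status.

    Assumption: the endpoint is <site>/_api/site/GetMigrationJobStatus.
    """
    url = site_url.rstrip("/") + "/_api/site/GetMigrationJobStatus"
    body = json.dumps({"id": job_id}).encode("utf-8")  # "id", not "jobId"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",  # POST even though it is a read operation
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )
```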

Have fun with SharePoint. Also, please raise a PR or create an issue if any improvements can be made.

Tuesday, June 27, 2023

Azure @ Enterprise - Upload certificate to App registration using PowerShell

Warning

All characters and events in this post are fictional. Any resemblance or similarity to actual events, entities, or persons is entirely coincidental.

Developer: Hi there, as you know, we are excited to release the next version, which uses Azure AD-based authentication. All 100 instances, from development and testing through production, require their own Azure app registrations. They also require certificates for authenticating to outside services. Certificates can be shared across lower environments.

Install Manager: Oh. It is not possible to have one app registration per application instance. It is not manageable.

Developer: What? Are you not paid for managing instances? Hope you are not doing charity.

Install Manager: Yeah I know you are funny. We cannot manage that many app registrations and certificates.

Developer: What do you mean by managing? 

Install Manager: Oh boy, higher environments are not like your toy dev instances. It is serious business. Do you know certificates don't last forever? On the day they expire, we need to upload a new certificate to all your dev and test app registrations.

Developer: Hey, we are in 2023. Don't you guys automate?

Install Manager: We don't do automation here.

Developer: It is not huge code. Just use the simple PowerShell commands that any system admin knows.

Install Manager: Enough boy. We don't have skilled people here. If you want, you code it and give it to us.

Developer: Oh ok..here you go.
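The developer's PowerShell script is not reproduced on this page. As an alternative, a minimal Python sketch of the equivalent Microsoft Graph call is below. Assumptions: the caller already has a Graph token allowed to update the application; /applications/{id} takes the application's object ID (not the client ID); and because PATCH replaces the whole keyCredentials collection, existing entries are resent. Verify those semantics against the Graph documentation before relying on it.

```python
import base64
import json
import ssl
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"


def key_credential_from_pem(pem_text: str, display_name: str) -> dict:
    """Turn a PEM public certificate into a Graph keyCredential entry."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return {
        "type": "AsymmetricX509Cert",
        "usage": "Verify",
        "key": base64.b64encode(der).decode("ascii"),
        "displayName": display_name,
    }


def build_upload_request(app_object_id: str, existing: list, new_cred: dict,
                         access_token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that keeps existing keys and adds one."""
    body = json.dumps({"keyCredentials": list(existing) + [new_cred]}).encode("utf-8")
    return urllib.request.Request(
        f"{GRAPH}/applications/{app_object_id}",
        data=body,
        method="PATCH",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )
```

Looping this over the list of app registration object IDs is what turns "upload to 100 registrations" into one script run.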

Tuesday, April 18, 2023

Azure @ Enterprise - Azure app registration service principal to access selected SharePoint sites

Disclaimer - SharePoint Online as a "CMS to organize semi-structured content and documents via the user interface" is a promising technology. If you are integrating it with an enterprise application, using its APIs and SDKs, or customizing it, I wish you all the best.

Accessing the SharePoint Online Site Collection

Not so long ago, SharePoint had the limitations below.

  • We could not give permission for specific SharePoint site collections to an Azure app registration service principal.
  • If we used the service principal to access site collections, it got permission for all site collections in the tenant. In an enterprise, other site collections may belong to totally separate divisions or applications.
  • If an enterprise wanted to limit applications to specific sites, it had to use one of the methods below.
    • Service accounts - The application needs to keep the service account name and password and use the ROPC flow³, which Microsoft itself does not recommend.
    • Separate tenant per application - Provision a separate tenant per application, sync users, etc.
  • SharePoint Online views a service account as a user and throttles it more heavily than a service principal, which is treated as an application. Have fun fighting production issues or retrying operations.

2021-02-11 - App access on a specific SharePoint Site Collection (Graph Only)

Microsoft announced a feature¹ that allows applications, via their service principals, to be granted permission to specific sites, as follows.
  • In the Azure portal, we can give the Sites.Selected permission to a service principal.
  • Use the API endpoint below to grant permission to the application, i.e. the service principal.
    • https://graph.microsoft.com/v1.0/sites/{siteId}/permissions
    • This allows granting only read or write. Sites cannot be managed with full control.
  • To grant this permission, the granting application or user identity needs higher privileges.
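A minimal sketch of the grant request body for that endpoint, following the shape in the Graph "create permission" documentation for sites; the app ID and display name are placeholders, and the read/write restriction mirrors the limitation described above.

```python
import json

GRAPH = "https://graph.microsoft.com/v1.0"


def permissions_url(site_id: str) -> str:
    """Endpoint for POSTing a new site permission."""
    return f"{GRAPH}/sites/{site_id}/permissions"


def site_permission_grant(app_id: str, display_name: str, role: str = "write") -> dict:
    """Body that grants one application (service principal) access to one site."""
    if role not in ("read", "write"):
        raise ValueError("only read or write can be granted at creation time")
    return {
        "roles": [role],
        "grantedToIdentities": [
            {"application": {"id": app_id, "displayName": display_name}}
        ],
    }


if __name__ == "__main__":
    print(permissions_url("<site-id>"))
    print(json.dumps(site_permission_grant("<client-id>", "<app name>"), indent=2))
```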

Somehow Microsoft overlooked the fact that the Graph API is not complete. There are many scenarios the Graph API does not support, where we have to fall back to CSOM or the legacy SharePoint APIs. Some application teams may have leveraged this feature even though it was not in GA, but a real SharePoint integration project probably could not. Anyway, this is a good start.

2022-08-11 - App access on a specific SharePoint Site Collection (GA and CSOM support)

Below are the updates² to the feature, announced one and a half years later.
  • Sites.Selected permission is applicable to CSOM API as well
  • The feature became generally available.
  • The site permissions API still allows granting only read or write, not full control.
  • The official documentation is yet to be updated.
After granting permission to both the Graph and SharePoint (CSOM) APIs, it looks like below in the Azure portal.

Below are the benefits of this change for an enterprise.
  • Sites can be permissioned to individual applications via their service principals.
  • Applications, especially daemon applications, need not use a service account, which is heavily throttled and has no throttling documentation.
  • Applications don't need to use the ROPC flow, which involves a password and is discouraged by Microsoft itself.

Sites.Selected with Full Control

One blogger figured out a hack to grant full access to service principals with Sites.Selected as follows.
  • Create a write permission first. Granting full control at creation time does not work.
  • Change that permission to Full Control.
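The second step of that hack can be sketched as a PATCH on the permission created in step one, whose id comes back in the creation response. The "fullcontrol" role value and the PATCH on /sites/{siteId}/permissions/{permissionId} reflect the workaround as commonly described, not an officially documented path at the time; treat them as assumptions.

```python
import json

GRAPH = "https://graph.microsoft.com/v1.0"


def full_control_patch(site_id: str, create_response_json: str):
    """Step 2 of the workaround: upgrade the write permission created in step 1.

    Returns (method, url, body); sending it is left to the caller.
    """
    permission_id = json.loads(create_response_json)["id"]  # id of the step 1 grant
    url = f"{GRAPH}/sites/{site_id}/permissions/{permission_id}"
    return "PATCH", url, {"roles": ["fullcontrol"]}
```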

Updates

2023-06-09 - Conditional access in GA

Some enterprises want to improve their security posture by allowing access only from a limited set of IPs. Service accounts had that capability earlier; app registrations got it later. It first came in preview, and as of 2023-06-09, it is generally available.

References

Tuesday, January 17, 2023

Azure @ Enterprise - Connection pooling behavior when JWT is used to authenticate into Azure SQL Server

Connection pooling in .NET is largely hidden. As consumer developers, we cannot directly observe how pooling happens behind the scenes; .NET only lets us clear the connection pool. We create a new SqlConnection and close it after use. When the connection is opened, an internal pooled connection is associated with the SqlConnection object we created, and when we close it, that internal connection goes back to the pool.

This post is not about simple connection pooling; instead, it talks about an issue encountered in production.

Issue

There is a .NET 6 Web API hosted in Azure App Service. It mainly accesses an Azure SQL database and storage blobs. When the load increased slightly, it started showing the error below.

System.Data.SqlClient.SqlException (0x80131904): Resource ID : 2. The session limit for the database is 600 and has been reached. See 'https://docs.microsoft.com/azure/azure-sql/database/resource-limits-logical-server' for assistance.

   at System.Data.ProviderBase.DbConnectionPool.CheckPoolBlockingPeriod(Exception e)
   at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()

Troubleshooting

Let us start the production war to bring it back.

Q&A about connection pooling

Before starting specific troubleshooting let's refresh the main points about connection pooling. Experts may skip this section

Are there many connection pools?

Yes, there are chances of having many connection pools. For each connection string, it creates and maintains a separate connection pool.

How many connections are in a pool?

By default, at most 100 connections per pool. This is configurable through the connection string (Max Pool Size) or via code.

When all the connections are in use and a connection is requested from the pool what will happen?

It waits for the connection timeout and then throws an exception like the one below.

"Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. 
This may have occurred because all pooled connections were in use and max pool size was reached."

Does the SQL Server know the consumers are keeping connections in the pool?

Yes. To verify, just run exec sp_who2 on the SQL Server and check the connections from the client machine. Most of them will be in the Sleeping state.

How to check the pooling status of our application?

Take a memory dump of the application and analyze it the crude way using WinDbg, or use the DebugDiag tool.
Otherwise, monitor the ADO.NET performance counters.

Specific steps to troubleshoot the issue

Let us look at some specifics about this issue

What are the Azure SQL specs?

Standalone database, tier - S0

What does exec sp_who return from the database?

A lot of connections in the sleeping state.

Are there 600+ requests coming to the App Service per second?

No. There are only 10 requests per second.

Is there auto-scaling of app service instances?

No. There is only 1 app service instance. Questions such as "who in the world runs App Service with only one instance in production" are banned.

Is any other application making connections to the same database?

Yes. But that Azure Function is now disabled. 

Does the connection string or code customize the connection pool size?

No customization. It works with the default of 100.

What is the authentication mechanism for Azure SQL?

A service principal (Azure app registration) that uses a certificate to obtain the JWT.

How many connection pools are there?

Only god knows until we take a memory dump and analyze it.

Will there be one connection pool per JWT, considering the JWT is part of the connection pool key along with the connection string?

Oh yeah... there is a good chance of that if the JWT is not cached. If each SqlConnection is created with a different JWT, many pools are created, each with a single connection. If that happens, we will not get the timeout error from the connection pool; instead, Azure SQL runs out of sessions, resulting in the same exception as mentioned above.
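To make the arithmetic concrete, here is a toy model in Python, not SqlClient's actual implementation: it simply assumes one pool per distinct (connection string, access token) pair, as the pool-key discussion above suggests, and compares minting a new JWT per connection with reusing a cached one. The CachedToken class and the fake clock are hypothetical; in .NET, keeping a single MSAL confidential client instance typically provides this caching.

```python
import time


class CachedToken:
    """Hypothetical cache: reuse one JWT until shortly before it expires."""

    def __init__(self, fetch, refresh_skew_s=300.0, clock=time.time):
        self._fetch = fetch          # callable returning (token, expires_at_epoch_s)
        self._skew = refresh_skew_s  # refresh this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            self._token, self._expires_at = self._fetch()
        return self._token


def distinct_pools(connection_string: str, tokens) -> int:
    """Toy model: one pool per distinct (connection string, access token) pair."""
    return len({(connection_string, token) for token in tokens})


def demo(requests: int = 10):
    """Return (pools without caching, pools with caching) for `requests` connections."""
    issued = iter(range(1_000_000))

    def fetch():
        # Every call mints a new JWT string that expires at t=3600 on a fake clock.
        return f"jwt-{next(issued)}", 3600.0

    cs = "Server=tcp:<server>.database.windows.net;Database=<db>"
    uncached = [fetch()[0] for _ in range(requests)]
    cache = CachedToken(fetch, clock=lambda: 0.0)
    cached = [cache.get() for _ in range(requests)]
    return distinct_pools(cs, uncached), distinct_pools(cs, cached)


if __name__ == "__main__":
    print(demo())  # (10, 1)
```

Each of those uncached pools keeps its connection open in the Sleeping state, which is how a modest request rate can still exhaust the S0 session limit.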

Is there any Microsoft official documentation for the above answer?

Not really, but there is a GitHub issue answering the question.

For further clarification, look at the source code of SqlConnectionPoolKey.CalculateHashCode().

Disclaimer

The actual root cause cannot be disclosed on social media such as this blog due to security, confidentiality, etc...

Tuesday, December 27, 2022

Azure @ Enterprise - DNS->PublicIP->Azure App Gateway->Pod Using App Gateway Ingress controller

This is another GitHub project introduction post. This time it introduces a repo that shows how an end-to-end Azure solution can be set up: Azure DNS -> Public IP Address -> Azure App Gateway -> Azure Kubernetes Service with the Application Gateway Ingress Controller (AGIC hereafter).

This is a standard use case: exposing a web application from Kubernetes via standard Azure networking services such as DNS, Public IP Address, and Azure Application Gateway (AAG).

Problem

When we want to route traffic to AKS, the primary question is how to set up the ingress. Traditionally we used the NGINX ingress controller, so it was not a question at all. Now, with the Azure Application Gateway Ingress Controller available, we need to justify the decision.

What is Application Gateway Ingress Controller?

It is a Kubernetes ingress controller that works only with Azure Application Gateway. The major differences from NGINX are:
  • AGIC monitors the cluster and updates the Azure Application Gateway with the routing IPs.
  • AGIC uses pod IPs as the backend instead of the service IP address. If there are 3 pods, there will be 3 IPs in the Application Gateway backend pool.
  • To update the Azure Application Gateway, AGIC needs Azure RBAC permissions via a user-assigned managed identity.

Photo credits go to the official docs.

My recommendation is not to use AGIC, as it has issues.

There will be a separate post on issues with AGIC. 

Setting up

Below is the repo that has the Bicep files and the PowerShell required to set everything up end to end.


Please note not all things are automated. Readme.md has the details.

References

https://github.com/paolosalvatori/aks-multi-tenant-agic

https://raw.githubusercontent.com/Azure/application-gateway-kubernetes-ingress/master/docs/examples/aspnetapp.yaml

https://github.com/Azure/application-gateway-kubernetes-ingress/blob/master/docs/setup/install-new.md

Tuesday, December 20, 2022

Azure @ Enterprise - Networking at pod level inside AKS using Calico

This post introduces networking inside the Kubernetes cluster at the pod level. One may ask why we, as application developers, need to worry about networking inside the Kubernetes cluster.

It depends on how the Kubernetes cluster is planned in the organization. If we are lucky, our project gets its own Kubernetes cluster. But often we end up as a tenant inside a shared Kubernetes cluster. Our application may need to access a public API URL; if we set the networking rules at the cluster level, all the applications inside the cluster get that access, which organizations generally do not want. Also, we don't want some other application in the cluster accessing our database, Redis cache, etc., even if it has the credentials.
Hence we end up dealing with network policies at the pod level.

Theory

Kubernetes natively defines network policies as a resource, but it has no built-in mechanism to enforce them.

For that, we need to use one of the network plugins. For more theory, please read the docs. Now let us get practical.

Azure NPM v/s Calico

Since we are going to use Azure Kubernetes Service, we can focus on 2 plugins that AKS supports natively: Azure NPM (Network Policy Manager) and Calico.

Since both names start with Azure, we may assume Azure NPM is the fully implemented option in AKS, but the reality is otherwise. As of writing this post, Calico has better support, including for Windows Server 2019 in AKS, while some Azure NPM features are still in preview. See the comparison of Azure NPM v/s Calico for more details.

As the title of the post shows we are going to use Calico.

Installation of Calico on AKS

According to the Calico documentation, there is a way to install Calico on an existing AKS cluster. But when I followed it, it did not work well.
So it is better to enable Calico when creating the AKS cluster.
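A minimal sketch of enabling Calico at creation time with the Azure CLI, wrapped in Python so the argument list can be inspected before running it. The `--network-plugin azure` and `--network-policy calico` flags are real `az aks create` options; the resource group, cluster name and node count are hypothetical placeholders, and a real cluster will likely need more options (VNet, node size, etc.).

```python
import shlex


def aks_create_args(resource_group, cluster_name, node_count=2):
    """Build an `az aks create` command with Calico network policy enabled.

    The network policy engine is chosen when the cluster is created; here it
    is combined with the Azure CNI plugin.
    """
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--node-count", str(node_count),
        "--network-plugin", "azure",
        "--network-policy", "calico",
        "--generate-ssh-keys",
    ]


args = aks_create_args("rg-aks-demo", "aks-calico-demo")
print(shlex.join(args))
# To actually run it: subprocess.run(args, check=True)
```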

What a network policy looks like

Networking feels alien to at least some developers. It requires knowledge of IP addresses and CIDR notation, complex network setup screens or commands, etc. But don't worry: network policies can be set up via YAML files.

If YAML feels alien to a developer nowadays, they should rethink being a developer.
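As an illustration of the shape of such a policy, here is a minimal sketch that builds a standard `networking.k8s.io/v1` NetworkPolicy in Python and prints it as JSON (`kubectl apply -f` accepts JSON as well as YAML). The policy name, namespace, labels and port are hypothetical; the policy lets only pods labelled `app=web` reach pods labelled `app=redis` on TCP 6379, which Calico then enforces.

```python
import json


def allow_from_app(policy_name, namespace, target_app, allowed_app, port):
    """NetworkPolicy: only pods labelled app=<allowed_app> in the same
    namespace may reach pods labelled app=<target_app> on the TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": policy_name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to (and therefore isolates for ingress).
            "podSelector": {"matchLabels": {"app": target_app}},
            # Only ingress is restricted; egress is left untouched.
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # A podSelector without namespaceSelector means same namespace.
                    "from": [{"podSelector": {"matchLabels": {"app": allowed_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policy = allow_from_app("redis-allow-web", "my-team", "redis", "web", 6379)
print(json.dumps(policy, indent=2))
```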

Sample

Enough theory. Please clone the repo below and see how the network policies can be applied.
Step-by-step details are not included here, as the sample will get more scenarios in the future.


PRs and issues on the repo are welcome.

Tuesday, December 6, 2022

Azure @ Enterprise - AGIC on AKS - Unexpected status code '403' while performing a GET on Application Gateway.

Context

When we want to route traffic from the internet to AKS through Azure Application Gateway (AAG for short), there are 2 approaches to setting up ingress in AKS. One is the traditional NGINX way, where the flow is one way: AAG forwards traffic to the NGINX ingress controller inside the cluster, which routes it to the pods. The other uses the Application Gateway Ingress Controller (AGIC for short): AAG can route directly to the pod IPs if it knows them.

  • Who lets AAG know about the pod IPs?
    • AGIC, running inside AKS.
  • How does AGIC authenticate to modify AAG?
    • Using a user-assigned managed identity with the necessary permissions.

Problem

When we simply create the AKS cluster with AGIC enabled, we expect to be able to pass AGIC a pre-created managed identity that has permissions on AAG. Note that AGIC is not installed separately; it is enabled as an AKS add-on. We are in the DevOps automation world, so we normally look for a declarative mechanism, and naturally Bicep comes our way. As of writing this post, the official Bicep reference for AKS managed clusters has no details on enabling AGIC. But the links at the bottom point to an ARM example with AGIC enabled.

It has a section at properties->addonProfiles->ingressApplicationGateway where we can provide the details of AAG and the identity.
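For orientation, here is a minimal sketch of that section's shape, written as a Python dict for illustration. `addonProfiles`, `ingressApplicationGateway`, `enabled`, `config` and `applicationGatewayId` are the ARM property names as I understand them; the gateway id is the placeholder from the error logs in this post. The sketch deliberately has no `identity` key: on the add-on profile that property is output-only, which is what the warning below complains about.

```python
import json

# Placeholder id, as it appears in the logs in this post.
APP_GATEWAY_ID = (
    "/subscriptions/<subid>/resourceGroups/<rgName>"
    "/providers/Microsoft.Network/applicationGateways/myappgw"
)


def agic_addon_profiles(app_gateway_id):
    """Shape of properties.addonProfiles that enables the AGIC add-on
    against an existing Application Gateway."""
    return {
        "ingressApplicationGateway": {
            "enabled": True,
            "config": {
                # Existing gateway the add-on should manage.
                "applicationGatewayId": app_gateway_id,
            },
            # No "identity" key: it is read-only on the add-on profile and
            # AKS reports the identity it creates for the add-on there.
        }
    }


fragment = {"properties": {"addonProfiles": agic_addon_profiles(APP_GATEWAY_ID)}}
print(json.dumps(fragment, indent=2))
```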

Since ARM and Bicep relate to each other like JavaScript and TypeScript, it is relatively easy to write the Bicep from the equivalent ARM template.

After the conversion, running the Bicep showed the warning below.

Warning BCP073: The property "identity" is read-only. Expressions cannot be assigned to read-only properties. If this is an inaccuracy in the documentation, please report it to the Bicep Team. [https://aka.ms/bicep-type-issues]

If someone cares about warnings, are they even a developer? So I ignored it and proceeded. Poor Azure documentation, calling for help.

But after AKS was created, accessing the application via AAG started returning errors. The logs of the AGIC ingress pod show the error below.

kubectl logs ingress-appgw-deployment-<a random string> -n kube-system

E1206 02:34:30.289515       1 client.go:170] Code="ErrorApplicationGatewayForbidden" Message="Unexpected status code '403' while performing a GET on Application Gateway. You can use 'az role assignment create --role Reader --scope /subscriptions/<subid>/resourceGroups/K8ClusterWithAGIC --assignee dfb5d05b-cc65-46fa-a0b8-9fe93ebfed9e; az role assignment create --role Contributor --scope /subscriptions/1cd7467f-530b-4278-94ef-de3f5da74e0c/resourceGroups/<rgName>/providers/Microsoft.Network/applicationGateways/myappgw --assignee dfb5d05b-xxxx-xxxx-xxxx-9fe93ebfed9e' to assign permissions. AGIC Identity needs atleast has 'Contributor' access to Application Gateway 'myappgw' and 'Reader' access to Application Gateway's Resource Group '<rgName>'." InnerError="network.ApplicationGatewaysClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' with object id '41ff78bf-3b92-xxxx-xxxx-180e4c7970a2' does not have authorization to perform action 'Microsoft.Network/applicationGateways/read' over scope '/subscriptions/<subid>/resourceGroups/<rgName>/providers/Microsoft.Network/applicationGateways/myappgw-' or the scope is invalid. If access was recently granted, please refresh your credentials.""

Step 1

Wonder why this section is not called Solution? You will come to know soon. The error is fairly clear. But my pre-created managed identity does have the permissions. Looking closely, we can see 2 issues:
  1. The error message shows a different managed identity client id from the one we passed.
  2. The identity with that client id is newly created, so obviously it doesn't have permission to manage AAG.
The first issue occurs because AGIC is not using the identity that we passed in the Bicep. We should have taken the warning seriously.

The solution is to give the newly created identity Contributor access to AAG. Let us worry about least privilege later. The name of the newly created identity follows the convention below, so it is easy to search for.

"ingressapplicationgateway-<cluster name>"

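A minimal sketch of that fix with the Azure CLI, wrapped in Python to keep the commands inspectable. The `az aks show` query path to the add-on identity and the `az role assignment create` flags are what I would expect to work, but treat them as assumptions to verify; the resource group, cluster name and object id are hypothetical placeholders. Note that the error message above also suggests Reader on the gateway's resource group.

```python
import shlex


def agic_identity_name(cluster_name):
    # Naming convention of the identity AKS creates for the AGIC add-on.
    return f"ingressapplicationgateway-{cluster_name}"


def show_agic_object_id_args(resource_group, cluster_name):
    # Reads the object id of the add-on identity from the cluster resource.
    return ["az", "aks", "show",
            "--resource-group", resource_group,
            "--name", cluster_name,
            "--query", "addonProfiles.ingressApplicationGateway.identity.objectId",
            "--output", "tsv"]


def grant_contributor_on_appgw_args(agic_object_id, app_gateway_id):
    # Contributor scoped to the Application Gateway only, not the whole RG.
    return ["az", "role", "assignment", "create",
            "--role", "Contributor",
            "--scope", app_gateway_id,
            "--assignee-object-id", agic_object_id,
            "--assignee-principal-type", "ServicePrincipal"]


print(agic_identity_name("myakscluster"))
print(shlex.join(show_agic_object_id_args("my-rg", "myakscluster")))
print(shlex.join(grant_contributor_on_appgw_args(
    "<agic-object-id>",
    "/subscriptions/<subid>/resourceGroups/<rgName>/providers/"
    "Microsoft.Network/applicationGateways/myappgw")))
```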
Problem 2

Once that is fixed, we may think all is good, as AGIC now has permission on AAG to create listeners, backend pools, and settings. But it is not enough; the error below started showing up.

E1206 02:59:09.217888       1 controller.go:141] network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' with object id '41ff78bf-xxxx-xxxx-xxxxx-180e4c7970a2' has permission to perform action 'Microsoft.Network/applicationGateways/write' on scope '/subscriptions/<subid>/resourceGroups/<rgName>/providers/Microsoft.Network/applicationGateways/myappgw'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptio957", FieldPath:""}): type: 'Warning' reason: 'FailedApplyingAppGwConfig' network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' with object id '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' has permission to perform action 'Microsoft.Network/applicationGateways/write' on scope '/subscriptions/<subid>/resourceGroups/<rgName/providers/Microsoft.Network/applicationGateways/myappgw'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptions/<subid>/resourcegroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/myappgw-ManagedIdentity' or the linked scope(s) are invalid."

Step 2

Here we can see 2 problems:

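  1. The error message is misleading: it shows the same value for the client id and the object id, but in fact they are different. It seems people at Microsoft themselves are confused by this AAD security mechanism.
  2. The AGIC identity needs permissions on the managed identity of AAG.
God knows why AGIC requires permission on AAG's managed identity just to manage the AAG details. A likely reason is that AGIC writes AAG with a full CreateOrUpdate call (see the error above), that request references AAG's user-assigned identity, and so Azure also checks the 'assign/action' permission on that linked identity. This issue can also be fixed by giving the AGIC identity Contributor permission on the managed identity of AAG.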
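A minimal sketch of this second role assignment, again as an Azure CLI command built in Python. The scope is the placeholder identity id from the error above and the object id is hypothetical. The post uses Contributor; the built-in "Managed Identity Operator" role is, to my understanding, a narrower option that includes the userAssignedIdentities assign action, so the role is a parameter here.

```python
import shlex

APPGW_IDENTITY_ID = (
    "/subscriptions/<subid>/resourcegroups/<rg>/providers/"
    "Microsoft.ManagedIdentity/userAssignedIdentities/myappgw-ManagedIdentity"
)


def grant_on_appgw_identity_args(agic_object_id, appgw_identity_id, role="Contributor"):
    """Let the AGIC identity act on AAG's own user-assigned identity.

    role="Managed Identity Operator" is the least-privilege alternative
    (assumption: verify it covers .../assign/action in your tenant).
    """
    return ["az", "role", "assignment", "create",
            "--role", role,
            "--scope", appgw_identity_id,
            "--assignee-object-id", agic_object_id,
            "--assignee-principal-type", "ServicePrincipal"]


print(shlex.join(grant_on_appgw_identity_args("<agic-object-id>", APPGW_IDENTITY_ID)))
```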

Happy debugging...

Read the docs

This is the standard answer we get after trial and error. As of writing this post, the permission on the identity of AAG is not mentioned in the brownfield installation docs.

References

  1. https://www.torivar.com/2022/09/16/aks-with-agic-existing-agw/ - Almost the same experience but with Terraform IaC

Tuesday, March 29, 2022

Azure @ Enterprise - How to cache JWT access token in MSAL.Net?

When we start using the MSAL.Net library to work with Azure Active Directory and experience its fluency and other features, we start thinking it has everything we need. Take the simple example of obtaining a JWT access token: we naturally assume the library caches the token based on the "exp" attribute. I have come across multiple scenarios where developers thought the library was caching access tokens, but in reality it was not.

How to cache the JWT access tokens in MSAL.Net?

It is not easy, and it is not how we developers normally expect a library to behave.

First of all, we have to write code to get the token from the cache. No, it won't return null or an empty string if the token is not present. Instead, it throws an exception. We have to catch that exception and acquire a new token, which the cache then stores.
  • Try to get the token from the cache using AcquireTokenSilent().
  • If it throws an exception, get a token using one of the other AcquireToken methods, based on our authentication model.
This is not a workaround. It is the documented solution.
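The shape of that pattern, as a minimal Python sketch. `FakeTokenClient`, `UiRequiredError` and `get_access_token` are hypothetical stand-ins: in MSAL.Net the real calls are AcquireTokenSilent() plus a fallback such as AcquireTokenForClient(), and the exception is MsalUiRequiredException. The fake ignores token expiry to stay short; it only shows that the second call is served from the cache.

```python
class UiRequiredError(Exception):
    """Hypothetical stand-in for MsalUiRequiredException."""


class FakeTokenClient:
    """Hypothetical stand-in for an MSAL client with an in-memory token cache.
    It ignores token expiry to keep the sketch short."""

    def __init__(self):
        self._cache = {}
        self.network_calls = 0

    def acquire_token_silent(self, scopes):
        key = tuple(sorted(scopes))
        if key not in self._cache:
            # Like MSAL.Net: a cache miss raises instead of returning null.
            raise UiRequiredError("No token in cache for " + " ".join(key))
        return self._cache[key]

    def acquire_token_for_client(self, scopes):
        # Pretend to call the token endpoint; the cache stores the result.
        self.network_calls += 1
        token = f"token-{self.network_calls}"
        self._cache[tuple(sorted(scopes))] = token
        return token


def get_access_token(client, scopes):
    """Cache first, then fall back to a real acquisition."""
    try:
        return client.acquire_token_silent(scopes)
    except UiRequiredError:
        return client.acquire_token_for_client(scopes)


client = FakeTokenClient()
scopes = ["https://graph.microsoft.com/.default"]
first = get_access_token(client, scopes)    # cache miss -> "network" call
second = get_access_token(client, scopes)   # served from the cache
print(first, second, client.network_calls)  # token-1 token-1 1
```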

Tuesday, November 16, 2021

Azure @ Enterprise - AAD App registration - Should I need it if "Admin consent required" is "No"

Security is hard, especially when every vendor says they follow the OAuth standard but implements it differently. Let us look at the API permissions blade of an Azure App registration.


<image permissions blade>

Three columns may confuse us: 'Type', and the last two, which show whether the permission requires admin consent and whether admin consent has been granted (Status).

Further, it reveals something very interesting: the 'User.Read' permission doesn't require admin consent, yet admin consent is granted. Why in the world would someone do that?

Let us try to unmask the mystery or just read the docs.

Delegated permission v/s application permission types

It's very simple. Delegated permissions are for users and application permissions are for applications. To put it more simply, 'Delegated' means users delegate to an application the right to do something on their behalf. Here the application acts not as itself but as the user.

Application permissions allow an application to perform actions as itself, with no user involved, e.g. scheduled tasks, queue processors, and daemon apps in general.
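One practical way to see which kind of permission a call is using is to look inside the access token: Azure AD puts delegated permissions in the `scp` claim and application permissions in the `roles` claim. The minimal sketch below decodes the payload without validating the signature (fine for troubleshooting, never for authorization). `make_unsigned_token` is a hypothetical helper that fabricates unsigned tokens only for the demonstration. `scp` is checked first because a delegated token can also carry `roles` when app roles are assigned to the user.

```python
import base64
import json


def decode_jwt_payload(token):
    # The payload is the second dot-separated segment, base64url without padding.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def permission_type(token):
    claims = decode_jwt_payload(token)
    if "scp" in claims:
        return "delegated"    # scopes consented on behalf of a signed-in user
    if "roles" in claims:
        return "application"  # app roles granted to the application itself
    return "unknown"


def _b64url(obj):
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_unsigned_token(claims):
    # Hypothetical helper for the demo only: real tokens are signed by Azure AD.
    return f"{_b64url({'alg': 'none', 'typ': 'JWT'})}.{_b64url(claims)}."


print(permission_type(make_unsigned_token({"scp": "User.Read"})))         # delegated
print(permission_type(make_unsigned_token({"roles": ["User.Read.All"]})))  # application
```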

Column 'Admin consent required'

This means the publisher of the permission marked it as tenant-wide and sensitive, so it requires admin consent. Let us consider 3 scenarios where admin consent is not required.

What if admin consent is not required and it's not given and used by the user app

When an interactive user application requests a permission that doesn't require admin consent, a consent form pops up asking the user to give consent.

In short

The user has to give his consent even if the admin consent is not required.

What if admin consent is not required and it's not given and used by the daemon app

Here the app will fail because that application is not running under user context and there is no one to give consent.
Admin consent is always required for the daemon apps to work with application permissions.

This answers the title question. Yes, we need admin consent for daemon applications even if "Admin consent required" is "No".
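For a daemon app, an admin can grant that consent up front, for example through the admin consent endpoint described in the first link under ReadTheDocs. A minimal sketch that builds such a URL; the tenant, client id and redirect URI are hypothetical placeholders, and using the `/.default` scope to request all configured permissions of the API is an assumption to verify against the docs.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def admin_consent_url(tenant, client_id, redirect_uri, scopes, state="12345"):
    """URL an admin opens to grant tenant-wide consent for `client_id`."""
    query = urlencode({
        "client_id": client_id,
        "scope": " ".join(scopes),   # space-separated, as in OAuth 2.0
        "redirect_uri": redirect_uri,
        "state": state,              # echoed back to the redirect URI
    })
    return f"https://login.microsoftonline.com/{tenant}/v2.0/adminconsent?{query}"


url = admin_consent_url(
    "contoso.onmicrosoft.com",                 # hypothetical tenant
    "00000000-0000-0000-0000-000000000000",    # hypothetical client id
    "https://localhost/consent",               # hypothetical redirect URI
    ["https://graph.microsoft.com/.default"],
)
print(url)
print(parse_qs(urlparse(url).query)["scope"])  # ['https://graph.microsoft.com/.default']
```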

What if admin consent is not required but given and used by the user app

If the admin has given consent, there is no need for the user to consent again, as it's a tenant-wide grant. The consent popup will not be shown.

Special cases

ROPC Flow

This refers to Resource Owner Password Credentials flow where the application can act as a user by obtaining the user's credential (password). It can obtain the credential from its own configuration file when running as a daemon application or from the user when it's running as a user-facing UI application. 

When it is running as a daemon we normally create a service account and keep its credentials in the application configuration. In both cases, admin consent is required.
In either case, Microsoft does not recommend ROPC.
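To show why ROPC is discouraged, here is a minimal sketch of the token request it boils down to: the user's password travels in the request body, and there is no interactive step where a consent prompt could appear, which is why consent must already be in place. The tenant, client id, user, password and scope are hypothetical placeholders; a confidential client would also send its client credential.

```python
from urllib.parse import parse_qs, urlencode


def ropc_token_request(tenant, client_id, username, password, scopes):
    """Return (url, form_body) for an ROPC token request. Sketch only."""
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    body = urlencode({
        "client_id": client_id,
        "grant_type": "password",   # this is what makes it ROPC
        "username": username,
        "password": password,       # the user's raw password leaves the user's hands
        "scope": " ".join(scopes),
    })
    return url, body


url, body = ropc_token_request(
    "contoso.onmicrosoft.com", "00000000-0000-0000-0000-000000000000",
    "svc-account@contoso.onmicrosoft.com", "not-a-real-password",
    ["https://graph.microsoft.com/User.Read"],
)
print(url)
print(parse_qs(body)["grant_type"])  # ['password']
```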

Restrict users from consenting

Even if a publisher application doesn't require admin consent, the AAD tenant admin can override that and enforce admin consent for all permissions. This leaves users unable to give consent.

ReadTheDocs

https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-permissions-and-consent#using-the-admin-consent-endpoint

https://docs.microsoft.com/en-us/azure/active-directory/develop/application-consent-experience#app-requires-a-permission-within-the-users-scope-of-authority

https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/configure-admin-consent-workflow

https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/grant-admin-consent

Tuesday, September 14, 2021

Azure @ Enterprise - PowerShell log in as service principal + certificate and generate JWT access token

Enterprises always love to increase their security posture. One authentication approach enterprises take in Azure is an App registration with a service principal. The advantage is that the service principal can authenticate using certificates instead of passwords. Certificates are more secure than passwords, as they can be centrally managed.

Problem

If our application uses the service principal + certificate and works fine, there is no issue. The problem starts when something goes wrong. If we log in to the portal using our own credentials and try the failing scenarios, everything may work fine. But things go wrong when the application logs in using the service principal. It may be a permission issue, expired certificates or passwords, etc.

What if the problems first appear in production, where enterprises don't allow any changes to the environment, i.e., no debugging tools can be installed?

Solution

The solution is to troubleshoot the scenario as close as possible to how the application works: log in as the service principal and try the application scenarios.

As always in the enterprise, the best tool to troubleshoot production environments is PowerShell. Below goes the code to log in to Azure from PowerShell using the service principal and generate a JWT access token.

The Connect-AzAccount cmdlet provides different ways to log in; using a service principal is one of them.
Please note that the Az.Accounts module needs at least Windows PowerShell 5.1 or PowerShell 7.

Once the PowerShell session is authenticated, we can perform various operations as the service principal. 

In most cases, the access token is not really required. There are PowerShell SDKs for most of the Azure services. We can directly use them. But in some cases like interacting with the data plane of Service Bus, we may need to use the access token and embed it in the Authorization header of the HTTP request.
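A minimal sketch of that last step: embedding the token in the Authorization header of a Service Bus data plane request, here the REST "send message" call to a queue. The namespace, queue, token value and JSON content type are hypothetical; the token itself would come from the service principal session, e.g. something like Get-AzAccessToken -ResourceUrl "https://servicebus.azure.net". The request is only built, not sent.

```python
import urllib.request


def service_bus_send_request(namespace, queue, access_token, body):
    """Build (but do not send) a REST request that posts a message to a queue."""
    url = f"https://{namespace}.servicebus.windows.net/{queue}/messages"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            # The JWT access token goes in as a bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",  # assumption: any payload type works
        },
    )


req = service_bus_send_request("mynamespace", "orders", "<access-token>", '{"id": 1}')
print(req.get_method(), req.full_url)
# To send: urllib.request.urlopen(req)
```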

Limitations

It reads the certificate from the personal certificate store unless it is loaded from a file. We cannot pass an X509Certificate object to the Connect-AzAccount cmdlet. There is already an issue on GitHub to track it.

Update: 2021-10-30

What if we don't have permission to install Az.Accounts module?

Recently, I was debugging a production issue where there was no permission to install the Az.Accounts module and not even internet connectivity to download it. The only way was to write everything ourselves. Fortunately, someone had already done that and it is publicly available. If interested, read the official docs.
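"Writing everything ourselves" mostly means building a signed client assertion from the certificate and posting it to the token endpoint. Below is a minimal sketch of that shape, following Microsoft's certificate credential format as I understand it (x5t = base64url SHA-1 thumbprint, aud = token endpoint, iss = sub = client id). `sign_rs256` is a hypothetical callback, since RSA signing with the private key is not in the Python standard library; this is not the public implementation the post refers to.

```python
import base64
import hashlib
import json
import time
import uuid
from urllib.parse import urlencode


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_client_assertion(tenant, client_id, cert_der, sign_rs256, lifetime=600, now=None):
    """cert_der: DER bytes of the certificate (public part).
    sign_rs256: hypothetical callable(bytes) -> bytes, RS256 with the private key."""
    now = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT",
              "x5t": b64url(hashlib.sha1(cert_der).digest())}  # certificate thumbprint
    claims = {
        "aud": f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token",
        "iss": client_id,
        "sub": client_id,
        "jti": str(uuid.uuid4()),   # unique id, prevents replay
        "nbf": now,
        "exp": now + lifetime,
    }
    signing_input = (f"{b64url(json.dumps(header).encode())}."
                     f"{b64url(json.dumps(claims).encode())}")
    signature = sign_rs256(signing_input.encode("ascii"))
    return f"{signing_input}.{b64url(signature)}"


def token_request_body(client_id, scope, assertion):
    # Client credentials grant, authenticating with the certificate assertion.
    return urlencode({
        "client_id": client_id,
        "scope": scope,
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion,
        "grant_type": "client_credentials",
    })


# Demo with a fake signer and fake certificate bytes (placeholders only).
assertion = build_client_assertion("contoso.onmicrosoft.com", "app-id",
                                   b"fake-der-cert", lambda data: b"fake-signature",
                                   now=1700000000)
print(token_request_body("app-id", "https://management.azure.com/.default", assertion)[:80])
```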

Tuesday, March 2, 2021

Azure @ Enterprise - Managed Identity formerly MSI to bulk insert from Azure Data Lake Gen2 (ADLS Gen2) to Azure SQL Database

Requirement

Bulk insert a CSV format file from Azure Data lake Gen2 to Azure SQL using the system-assigned managed identity (Managed Service Identity) as the authentication mechanism.

Official link

As per the Microsoft docs, it is not supported. The most secure way seems to be a database scoped credential based on a SAS token.

BULK INSERT and BACKUP/RESTORE statements cannot use Managed Identity to access Azure storage


Verifying the above

The first step is to create a system-assigned managed identity for the Azure SQL Server. As of writing this post, there seems to be no way to do this from the Azure portal, so it has to be done from the command line (the Azure CLI, e.g. from a PowerShell prompt). Below goes the step.

The command is az sql server update -g <resource group> -s <sql server name> -i

The next step is to create an Azure Data Lake Gen2 account. How to create one is skipped in this post, as that is already covered on the internet. Then give the SQL Server's identity the required permissions on the Azure Data Lake Gen2 account.
The permissions can be restricted further based on the use case. Now let us see what the SQL scripts look like.


It works. 

We got an undocumented feature working that allows Azure SQL to connect to an Azure Data Lake Gen2 account using a system-assigned managed identity.