Tuesday, December 6, 2022

Azure @ Enterprise - AGIC on AKS - Unexpected status code '403' while performing a GET on Application Gateway.

Context

When we want to route traffic from the internet to AKS through the Azure Application Gateway (AAG for short), there are 2 approaches to set up ingress in AKS. One is the traditional NGINX way, where the flow is one way. The other uses the Application Gateway Ingress Controller (AGIC for short): AAG can route directly to the Pod IPs, if it knows them.

  • Who lets AAG know about the Pod IPs?
    • It is the AGIC running inside the AKS cluster.
  • How does the AGIC authenticate to modify the AAG?
    • Using a user-assigned managed identity with the necessary permissions.

Problem

When we simply create the AKS cluster with AGIC enabled, we expect that we can pass a pre-created managed identity, one that already has permission on AAG, to the AGIC. Note that AGIC is not installed separately here; it is enabled as an AKS add-on. We are in the DevOps automation world, so we normally look for a declarative mechanism to do it. Obviously, Bicep comes our way. When we look at the official Bicep reference for AKS managed clusters, details on AGIC enablement are not available as of writing this post. But there are links at the bottom to ARM examples, one of which has the AGIC capability.

It has a section located at properties->addonProfiles->ingressApplicationGateway where we can give the details of AAG and the identity.

Since ARM and Bicep are like JavaScript and TypeScript, it is relatively easy to prepare the Bicep file given the equivalent ARM template.
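For the conversion itself, the Azure CLI can produce a first draft. This is a minimal sketch, assuming the ARM example has been saved locally. The function name and the file name are hypothetical, and decompiled output usually needs manual cleanup.

```shell
# Decompile an ARM JSON template into a .bicep file next to it.
# Argument: path to the ARM template (hypothetical, e.g. aks-agic.json)
arm_to_bicep() {
  az bicep decompile --file "$1"
}
```

Calling arm_to_bicep aks-agic.json would write aks-agic.bicep in the same folder.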

After the conversion, when the Bicep file was run, the below warning showed up:

Warning BCP073: The property "identity" is read-only. Expressions cannot be assigned to read-only properties. If this is an inaccuracy in the documentation, please report it to the Bicep Team. [https://aka.ms/bicep-type-issues]

If someone cares about warnings, they are not a developer. So I ignored it and proceeded. Poor Azure documentation, calling for help.

But after the creation of AKS, errors started showing up when accessing the application via AAG. The logs of the AGIC ingress pod show the below error.

kubectl logs ingress-appgw-deployment-<a random string> -n kube-system

E1206 02:34:30.289515       1 client.go:170] Code="ErrorApplicationGatewayForbidden" Message="Unexpected status code '403' while performing a GET on Application Gateway. You can use 'az role assignment create --role Reader --scope /subscriptions/<subid>/resourceGroups/K8ClusterWithAGIC --assignee dfb5d05b-cc65-46fa-a0b8-9fe93ebfed9e; az role assignment create --role Contributor --scope /subscriptions/1cd7467f-530b-4278-94ef-de3f5da74e0c/resourceGroups/<rgName>/providers/Microsoft.Network/applicationGateways/myappgw --assignee dfb5d05b-xxxx-xxxx-xxxx-9fe93ebfed9e' to assign permissions. AGIC Identity needs atleast has 'Contributor' access to Application Gateway 'myappgw' and 'Reader' access to Application Gateway's Resource Group '<rgName>'." InnerError="network.ApplicationGatewaysClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' with object id '41ff78bf-3b92-xxxx-xxxx-180e4c7970a2' does not have authorization to perform action 'Microsoft.Network/applicationGateways/read' over scope '/subscriptions/<subid>/resourceGroups/<rgName>/providers/Microsoft.Network/applicationGateways/myappgw-' or the scope is invalid. If access was recently granted, please refresh your credentials.""

Step 1

Wonder why the fix needs a section of its own? We will come to know soon. The error is somewhat clear. But my pre-created managed identity has the permissions. When we look closely, we can see 2 issues:
  1. The error message shows the client ID of a different managed identity (MI), not the one we passed in.
  2. Since that client ID belongs to a newly created identity, it obviously doesn't have permission to manage AAG.
The first issue arises because the add-on is not using the identity that we passed in the Bicep file. We should have taken the warning seriously.
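To spot the mismatch quickly, the client IDs can be pulled out of the AGIC logs and compared with the identity we meant to use. This is a minimal sketch: the function name is hypothetical, and it assumes the log lines contain the phrase "The client '<id>'" as in the error above.

```shell
# Read AGIC log lines on stdin and print each distinct client ID
# that appears in an authorization failure message.
denied_clients() {
  grep -o "The client '[^']*'" | sed "s/^The client '//; s/'\$//" | sort -u
}
```

A possible usage, assuming the add-on deployment is named ingress-appgw-deployment as the pod name suggests: kubectl logs deployment/ingress-appgw-deployment -n kube-system | denied_clients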

The solution is to give the newly created identity Contributor access on AAG. Let us worry about least privilege later. The name of the newly created identity follows the below convention, so it is easy to search for.

"ingressapplicationgateway-<cluster name>"

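The role assignments that the error message suggests can be scripted. This is a minimal sketch, assuming the add-on exposes its identity under addonProfiles.ingressApplicationGateway.identity in the az aks show output. The function name and argument names are hypothetical, and the caller needs rights to create role assignments on both scopes.

```shell
# Grant the AGIC add-on identity Contributor on the gateway and Reader on
# the gateway's resource group, as the AGIC error message asks for.
# Arguments: <AKS resource group> <AKS cluster name> <Application Gateway resource ID>
grant_agic_gateway_access() {
  aks_rg="$1"; aks_name="$2"; appgw_id="$3"

  # Object ID of the identity that the add-on created for itself
  agic_oid=$(az aks show --resource-group "$aks_rg" --name "$aks_name" \
    --query "addonProfiles.ingressApplicationGateway.identity.objectId" \
    --output tsv) || return 1

  # Resource group scope: the gateway ID cut at the first "/providers/"
  appgw_rg_id="${appgw_id%%/providers/*}"

  az role assignment create --role Contributor --scope "$appgw_id" \
    --assignee-object-id "$agic_oid" --assignee-principal-type ServicePrincipal
  az role assignment create --role Reader --scope "$appgw_rg_id" \
    --assignee-object-id "$agic_oid" --assignee-principal-type ServicePrincipal
}
```

Looking the identity up through the cluster avoids copying IDs out of the log by hand.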
Problem 2

Once we fix the issue, we may think all is good, as AGIC now has permission on AAG to create listeners, backend pools, and settings. But it is not enough, as the below error started showing up.

E1206 02:59:09.217888       1 controller.go:141] network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' with object id '41ff78bf-xxxx-xxxx-xxxxx-180e4c7970a2' has permission to perform action 'Microsoft.Network/applicationGateways/write' on scope '/subscriptions/<subid>/resourceGroups/<rgName>/providers/Microsoft.Network/applicationGateways/myappgw'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptio957", FieldPath:""}): type: 'Warning' reason: 'FailedApplyingAppGwConfig' network.ApplicationGatewaysClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' with object id '41ff78bf-xxxx-xxxx-xxxx-180e4c7970a2' has permission to perform action 'Microsoft.Network/applicationGateways/write' on scope '/subscriptions/<subid>/resourceGroups/<rgName/providers/Microsoft.Network/applicationGateways/myappgw'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptions/<subid>/resourcegroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/myappgw-ManagedIdentity' or the linked scope(s) are invalid."

Step 2

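Here we can see 2 problems:

  1. The error message is misleading: it prints the same value for the client ID and the object ID. In fact, they are different. It seems people at Microsoft themselves are confused about this AAD security mechanism.
  2. The AGIC identity is looking for permissions on the identity of AAG.
At first sight, God knows why the AGIC requires permission on AAG's managed identity just to manage the AAG details. The log gives a hint: the failing call is CreateOrUpdate on the gateway, and the user-assigned identity attached to the gateway is reported as a linked scope of that write. This issue can also be fixed by giving the AGIC identity Contributor permission on the managed identity of AAG.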
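The second grant can be scripted the same way. This is a minimal sketch with hypothetical function and argument names: the scope is the resource ID of the user-assigned identity named in the "linked scope(s)" part of the error. The role defaults to Contributor, as used in this post. The built-in Managed Identity Operator role, which includes the assign/action permission named in the error, is a narrower alternative that this post did not test.

```shell
# Grant the AGIC identity a role on the user-assigned identity attached to AAG.
# Arguments: <AGIC identity object ID> <resource ID of AAG's identity> [role]
grant_agic_identity_access() {
  agic_oid="$1"; appgw_identity_id="$2"; role="${3:-Contributor}"

  az role assignment create --role "$role" --scope "$appgw_identity_id" \
    --assignee-object-id "$agic_oid" --assignee-principal-type ServicePrincipal
}
```

Passing "Managed Identity Operator" as the third argument would try the narrower role instead.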

Happy debugging...

Read the docs

This is the standard answer we get when we do trial and error. As of writing this post, the permission on the identity of AAG is not mentioned in the brownfield installation documentation.

References

  1. https://www.torivar.com/2022/09/16/aks-with-agic-existing-agw/ - Almost the same experience but with Terraform IaC
