Multi-Cluster AWS EKS ArgoCD with Pod Identity
First of all, AHHHHHHH! What a pain in the ass this was, so I am writing it up. For the record, ChatGPT, Claude, and Gemini all got it wrong, not even close! And according to a GitHub issue, ArgoCD is still working on the docs for this; apparently, it's harder to write the docs than it was to get the code to work.
Rant over 😄
So what are we talking about? A couple of main concepts here:
- AWS has Pod Identity, which, in the simplest terms, allows you to assign an AWS IAM role to a Pod via a k8s service account!
- One way to run k8s is with multiple clusters. This could be for many reasons, but the one I was focused on in this instance was having a Control cluster (where I host all the apps the platform team owns and runs), then one or more Worker clusters (where dev teams deploy all their apps).
- ArgoCD is a GitOps deployment tool that lets you deploy to multiple k8s clusters from one cluster. In this instance, I need ArgoCD to have access to all the other EKS clusters that I will be running.
Side note: in this setup I have two VPCs connected via a transit gateway for routing, the EKS control nodes are in "Intra" subnets (so no outbound internet), and everything is in the same AWS account (not to say you can't do this cross-account).
IAM Roles and Policies
So let's start with the roles and policies. If you go by the docs that ArgoCD has currently, it will lead you down a garden path where you will want to hit your keyboard with your head. Let's not even go near what the AI tools give you at the moment!
So we start with what I like to call the "ArgoCD Management Role". It needs a trust policy with two main statements: one that lets EKS Pod Identity assume the role for the pods, and one that allows the role to assume itself! Yes, this is key, don't forget it. In the example below, I am using a condition on the self-assume statement; you don't need to do this, but it allows you to have multiple management roles for different clusters if you want.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ],
      "Condition": {
        "ArnEquals": {
          "aws:PrincipalArn": "arn:aws:iam::<SOME AWS ACCOUNT ID>:role/global-argocd-management-role"
        }
      }
    }
  ]
}
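If you're doing the IAM in Terraform as well (like the Pod Identity bit later on), a rough sketch of that role could look like the below. This isn't my exact code, just the trust policy above expressed as HCL; the role name matches the example ARN, and the self-referencing ARN is built from the account ID so the role doesn't have to reference itself before it exists.

# Minimal sketch of the management role with the two trust statements above
data "aws_caller_identity" "current" {}

resource "aws_iam_role" "argocd_management_role" {
  name = "global-argocd-management-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # EKS Pod Identity hands this role to the ArgoCD pods
        Effect    = "Allow"
        Principal = { Service = "pods.eks.amazonaws.com" }
        Action    = ["sts:AssumeRole", "sts:TagSession"]
      },
      {
        # The role is allowed to assume itself (ArgoCD assumes the roleARN in
        # the cluster secret, which is this same role)
        Effect    = "Allow"
        Principal = { AWS = "*" }
        Action    = ["sts:AssumeRole", "sts:TagSession"]
        Condition = {
          ArnEquals = {
            "aws:PrincipalArn" = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/global-argocd-management-role"
          }
        }
      }
    ]
  })
}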
Then the role needs EKS permissions. For now I used all permissions; I would hope it doesn't need as much, and I will update this when I figure out the least-privilege permission list.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "eks:*",
      "Resource": "*"
    }
  ]
}
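The Terraform version of that policy, attached inline to the same role from the sketch above, is roughly:

# eks:* for now; tighten this once the minimal permission set is known
resource "aws_iam_role_policy" "argocd_management_eks" {
  name = "argocd-management-eks"
  role = aws_iam_role.argocd_management_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = "eks:*"
        Resource = "*"
      }
    ]
  })
}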
EKS Access
This is also an important part, so make sure you do not miss it! The management role that you create needs an EKS "IAM Access Entry".
This needs to be done on all the clusters that ArgoCD isn't running on but needs to deploy to. The access entry needs to be the Standard type.
Then you need to give it Cluster Admin permissions (the AmazonEKSClusterAdminPolicy access policy). Now, in the UI, AWS has done a bit of a shit job, and you need to make sure you actually add the policy, not just select it 😦. It's quite easy in the console to select the policy and move on without adding it!
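If you'd rather not click around the console at all, a rough Terraform sketch of the access entry plus the cluster admin access policy association (repeated per worker cluster) looks something like this:

# Access entry for the management role on a worker cluster, plus the
# cluster admin access policy. Repeat for each worker cluster.
resource "aws_eks_access_entry" "argocd_management" {
  cluster_name  = "worker-eks-cluster"
  principal_arn = aws_iam_role.argocd_management_role.arn
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "argocd_cluster_admin" {
  cluster_name  = "worker-eks-cluster"
  principal_arn = aws_iam_role.argocd_management_role.arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}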
OK, so now that the ArgoCD management IAM role has permissions to the cluster, we need to give ArgoCD access to the role. In comes AWS EKS Pod Identity.
Pod Identity Association
On the control cluster that ArgoCD is running on, the ArgoCD server, the ArgoCD ApplicationSet controller, and the ArgoCD Application controller all need access to the ArgoCD Management Role we created.
Basically, you need to associate the role with each of those service accounts in your k8s cluster. The easiest way to show this is to give you a bit of the Terraform that does it for me:
locals {
  argocd_service_accounts = [
    {
      name      = "argocd-server"
      namespace = "argocd"
    },
    {
      name      = "argocd-application-controller"
      namespace = "argocd"
    },
    {
      name      = "argocd-applicationset-controller"
      namespace = "argocd"
    }
  ]
  cluster_name = "control-eks-cluster" # the cluster ArgoCD runs on
}

resource "aws_eks_pod_identity_association" "argocd_management" {
  count = length(local.argocd_service_accounts)

  cluster_name    = local.cluster_name
  namespace       = local.argocd_service_accounts[count.index].namespace
  service_account = local.argocd_service_accounts[count.index].name
  role_arn        = aws_iam_role.argocd_management_role.arn # the ARN of the ArgoCD management role you created
}
Once that's run in, I also find it's best to add the role ARN to the service accounts through your ArgoCD Helm chart. I have been told you don't need to do this (Pod Identity works off the association, not the annotation), but I think it's good to be able to see it through whatever management UI you use for k8s.
So in my Helm values for ArgoCD, I had the following:
## Application controller
controller:
  serviceAccount:
    create: true
    name: argocd-application-controller
    annotations:
      "eks.amazonaws.com/role-arn": "arn:aws:iam::123456789:role/global-argocd-management-role"

## Server
server:
  serviceAccount:
    create: true
    name: argocd-server
    annotations:
      "eks.amazonaws.com/role-arn": "arn:aws:iam::123456789:role/global-argocd-management-role"

## ApplicationSet controller
applicationSet:
  replicas: 3
  serviceAccount:
    create: true
    name: argocd-applicationset-controller
    annotations:
      "eks.amazonaws.com/role-arn": "arn:aws:iam::123456789:role/global-argocd-management-role"
One thing that is really important: at this stage, now that we have done the associations, restart all of the ArgoCD server, ArgoCD ApplicationSet controller, and ArgoCD Application controller pods! Pod Identity credentials are only injected when a pod starts, so the existing pods won't pick up the role until they are recreated.
ArgoCD Cluster Secret
Lastly, tell ArgoCD about your other cluster. You can do this via the CLI or declaratively as a k8s secret; the basic format of it should look like this:
apiVersion: v1
kind: Secret
metadata:
  name: worker-eks-argocd-secrets
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
  annotations: {}
data:
  config: <base64 encoded config string> # see below
  name: <name of cluster base64 encoded>
  server: <base64 encoded server string> # see below
type: Opaque
Don't forget the label!!!!
Config String to be base64 encoded:
{
  "awsAuthConfig": {
    "clusterName": "worker-eks-cluster",
    "roleARN": "arn:aws:iam::123456789:role/global-argocd-management-role"
  },
  "tlsClientConfig": {
    "caData": "<base64 encoded CA cert from your cluster>",
    "insecure": false
  }
}
Server String to be base64 encoded:
https://some_random_host_name.gr7.eu-west-1.eks.amazonaws.com
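If you're managing all of this with Terraform anyway, here's a rough sketch of building the same secret with the kubernetes provider, so you don't have to base64 anything by hand (the provider encodes the data values for you, and the aws_eks_cluster data source already returns the CA base64 encoded). It assumes the same role resource name as the Pod Identity snippet above; adjust names to suit.

# A minimal sketch; looks up the worker cluster to get its endpoint and CA
data "aws_eks_cluster" "worker" {
  name = "worker-eks-cluster"
}

resource "kubernetes_secret" "worker_cluster" {
  metadata {
    name      = "worker-eks-argocd-secrets"
    namespace = "argocd"
    labels = {
      "argocd.argoproj.io/secret-type" = "cluster"
    }
  }

  type = "Opaque"

  # The kubernetes provider base64-encodes these values for you
  data = {
    name   = "worker-eks-cluster"
    server = data.aws_eks_cluster.worker.endpoint
    config = jsonencode({
      awsAuthConfig = {
        clusterName = "worker-eks-cluster"
        roleARN     = aws_iam_role.argocd_management_role.arn
      }
      tlsClientConfig = {
        caData   = data.aws_eks_cluster.worker.certificate_authority[0].data
        insecure = false
      }
    })
  }
}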
Once this is applied, you should see the cluster appear in the ArgoCD UI.
There are two things I would warn people about here:
- Don't be worried if the connection status doesn't say Successful at this point; only worry if it says Failed.
- It will most likely say something along the lines of the cluster being monitored by no applications. This is fine; you need to deploy something to that cluster before it really connects and goes green.
How to deploy to your new Cluster
I won't go deep into this, but if you're deploying using Application Sets, then basically you just need to change your destination:
...
spec:
  project: "some project"
  source:
    repoURL: https://github.com/wonderphil/helm.git
    targetRevision: "{{ targetRevision }}"
    path: "src/helm_charts/{{ project }}/{{appName}}"
    helm:
      ignoreMissingValueFiles: true
      valueFiles:
        - "values.common.yaml"
        - "{{ deployment_environment }}/values.{{ app }}.{{ region_code }}.yaml"
  destination:
    name: worker-eks-cluster
    namespace: "{{namespace}}"
Just make sure the destination.name matches what's in the ArgoCD cluster secret we just deployed above, and it should be all good.
Other things to note
Don't forget to make sure network access is there between the ArgoCD control cluster and the worker cluster's API endpoint (there's a rough security group sketch after this list):
- routes through transit gateway
- routes at subnet level
- security group rules!
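For example, assuming the worker cluster uses its EKS-managed cluster security group and the control cluster lives in a 10.0.0.0/16 VPC (both assumptions for illustration, not my actual setup), a sketch of the worker-side rule would be:

# Rough sketch: allow the ArgoCD control cluster to reach the worker
# cluster's private API endpoint on 443
data "aws_eks_cluster" "worker" { # same data source as in the secret sketch above
  name = "worker-eks-cluster"
}

resource "aws_vpc_security_group_ingress_rule" "argocd_to_worker_api" {
  security_group_id = data.aws_eks_cluster.worker.vpc_config[0].cluster_security_group_id
  description       = "ArgoCD control cluster -> worker EKS API"
  cidr_ipv4         = "10.0.0.0/16" # control cluster VPC CIDR (assumption)
  from_port         = 443
  to_port           = 443
  ip_protocol       = "tcp"
}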