Setting up DNS and TLS in AKS

To host service or model endpoints, you need a domain that exposes them to the external world or to an internal network. This document explains how to set one up in your Azure AKS cluster. You can set up any number of domains for a cluster.

Setting up DNS

There are two kinds of domains that you can set up for TrueFoundry workloads:

  1. Wild card domains - *.example.com, *.tfy.example.com, *.ml.example.com
  2. Non wild card domains - tfy.example.com, dev.example.com, prod.example.com

Wild card domains (recommended)

With a wildcard domain, a wildcard subdomain is dedicated to resolving endpoints in the AKS cluster. For example, if example.com is your domain and *.tfy.example.com is the wildcard, services are exposed as

  • service1.tfy.example.com
  • service2.tfy.example.com

Non wild card domains

With a non-wildcard domain, a single dedicated domain resolves all endpoints and each service is reached under its own path. Service endpoints look like

  • tfy.example.com/service1
  • tfy.example.com/service2

Load balancer IP address

Once you have decided on a domain name, map a DNS record to the IP address of the load balancer in the AKS cluster. To get the load balancer's IP address, run the following command:

kubectl get svc -n istio-system tfy-istio-ingress -ojsonpath='{.status.loadBalancer.ingress[0].ip}'

Create a DNS record in your cloud DNS or your DNS provider with the following details

Record Type   Record Name         Record value
A             *.tfy.example.com   LOADBALANCER_IP_ADDRESS
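
After creating the record, it can help to confirm that a name under the wildcard resolves to the load balancer. The sketch below assumes dig is installed; the helper name resolves_to and the hostname test.tfy.example.com are illustrative and not part of any TrueFoundry tooling. DNS changes can take a while to propagate, so a failed check right after creating the record is not necessarily an error.

```shell
#!/usr/bin/env bash
# Succeeds if the expected IP (first argument) appears as a whole line
# among the addresses read from stdin.
resolves_to() {
  local expected="$1"
  grep -Fxq -- "$expected"
}

# Example (not executed here; hostname is a placeholder under your wildcard):
# LB_IP=$(kubectl get svc -n istio-system tfy-istio-ingress \
#   -ojsonpath='{.status.loadBalancer.ingress[0].ip}')
# dig +short test.tfy.example.com | resolves_to "$LB_IP" \
#   && echo "DNS record points at the load balancer"
```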

Setting up TLS

We support creating TLS certificates with cert-manager. In this setup cert-manager uses Let's Encrypt as the certificate issuer. cert-manager can connect to various DNS providers so that Let's Encrypt can verify that you own the domain. The example below creates a TLS certificate using Azure DNS. To use another DNS provider, follow the cert-manager documentation to create the certificates.

1. Exporting variables and enabling workload identity

export CLUSTER_NAME=""
export RESOURCE_GROUP=""
export AZURE_SUBSCRIPTION_ID=""
export SERVICE_ACCOUNT_NAME=cert-manager
export SERVICE_ACCOUNT_NAMESPACE=cert-manager
export MAIL_ID=""  # email address for the Let's Encrypt ACME account
export OIDC_ISSUER_URL=$(az aks show \
--resource-group $RESOURCE_GROUP \
--name $CLUSTER_NAME \
--query "oidcIssuerProfile.issuerUrl" -o tsv)
export LOAD_BALANCERIP=$(kubectl get svc \
-n istio-system tfy-istio-ingress \
-ojsonpath='{.status.loadBalancer.ingress[0].ip}')

# identity name 
export IDENTITY_NAME="$CLUSTER_NAME"

# creating the identity and getting its principal ID
PRINCIPAL_ID=$(az identity create \
--name "${IDENTITY_NAME}" \
--resource-group "${RESOURCE_GROUP}" \
--query principalId -otsv)

# getting Client ID of the identity
IDENTITY_CLIENT_ID=$(az identity show \
--name "${IDENTITY_NAME}" \
--resource-group "${RESOURCE_GROUP}" \
--query 'clientId' -otsv)

echo "PRINCIPAL_ID: ${PRINCIPAL_ID}"
echo "IDENTITY_CLIENT_ID: ${IDENTITY_CLIENT_ID}"

echo "OIDC_ISSUER_URL: ${OIDC_ISSUER_URL}"
echo "LOAD_BALANCERIP: ${LOAD_BALANCERIP}"
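
Several later commands fail in confusing ways if one of these variables is empty, so a quick check before continuing can save time. The following is a minimal sketch; require_vars is a hypothetical helper name, and it relies on bash indirect expansion, so it assumes bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Report every listed variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (not executed here):
# require_vars CLUSTER_NAME RESOURCE_GROUP AZURE_SUBSCRIPTION_ID MAIL_ID \
#   OIDC_ISSUER_URL LOAD_BALANCERIP PRINCIPAL_ID IDENTITY_CLIENT_ID \
#   || echo "Set the missing variables before continuing"
```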

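Workload identity federation needs to be enabled for the cluster. This step is not required if the cluster was created using ocli. If you do run it, re-run the OIDC_ISSUER_URL export from the previous block afterwards, because the issuer URL may be empty until the OIDC issuer is enabled.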

az aks update \
    --name ${CLUSTER_NAME} \
    --resource-group ${RESOURCE_GROUP} \
    --enable-oidc-issuer \
    --enable-workload-identity

2. Create a hosted zone and assign DNS permissions to the identity

To make Azure your DNS provider, create a hosted zone in Azure DNS. This step is optional: you can also use a domain from an existing DNS zone in Azure.

# export DNS_HOSTED_ZONE="ml.example.com"
export DNS_HOSTED_ZONE=""

# create the DNS zone
az network dns zone create \
--name ${DNS_HOSTED_ZONE} \
--resource-group ${RESOURCE_GROUP} \
--query nameServers

The above command prints the nameservers, which have to be set up as NS records in your main DNS provider (e.g. GoDaddy or Namecheap).
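
Once the NS records are in place, the delegation can be checked by comparing what public DNS returns with what Azure reports for the zone. This is a sketch under assumptions: normalize_ns is a hypothetical helper, dig must be installed, and the comparison only makes sense after the change has propagated.

```shell
#!/usr/bin/env bash
# Normalize a list of nameservers read from stdin: one per line,
# lowercase, no trailing dot, sorted and de-duplicated.
normalize_ns() {
  tr '\t ' '\n\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/\.$//' -e '/^$/d' \
    | sort -u
}

# Example (not executed here):
# azure_ns=$(az network dns zone show \
#   --name "${DNS_HOSTED_ZONE}" --resource-group "${RESOURCE_GROUP}" \
#   --query nameServers -o tsv | normalize_ns)
# public_ns=$(dig +short NS "${DNS_HOSTED_ZONE}" | normalize_ns)
# [ "$azure_ns" = "$public_ns" ] && echo "NS delegation matches"
```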

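If you already have a hosted zone, skip the creation command above and set DNS_HOSTED_ZONE to its name. In both cases, fetch the zone ID as shown below. If the DNS zone lives in a different resource group than the cluster, set DNS_ZONE_RESOURCE_GROUP to that resource group.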

# resource group where your DNS zone is hosted;
# change this to your resource group if it's not the same as your cluster's
DNS_ZONE_RESOURCE_GROUP="${RESOURCE_GROUP}"

# getting the ID of the DNS zone
DNS_ZONE_ID=$(az network dns zone show \
--name ${DNS_HOSTED_ZONE} \
--resource-group ${DNS_ZONE_RESOURCE_GROUP} \
--query id -otsv)

Grant the identity access to the DNS zone and link it to the cert-manager service account

# assigning permissions of the DNS zone to the identity 
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "DNS Zone Contributor" \
--scope $DNS_ZONE_ID

# create a federated credential so the cert-manager service account can use the identity
az identity federated-credential create \
      --name "cert-manager" \
      --identity-name "${IDENTITY_NAME}" \
      --issuer "${OIDC_ISSUER_URL}" \
      --resource-group "${RESOURCE_GROUP}" \
      --subject "system:serviceaccount:${SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_NAME}"
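
The subject of the federated credential has to match the cert-manager service account exactly, otherwise token exchange for the identity will fail. The sketch below builds the expected subject and checks it against the credentials registered on the identity; sa_subject and has_subject are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Build the federated-credential subject for a Kubernetes service account.
sa_subject() {
  local namespace="$1" name="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$name"
}

# Succeeds if the expected subject (first argument) appears as a whole line
# among the subjects read from stdin.
has_subject() {
  grep -Fxq -- "$1"
}

# Example (not executed here):
# az identity federated-credential list \
#   --identity-name "${IDENTITY_NAME}" --resource-group "${RESOURCE_GROUP}" \
#   --query "[].subject" -o tsv \
#   | has_subject "$(sa_subject "${SERVICE_ACCOUNT_NAMESPACE}" "${SERVICE_ACCOUNT_NAME}")"
```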

3. Install cert-manager

  1. Head over to the Integrations tab in the left panel and select the cluster in which you want to install cert-manager. Click on the three dots in the left panel and select Manage Applications.
    1. If cert-manager is not already installed, install it by creating a workspace.
  2. If cert-manager is already installed, go to Deployments -> Helm, filter the charts by cluster name, and click Edit from the three dots on the right.
  3. In the values section, ensure the lines below exist:
    installCRDs: true
    extraArgs:
      - --issuer-ambient-credentials
    podLabels:
      azure.workload.identity/use: "true"
    serviceAccount:
      labels:
        azure.workload.identity/use: "true"
    

4. Creating an issuer and a certificate

  1. Download the kubeconfig file for your cluster
    az aks get-credentials --name "${CLUSTER_NAME}" --resource-group "${RESOURCE_GROUP}"
    
  2. Create an issuer and make sure to change the name and the privateKeySecretRef name
    kubectl apply -f - <<EOF
    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      name: example-issuer
      namespace: istio-system
    spec:
      acme:
        email: $MAIL_ID
        server: https://acme-v02.api.letsencrypt.org/directory
        privateKeySecretRef:
          name: example-privkey
        solvers:
        - dns01:
            azureDNS:
              hostedZoneName: $DNS_HOSTED_ZONE
              resourceGroupName: $DNS_ZONE_RESOURCE_GROUP
              subscriptionID: $AZURE_SUBSCRIPTION_ID
              environment: AzurePublicCloud
              managedIdentity:
                clientID: $IDENTITY_CLIENT_ID
    EOF
    
  3. Create the certificate, referencing the issuer created above in issuerRef (example-issuer here, or the name you chose). Also set dnsNames to your own domains.
    kubectl apply -f - <<EOF
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: example-cert
      namespace: istio-system
    spec:
      secretName: example-tls
      duration: 2160h # 90d
      renewBefore: 360h # 15d
      issuerRef:
        # Issuer name from the previous step
        name: example-issuer
      dnsNames:
      - "example.truefoundry.com"
      - "*.example.truefoundry.com"
    EOF
    
  4. Check the status of the certificate by running the following command. Wait for the certificate to reach the Ready state.
    kubectl get certificates -n istio-system

🚧

Certificate is not ready

If the certificate is not in the Ready state after more than 10 minutes, there is likely an issue with the access permissions or the domain name. Check the logs of the cert-manager pods in the cert-manager namespace for more details.
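
Instead of re-running the status command by hand, the wait can be scripted. The following is a minimal sketch, not part of the TrueFoundry tooling: wait_until_true and cert_ready_status are hypothetical helpers, and example-cert is the certificate name used in the example above. It polls the Ready condition and, if the certificate never becomes Ready, prints the cert-manager resources that usually explain why.

```shell
#!/usr/bin/env bash
# Run a command repeatedly until it prints "True".
# Usage: wait_until_true <attempts> <sleep_seconds> <command...>
wait_until_true() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if [ "$("$@" 2>/dev/null)" = "True" ]; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if (( i < attempts )); then sleep "$delay"; fi
  done
  return 1
}

# Print the status of the Ready condition of a Certificate.
# Usage: cert_ready_status <certificate-name> <namespace>
cert_ready_status() {
  kubectl get certificate "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Example (not executed here): poll every 30s for roughly 10 minutes,
# then dump diagnostics if the certificate is still not Ready.
# wait_until_true 20 30 cert_ready_status example-cert istio-system || {
#   kubectl describe certificate example-cert -n istio-system
#   kubectl get certificaterequests,orders,challenges -n istio-system
# }
```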

5. Attaching the TLS secret to the Load balancer

  1. Once the certificate is issued, a secret is created in the istio-system namespace with the name given in .spec.secretName of the Certificate object.
  2. Head over to Deployments -> Helm, filter the Helm charts for your cluster, and edit tfy-istio-ingress. Add the secret under .tfyGateway.spec.servers: for the HTTPS server, set tls.mode and tls.credentialName. Make sure port.protocol is HTTPS for port 443.
    tfyGateway:
      name: tfy-wildcard
      spec:
        selector:
          istio: tfy-istio-ingress
        servers:
          - hosts:
              - "*.example.com"
              - "example.com"
            port:
              name: http-tfy-wildcard
              number: 80
              protocol: HTTP
            tls:
              httpsRedirect: true
          - hosts:
              - "*.example.com"
              - "example.com"
            port:
              name: https-tfy-wildcard
              number: 443
              protocol: HTTPS # make sure to keep it HTTPS
            tls:
              mode: SIMPLE
              credentialName: example-tls # must match .spec.secretName of the certificate
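
Every host listed on the HTTPS server should be covered by one of the dnsNames in the certificate. A wildcard name in a TLS certificate matches exactly one label, so *.example.com covers service1.example.com but not example.com itself, which is why both entries are listed. The sketch below illustrates that rule; cert_covers is a hypothetical helper and it compares names case-sensitively for simplicity.

```shell
#!/usr/bin/env bash
# Succeeds if a certificate DNS name (exact, or a single-label wildcard
# such as *.example.com) covers the given host.
# Usage: cert_covers <certificate-dns-name> <host>
cert_covers() {
  local pattern="$1" host="$2"
  case "$pattern" in
    '*.'*)
      local suffix="${pattern#\*}"      # e.g. ".example.com"
      local label="${host%"$suffix"}"   # what is left of the suffix, if it matched
      # The host must end with the suffix, and the remaining label must be
      # non-empty and contain no further dots.
      [ "$host" != "$label" ] && [ -n "$label" ] && [ "${label#*.}" = "$label" ]
      ;;
    *)
      [ "$pattern" = "$host" ]
      ;;
  esac
}

# Example:
# cert_covers '*.example.com' 'service1.example.com' && echo "covered"
# cert_covers '*.example.com' 'example.com' || echo "bare domain needs its own entry"
```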
    

6. Adding the domain in the cluster metadata

  1. Head over to the Integrations section in the platform and click Edit on the cluster.
  2. Enable Show advanced fields at the bottom and enable the Base Domain URL section.
  3. Add the domain URL.