A Deep Dive into Architecting a Kubernetes Infrastructure

So far in this series, we have explored the questions one might have when starting off with Kubernetes and its ecosystem, and done our best to answer them. With those doubts cleared up, let us dive into the next important step in our journey with Kubernetes and the infrastructure as a whole.

In this post, we will look at how best to architect your infrastructure for your use case, and the various decisions you may have to make depending on your constraints.

The Architecture

Your architecture revolves largely around your use case, so you have to be very careful to get it right, consulting experts if needed. While it is important to get it right before you start, mistakes can happen, and with so much research happening these days, a breakthrough can arrive any day and make your old way of thinking obsolete.

That is why I would highly recommend that you architect for change and keep your architecture as modular as possible, so that you have the flexibility to make incremental changes in the future if needed.

Let’s see how we would realize our goal of architecting our system with a client-server model in mind.

The Entry Point: DNS

In any typical infrastructure (cloud native or not), a request must first be resolved by a DNS server, which returns the IP address of the server. How you set up your DNS should be based on the availability you require: for higher availability, you may want to distribute your servers across multiple regions or cloud providers, depending on the level of availability you would like to achieve.

Content Delivery Network (CDN)

In some cases, you might need to serve users with as little latency as possible while also reducing the load on your servers. This is where a Content Delivery Network (CDN) plays a major role.

Does the client frequently request a set of static assets from the server? Are you aiming to improve the speed of delivery of content to your users while also reducing the load on your servers? In such cases, a CDN serving a set of static assets at the edge might actually help to reduce both the latency for users and the load on your servers.

Is all your content dynamic? Are you fine with serving content to users with some level of latency in favor of reduced complexity? Or is your app receiving low traffic? In such cases, a CDN might not make much sense, and you can send all the traffic directly to the global load balancer. But do note that a CDN also has the advantage of distributing traffic, which can be helpful in the event of DDoS attacks on your servers.

CDN providers include Cloudflare, Fastly, Akamai and StackPath, and there is a high chance your cloud provider also offers a CDN service: Cloud CDN from Google Cloud Platform, CloudFront from Amazon Web Services, Azure CDN from Microsoft Azure, and the list goes on.

Edge Network

Load Balancers

If a request cannot be served by your CDN, it next hits your load balancer. Load balancers can be either regional, with regional IPs, or global, with anycast IPs, and in some cases you can also use load balancers to manage internal traffic.

Apart from routing and proxying traffic to the appropriate backend service, the load balancer can also take care of responsibilities like SSL termination, integration with the CDN, and even managing some aspects of network traffic.

While hardware load balancers do exist, software load balancers provide greater flexibility, cost reduction and scalability.

Similar to CDNs, your cloud provider should be able to offer a load balancer as well (such as Cloud Load Balancing for GCP, ELB for AWS, Azure Load Balancer for Azure, etc.), but what is more interesting is that you can provision these load balancers directly from Kubernetes constructs. For instance, creating an Ingress in GKE (aka GKE Ingress) also creates a GLB for you behind the scenes to receive the traffic, and other features like CDN, SSL redirects, etc. can be set up just by configuring your Ingress, as seen here.
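As a sketch of what that looks like in practice, the following hypothetical manifests create a GKE Ingress with Cloud CDN enabled on its backend via a BackendConfig (the names, ports and app labels here are assumptions for illustration):

```yaml
# BackendConfig enabling Cloud CDN for a backend service (GKE-specific CRD).
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: web-backendconfig
spec:
  cdn:
    enabled: true
---
# Service referencing the BackendConfig through an annotation.
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    cloud.google.com/backend-config: '{"default": "web-backendconfig"}'
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
---
# The Ingress itself; on GKE this provisions a global load balancer.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  defaultBackend:
    service:
      name: web
      port:
        number: 80
```

Applying these with `kubectl apply -f` is all it takes; the GLB, forwarding rules and CDN configuration are reconciled for you behind the scenes.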


While you should always start small, load balancers allow you to scale incrementally into architectures like this:

Multi Region Multi Cluster Setup

Networking and Security Architecture

The next important thing to take care of in your architecture is the networking itself. You may want to go for a private cluster to increase security, where you can moderate inbound and outbound traffic, mask IP addresses behind NATs, isolate networks with multiple subnets across multiple VPCs, and so on.

How you set up your network would typically depend on the degree of flexibility you are looking for and how you are going to achieve it. Setting up the right networking is all about reducing the attack surface as much as possible while still allowing for regular operations.

Protecting your infrastructure by setting up the right network also involves setting up firewalls with the right rules and restrictions, so that only permitted traffic flows to and from the respective backend services, both inbound and outbound.
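Inside the cluster, a Kubernetes NetworkPolicy can express the same kind of allow-list for a backend service; a minimal sketch (the namespace, labels and ports here are hypothetical):

```yaml
# Allow only frontend pods to reach the backend on port 8080, and restrict
# the backend's egress to the database on port 5432. Everything else matching
# the selector is denied once the policy is in place.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
```

Note that NetworkPolicies are only enforced if your CNI plugin supports them (Calico, Cilium, and GKE's network policy enforcement do, for example).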

In many cases, these private clusters can be protected by setting up bastion hosts and tunneling through them for all operations in the cluster, since all you have to expose to the public network is the bastion (aka jump host), typically set up in the same network as the cluster.

Some cloud providers also offer custom solutions in their approach to zero trust security. For instance, GCP provides Identity-Aware Proxy (IAP), which can be used in place of typical VPN implementations.

Once all of this is taken care of, the next step is setting up networking within the cluster itself, depending on your use case.

This can involve a variety of tasks depending on your setup.

If you would like to look at some sample implementations, I would recommend looking at this repository which helps users set up all these different networking models in GCP including hub and spoke via peering, hub and spoke via VPN, DNS and Google Private Access for on-premises, Shared VPC with GKE support, ILB as next hop and so on, all using Terraform.

The interesting thing about networking in the cloud is that it need not be limited to one cloud provider in one region, but can span multiple providers and regions as needed. This is where projects like Kubefed or Crossplane can help.

If you would like to explore some of the best practices for setting up VPCs, subnets and the networking as a whole, I would recommend going through this page; the same concepts apply to whichever cloud provider you use.

Kubernetes

If you are using managed clusters like GKE, EKS or AKS, the Kubernetes control plane is managed for you, lifting a lot of complexity away from you as a user.

If you are managing Kubernetes yourself, you need to take care of many things: backing up and encrypting the etcd store, setting up networking among the various nodes in the cluster, patching your nodes periodically with the latest OS versions, and managing cluster upgrades to align with upstream Kubernetes releases. This is only recommended if you can afford a dedicated team that does just this.
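To make one of those tasks concrete, etcd backups could be automated with a CronJob that takes periodic snapshots. This is a rough sketch only; the image, the certificate paths and the hostPath destination are assumptions that depend on how your control plane is set up:

```yaml
# Snapshot etcd every six hours onto the node's filesystem.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: etcd-backup
              image: bitnami/etcd:3.5   # any image that ships etcdctl
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl snapshot save
                  /backup/etcd-$(date +%Y%m%d-%H%M).db
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
              volumeMounts:
                - name: backup
                  mountPath: /backup
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
          volumes:
            - name: backup
              hostPath:
                path: /var/backups/etcd
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
```

In practice you would also ship the snapshot off the node (e.g. to object storage) and encrypt it, since etcd holds every secret in the cluster.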

Site Reliability Engineering (SRE)

When you maintain a complex infrastructure, it is very important to have the right observability stack in place, so that you can find errors even before your users notice them, predict possible changes, identify anomalies, and drill down to exactly where an issue is.

Now, this requires agents that expose metrics specific to each tool or application so they can be collected for analysis (following either a push or a pull mechanism). And if you are using a service mesh with sidecars, you often get metrics without doing any custom instrumentation yourself.

In any such scenario, a tool like Prometheus can act as the time series database collecting all the metrics for you, along with something like OpenTelemetry to expose metrics from the application and the various tools using built-in exporters. A tool like Alertmanager can send notifications and alerts to multiple channels, while Grafana provides the dashboards to visualize everything in one place, giving users complete visibility of the infrastructure as a whole.
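As a minimal sketch of how those pieces wire together, here is a Prometheus configuration that scrapes annotated pods, forwards alerts to Alertmanager, and defines one simple rule (the `prometheus.io/scrape` annotation convention and the `alertmanager:9093` address are assumptions for illustration):

```yaml
# prometheus.yml — scrape config plus Alertmanager wiring.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only keep pods that opt in via the scrape annotation.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]
rule_files:
  - alerts.yml
---
# alerts.yml — page when any scraped target has been down for five minutes.
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} has been down for more than 5 minutes"
```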

In summary, this is what the observability stack involving Prometheus would look like:

Prometheus Architecture

(Source: https://prometheus.io/docs/introduction/overview/)

Having complex systems like these also requires log aggregation, so that all the logs can be streamed to a single place for easier debugging. This is where people tend to use the ELK or EFK stack, with Logstash or Fluentd doing the log aggregation and filtering based on your constraints. But there are new players in this space too, like Loki and Promtail.
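For instance, a minimal Promtail configuration that discovers pod logs and ships them to Loki might look like the following sketch (the `loki:3100` address and the label choices are illustrative; the `__path__` relabeling follows Promtail's documented Kubernetes pattern):

```yaml
# promtail.yml — tail pod logs and push them to Loki.
server:
  http_listen_port: 9080
clients:
  - url: http://loki:3100/loki/api/v1/push
positions:
  filename: /tmp/positions.yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Attach useful labels to each log stream.
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
      # Map each discovered pod to its log files on the node.
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        target_label: __path__
        replacement: /var/log/pods/*$1/*.log
```

Promtail runs as a DaemonSet so every node tails its own pods' logs, which is what keeps this design simple compared to a central collector.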


This is how log aggregation systems like Fluentd simplify our architecture:

Log Aggregation

(Source: https://www.fluentd.org/architecture)

But what about tracing a request that spans multiple microservices and tools? This is where distributed tracing becomes very important, especially considering the complexity that comes with microservices. Tools like Zipkin and Jaeger have been pioneers in this area, with Tempo a recent entrant to the space.

While log aggregation gives you information from various sources, it does not necessarily give you the context of a request, and this is where tracing really helps. But do remember: adding tracing to your stack adds overhead to your requests, since trace contexts have to be propagated between services along with the requests.

This is what a typical distributed tracing architecture looks like:

Jaeger Architecture

(Source: https://www.jaegertracing.io/docs/1.21/architecture/)
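As a sketch of how the pieces connect, an OpenTelemetry Collector can receive spans over OTLP and forward them to Jaeger. The `jaeger` exporter and the `jaeger-collector` endpoint below are assumptions that depend on your collector version and deployment:

```yaml
# otel-collector-config.yml — receive OTLP spans, batch them, export to Jaeger.
receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch:
exporters:
  jaeger:
    endpoint: jaeger-collector:14250
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
```

Applications then only need an OpenTelemetry SDK pointed at the collector; the backend can be swapped (Jaeger, Zipkin, Tempo) without touching application code.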

But site reliability does not end with monitoring, visualization and alerting. You have to be ready to handle failures in any part of the system, with regular backups and failovers in place, so that data loss is either prevented entirely or minimized. This is where tools like Velero play a major role.

Velero helps you maintain periodic backups of various components in your cluster, including your workloads, storage and more, by leveraging the same Kubernetes constructs you already use. This is what Velero’s architecture looks like:

Velero Architecture

(Source: https://velero.io/docs/v1.5/how-velero-works/)

As you can see, a backup controller periodically backs up the objects and pushes them to a specific destination, at the frequency set by your schedule. Since almost all objects are backed up, this can also be used for failovers and migrations.
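For example, such a periodic backup can be set up declaratively with a Velero Schedule resource (the name, timing and retention below are illustrative):

```yaml
# Back up all namespaces every day at 02:00 and keep snapshots for 30 days.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # standard cron syntax
  template:
    includedNamespaces:
      - "*"
    snapshotVolumes: true      # also snapshot persistent volumes
    ttl: 720h0m0s              # retention: 30 days
```

Restores are then a matter of `velero restore create --from-backup <backup-name>`, which is what makes this useful for migrations as well as disaster recovery.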

Storage

There are a lot of different storage provisioners and filesystems available, which vary a lot between cloud providers. This calls for a standard like the Container Storage Interface (CSI), which helps move most of the volume plugins out of tree, making them easy to maintain and evolve without the Kubernetes core becoming a bottleneck.

This is what the CSI architecture typically looks like supporting various volume plugins:

Kubernetes Storage Management

(Source: https://kubernetes.io/blog/2018/08/02/dynamically-expand-volume-with-csi-and-kubernetes/)
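From a user's point of view, a CSI driver is consumed through a StorageClass, with volumes claimed against it. A sketch using the GCE persistent disk CSI driver (the class name and sizes are illustrative):

```yaml
# StorageClass backed by a CSI driver; the provisioner field names the driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io   # GCE persistent disk CSI driver
parameters:
  type: pd-ssd
allowVolumeExpansion: true
---
# A claim against that class; the CSI driver provisions the disk dynamically.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi
```

Because the interface is standard, swapping cloud providers largely means swapping the `provisioner` and its parameters, not rewriting your workloads.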

But what about clustering, scaling and the various other problems that come with distributed storage?

This is where filesystems like Ceph have already proved themselves. But Ceph was not built with Kubernetes in mind and is very hard to deploy and manage, which is where a project like Rook can help.

While Rook is not coupled to Ceph, and supports other filesystems like EdgeFS, NFS, etc. as well, Rook with the Ceph CSI driver is a match made in heaven. This is what the architecture of Rook with Ceph looks like:

Rook Ceph Architecture

(Source: https://rook.io/docs/rook/v1.5/ceph-storage.html)

As you can see, Rook takes up the responsibility of installing, configuring and managing Ceph in the Kubernetes cluster. The storage is distributed underneath automatically as per the user preferences. All this happens without the app being exposed to any complexity.
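To give an idea of how little the user has to specify, here is a minimal CephCluster resource for Rook. This is a sketch based on Rook v1.5-era fields; the Ceph image version and the "use all nodes and devices" storage selection are assumptions you would tune for production:

```yaml
# Rook reconciles this into a full Ceph cluster inside Kubernetes.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v15.2.8
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                   # three monitors for quorum
    allowMultiplePerNode: false
  storage:
    useAllNodes: true          # let Rook use every node...
    useAllDevices: true        # ...and every raw device it finds
```

From there, a CephBlockPool plus a StorageClass pointing at the Ceph CSI driver is all applications need to start claiming volumes.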

Image Registry

A registry provides you a user interface where you can manage various user accounts, push/pull images, manage quotas, get notified on events with webhooks, do vulnerability scanning, sign the pushed images, and also handle operations like mirroring or replication of images across multiple image registries.

If you are using a cloud provider, there is a high chance they already provide an image registry as a service (e.g. GCR, ECR, ACR, etc.), which removes a lot of the complexity. If your cloud provider does not provide one, you can also go for third-party registries like Docker Hub, Quay, etc.

But what if you want to host your own registry?

This may be needed if you either want to deploy your registry on-premises, want to have more control over the registry itself, or want to reduce costs associated with operations like vulnerability scanning.

If this is the case, then going for a private image registry like Harbor might actually help. This is what the architecture of Harbor looks like:

Harbor Architecture

(Source: https://goharbor.io/docs/1.10/install-config/harbor-ha-helm/)

Harbor is an OCI-compliant registry made up of various open source components, including Docker Registry V2, the Harbor UI, Clair and Notary.

CI/CD Architecture

Kubernetes acts as a great platform for hosting all your workloads at any scale, but this also calls for a standard way of deploying the applications with a streamlined continuous integration/continuous delivery (CI/CD) workflow. This is where setting up a pipeline like this can really help.

CI/CD Architecture

Some third-party services like Travis CI, CircleCI, GitLab CI or GitHub Actions include their own CI runners. You just define the steps in the pipeline you want to build. This typically involves building the image, scanning it for possible vulnerabilities, running the tests, pushing it to the registry and, in some cases, provisioning a preview environment for approvals.
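The steps above can be sketched as a GitLab CI pipeline. The stage layout, the Trivy scan and the `npm test` command are illustrative assumptions; only the `$CI_REGISTRY_*` variables are standard GitLab-provided ones:

```yaml
# .gitlab-ci.yml — build, scan, test, then push the image to the registry.
stages: [build, scan, test, push]

variables:
  IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

build:
  stage: build
  image: docker:24
  services: [docker:24-dind]
  script:
    - docker build -t "$IMAGE" .
    - docker save "$IMAGE" -o image.tar   # pass the image between jobs
  artifacts:
    paths: [image.tar]

scan:
  stage: scan
  image: aquasec/trivy:latest
  script:
    # Fail the pipeline on critical vulnerabilities.
    - trivy image --input image.tar --exit-code 1 --severity CRITICAL

test:
  stage: test
  image: docker:24
  services: [docker:24-dind]
  script:
    - docker load -i image.tar
    - docker run --rm "$IMAGE" npm test   # placeholder test command

push:
  stage: push
  image: docker:24
  services: [docker:24-dind]
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker load -i image.tar
    - docker push "$IMAGE"
```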


Now, while the steps would typically remain the same, if you are managing your own CI runners you need to configure them to run either within or outside your clusters, with appropriate permissions to push the assets to the registry.

Conclusion

We have gone over the architecture of a Kubernetes-based cloud native infrastructure. As we have seen, various tools address different problems in the infrastructure. Like Lego blocks, each focuses on a specific problem at hand, abstracting away a lot of complexity for you.

This allows users to leverage Kubernetes in an incremental fashion rather than getting on board all at once, using just the tools you need from the entire stack depending on your use case.

If you have any questions or are looking for help or consultancy, feel free to reach out to me @techahoy or via LinkedIn.

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Docker.

Source: InApps.net
