The release of Confluent for Kubernetes brings Confluent APIs into the Kubernetes API for data streaming with Apache Kafka, enabling customers to manage all Confluent Platform components across multicloud and on-premises environments. In doing so, the platform fills a much-needed gap in data access between on-premises and cloud infrastructure for data streaming in Kubernetes environments.

Confluent Platform 6.0 was created to solve several issues operations teams face when managing Kafka clusters, including having to manually link clusters together. Remediating single-cluster failures, as well as linking clusters together, has been among the more time-consuming and resource-draining tasks for Kafka operations teams. The new Cluster Linking feature addresses this by automating the pooling of different clusters into a “global mesh of Kafka,” Confluent said.

The wide-scale adoption of the Apache Kafka data-streaming platform is reflected in the skyrocketing need for a data-management platform that can underpin data operations and address needs across a number of sources, often at a global scale. According to a recent Gartner report, “Understanding Cloud Data Management Architectures: Hybrid Cloud, Multicloud and Intercloud,” almost half of all organizations with data-management operations manage data across both on-premises and cloud environments, and Gartner says more than 80% of those organizations rely on multicloud environments.

Before the release of Confluent for Kubernetes, Rohit Bakhshi, a product manager at Confluent, described how customers might use Confluent Operator alone, which was created to manage Apache Kafka and Confluent Platform on Kubernetes.

“With Confluent for Kubernetes we drew on our experience managing thousands of Kafka clusters in Confluent Cloud to create a cloud-native experience for managing data streams, even for customers on private infrastructures,” Bakhshi said.

In this way, Confluent for Kubernetes extends the Kubernetes API, through CustomResourceDefinitions, to offer resources for managing all Confluent Platform components. It also introduces CustomResourceDefinitions for managing topics and Role-Based Access Control (RBAC) policies. This enables DevOps teams to rely on CI/CD and GitOps for cloud-native deployments, managing their complete streaming infrastructure and streaming applications as code, Bakhshi said.
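For illustration, a topic and an RBAC policy expressed as Kubernetes resources might look like the following minimal sketch. The apiVersion, kinds and fields are modeled on Confluent's published CRD examples, but exact schemas can vary by release, and the topic and principal names here are hypothetical:

```yaml
# A Kafka topic managed declaratively through the Kubernetes API
apiVersion: platform.confluent.io/v1beta1
kind: KafkaTopic
metadata:
  name: payments
  namespace: confluent
spec:
  partitionCount: 6
  replicas: 3
  configs:
    retention.ms: "604800000"   # retain records for seven days
---
# An RBAC policy granting read access to that topic
apiVersion: platform.confluent.io/v1beta1
kind: ConfluentRolebinding
metadata:
  name: payments-read
  namespace: confluent
spec:
  principal:
    type: user
    name: analytics-app         # hypothetical application principal
  role: DeveloperRead
  resourcePatterns:
    - resourceType: Topic
      name: payments
      patternType: LITERAL
```

Because these resources can live in Git and be applied by a CI/CD or GitOps controller, topic and access-control changes flow through the same review pipeline as application code.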

“Confluent for Kubernetes packages best practices to automate secure-by-default, production-ready Confluent deployments. Now, customers can get a completely secure deployment — with strong authentication, RBAC authorization and complete network encryption — deployed with one deployment spec,” Bakhshi said. “This was a multistep manual process before.”
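A single spec along those lines might look like this sketch; the field names are modeled on Confluent's published examples, and the image tags and auto-generated-certificates option are assumptions for brevity:

```yaml
# One declarative spec for a secured, production-style Kafka deployment
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  namespace: confluent
spec:
  replicas: 3
  dataVolumeCapacity: 100Gi
  image:
    application: confluentinc/cp-server:6.1.0           # assumed image tag
    init: confluentinc/confluent-init-container:2.0.0   # assumed image tag
  tls:
    autoGeneratedCerts: true   # complete network encryption, certificates generated for you
  authorization:
    type: rbac                 # RBAC authorization enabled in the same spec
```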

Other capabilities Confluent for Kubernetes offers that Bakhshi described include:

A CustomResourceDefinitions-based API for “security, reliability and DevOps automation,” Bakhshi said. DevOps teams can now utilize Kubernetes taints and tolerations to schedule Confluent alongside other workloads on their shared Kubernetes private cloud infrastructure. Security options include Kubernetes Secrets and HashiCorp Vault for secrets management for Confluent deployments.
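Scheduling brokers onto nodes reserved for streaming workloads, for instance, could use the standard Kubernetes tolerations pattern inside the component spec; in this sketch the taint key and value are hypothetical:

```yaml
# Schedule Kafka brokers onto nodes tainted for streaming workloads
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  namespace: confluent
spec:
  replicas: 3
  dataVolumeCapacity: 100Gi
  podTemplate:
    tolerations:
      - key: dedicated        # hypothetical taint applied by the cluster admin
        operator: Equal
        value: streaming
        effect: NoSchedule
```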

A kubectl CLI plugin for managing and troubleshooting Confluent for Kubernetes. With this CLI plugin, DevOps teams can view the state of their deployment, “quickly get troubleshooting information” and launch the GUI Control Center interface, Bakhshi said.
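Day-to-day use might look like the following; the subcommand names are illustrative and may differ by plugin version:

```sh
# Check the state of Confluent components in the current namespace
kubectl confluent status

# Open the Control Center GUI for the deployment
kubectl confluent dashboard controlcenter
```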

To act as an in-memory cache that ensures ultra-low latency access to a common data pool, Confluent for Kubernetes’ declarative API also allows for the management of clusters running ksqlDB, Confluent’s streaming database. “KsqlDB can import external datasets — both non-streaming and streaming — with its connector support,” Michael Drogalis, a Confluent product manager, said. Once imported, the datasets are processed as tables of data. For example, ksqlDB can import data from AWS S3 and Kinesis, MySQL, Postgres, Salesforce and generic file systems.
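Declaring a ksqlDB cluster could then follow the same CRD pattern as the other components; in this sketch the image tags and the Kafka bootstrap endpoint are placeholders:

```yaml
# A ksqlDB cluster managed through the same declarative API
apiVersion: platform.confluent.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb
  namespace: confluent
spec:
  replicas: 2
  dataVolumeCapacity: 10Gi
  image:
    application: confluentinc/cp-ksqldb-server:6.1.0    # assumed image tag
    init: confluentinc/confluent-init-container:2.0.0   # assumed image tag
  dependencies:
    kafka:
      bootstrapEndpoint: kafka.confluent.svc.cluster.local:9071   # placeholder endpoint
```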

“For streaming analytics, data is continuously imported as streams so that future changes in external datasets are reflected in ksqlDB with minimal delay, measured in seconds,” Drogalis said. “So, streaming queries always operate on the most recent data — effectively acting as an in-memory cache with low-latency access. Customers need this for analytics, and when processing things like financial transactions with Confluent using ksqlDB, such as for mainframe-offload use cases.”
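In ksqlDB terms, that caching pattern might look like the following sketch, where the transactions stream and its columns are hypothetical:

```sql
-- Continuously materialize balances from an (assumed) stream of transactions
CREATE TABLE account_balances AS
  SELECT account_id, SUM(amount) AS balance
  FROM transactions
  GROUP BY account_id
  EMIT CHANGES;

-- A pull query then serves low-latency point lookups against the
-- continuously updated state, the in-memory-cache behavior described above
SELECT balance FROM account_balances WHERE account_id = 'acct-42';
```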

Data Streaming Pain Points

Organizations often require real-time access to a shared data pool that spans both multicloud and on-premises infrastructures. Kafka data streaming can deliver the low latency this requires across distributed applications and infrastructures.

“Businesses need to harness the data that constantly flows throughout their organizations in real time. But to do that, they need a central nervous system for all their data across their traditional lines of business (LOBs), environments and applications,” Bakhshi said. “They need to provide consistent experiences and standardized best practices to roll this data-in-motion platform out to global teams across their entire enterprise — both in the cloud and on-premises.”

For example, a Fortune 500 Confluent customer is tasked with shifting business analytics applications to the public cloud from a legacy system supporting critical IT infrastructure in its data center, Bakhshi said. “To bridge this hybrid infrastructure they are using Kafka between their datacenter and public cloud,” Bakhshi said. “With Confluent for Kubernetes they can seamlessly deploy and operate Confluent Platform in their private datacenter and connect the data in their cloud environments to Confluent Cloud.”