Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is a fully managed service that simplifies capturing, transforming, and loading large volumes of streaming data from many sources into a range of destinations. Available destinations include Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Kinesis Data Analytics, generic HTTP endpoints, and service providers such as Datadog, New Relic, MongoDB, and Splunk. This integration enables near real-time analytics on the delivered data.
Key definitions for Amazon Kinesis Data Firehose:
-
Delivery streams
The delivery stream is the foundational entity of the service. To use Kinesis Data Firehose, you create a delivery stream and then send data to it.
-
Easy launch and configuration
With Amazon Kinesis Data Firehose you can create a delivery stream that loads data into a chosen destination with a few clicks in the AWS Management Console. You send data to the delivery stream through the Firehose API or the provided Linux agent, and Kinesis Data Firehose continuously loads it into the specified destination.
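As a minimal sketch of sending data through the Firehose API, the helper below groups events into batches for the `PutRecordBatch` operation. The two limits (500 records and 4 MiB per call) are the documented quotas at the time of writing and should be checked against the current API reference. The function names, the newline-delimited JSON encoding, and the stream name in the usage note are illustrative assumptions, not part of the service.

```python
import json
from typing import Iterable, Iterator

MAX_RECORDS_PER_BATCH = 500            # record-count limit per PutRecordBatch call
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # total payload limit per PutRecordBatch call


def to_record(event: dict) -> dict:
    """Serialize one event as a newline-terminated JSON record (assumed encoding)."""
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def batches(records: Iterable[dict]) -> Iterator[list]:
    """Group records so that each batch respects both limits."""
    batch, size = [], 0
    for record in records:
        length = len(record["Data"])
        if batch and (len(batch) == MAX_RECORDS_PER_BATCH
                      or size + length > MAX_BYTES_PER_BATCH):
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += length
    if batch:
        yield batch


def send_events(client, stream_name: str, events: Iterable[dict]) -> int:
    """Send events through `client` and return how many records were rejected."""
    failed = 0
    for batch in batches(to_record(e) for e in events):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

With boto3 installed and credentials configured, a call might look like `send_events(boto3.client("firehose"), "example-stream", events)`, where `example-stream` is a placeholder name. A nonzero return value means some records were rejected and would need to be retried.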
-
Load new data in near real time
To control how quickly data is uploaded to destinations, you can specify a batch size, a batch interval, or both. The service supports the GZip, Hadoop-Compatible Snappy, Zip, and Snappy compression formats.
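The batch size and batch interval settings can be pictured with a small in-memory model: data accumulates until either threshold is reached, and the batch is then compressed and handed off. This is a toy sketch, not the service's implementation. The class name and parameters are hypothetical, and the model only checks the age of a batch when a new record arrives, whereas a real delivery stream flushes on a timer.

```python
import gzip


class DeliveryBuffer:
    """Toy buffer that flushes on size or age, whichever is reached first."""

    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.chunks = []
        self.size = 0
        self.opened_at = None  # arrival time of the first record in the batch

    def add(self, data: bytes, now: float):
        """Buffer `data`; return a GZIP-compressed batch if a threshold is hit, else None."""
        if self.opened_at is None:
            self.opened_at = now
        self.chunks.append(data)
        self.size += len(data)
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self) -> bytes:
        """Compress and return everything buffered so far, then reset."""
        payload = gzip.compress(b"".join(self.chunks))
        self.chunks, self.size, self.opened_at = [], 0, None
        return payload
```

A smaller size or shorter interval lowers delivery latency at the cost of more, smaller objects at the destination.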
-
Elastic scaling to handle varying data throughput
Once launched, delivery streams automatically scale up and down to handle input data rates of gigabytes per second or more.
-
Apache Parquet or ORC format conversion
The service supports columnar data formats such as Apache Parquet and Apache ORC. Data in these formats can be analyzed with services such as Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, and other Hadoop-based tools. The service can convert incoming data from JSON to Parquet or ORC before storing it in Amazon S3.
-
Partitioned data delivery to S3
The service supports dynamic partitioning by key. Kinesis Data Firehose groups data by these keys and delivers each group to its own key-unique S3 prefix.
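The grouping step can be illustrated with a short sketch that derives an S3 prefix from each record's partitioning keys and collects records under that prefix. The key names and the `key=value/` prefix layout are assumptions chosen for illustration. In the service itself, keys are configured on the delivery stream rather than computed in your own code.

```python
from collections import defaultdict


def s3_prefix(record: dict, keys: list) -> str:
    """Build a prefix such as 'region=eu/customer_id=42/' from the chosen keys."""
    return "".join(f"{key}={record[key]}/" for key in keys)


def partition(records: list, keys: list) -> dict:
    """Group records by the S3 prefix derived from their partitioning keys."""
    groups = defaultdict(list)
    for record in records:
        groups[s3_prefix(record, keys)].append(record)
    return dict(groups)
```

For example, partitioning on `["region", "customer_id"]` places every record for the same region and customer under one prefix, so a query that filters on those keys only needs to read the matching objects.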
-
Integrated data transformations
The service lets you prepare streaming data before it is loaded into data stores. It offers pre-built Lambda blueprints that convert common data sources, such as Apache logs and system logs, into JSON and CSV formats.
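A transformation function of this kind might look like the following sketch, which converts Apache-style access log lines to JSON. It follows the record contract Firehose uses for Lambda transformations: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result` status, and base64-encoded `data`. The regular expression and output field names are illustrative assumptions and are not taken from an AWS blueprint.

```python
import base64
import json
import re

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def lambda_handler(event, context):
    """Convert each log line to JSON; flag lines that do not parse."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        match = LOG_PATTERN.match(line)
        if match is None:
            # Return the original data untouched and mark the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        fields = match.groupdict()
        fields["status"] = int(fields["status"])
        payload = (json.dumps(fields) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("utf-8"),
        })
    return {"records": output}
```

Every input record must appear in the response with its original `recordId`, which is why unparseable lines are returned with a failure status instead of being skipped.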
-
Optional automatic encryption
The service can automatically encrypt data after it is uploaded to the destination. When configuring a delivery stream, you can specify an AWS Key Management Service (AWS KMS) encryption key.
-
Metrics for monitoring performance
Amazon Kinesis Data Firehose exposes metrics through Amazon CloudWatch, including data delivery success, data delivery latency, and delivery-to-destination data rates. With these metrics you can monitor the health and performance of your delivery streams and identify and address issues as they arise.
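One way to read these metrics programmatically is through the CloudWatch `GetMetricStatistics` operation. The sketch below only builds the request parameters, so it can be inspected without AWS access. The helper name, the one-hour window, and the five-minute period are assumptions. `AWS/Firehose` is the namespace under which Firehose metrics are published, and `DeliveryStreamName` is the dimension that identifies a stream.

```python
from datetime import datetime, timedelta, timezone


def metric_query(stream_name: str, metric_name: str, end: datetime,
                 minutes: int = 60, period_seconds: int = 300) -> dict:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average"],
    }
```

With boto3, the result could be passed as `boto3.client("cloudwatch").get_metric_statistics(**metric_query("example-stream", "DeliveryToS3.Success", datetime.now(timezone.utc)))`, where `example-stream` is a placeholder and the metric name applies to streams that deliver to Amazon S3.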
-
Pay-as-you-go pricing
Pricing is pay-as-you-go: you pay only for the volume of data you ingest into the service, and there are no setup fees. Optional features such as format conversion, VPC delivery, and dynamic partitioning incur additional charges. Ingestion pricing is tiered and billed per GB ingested in 5 KB increments (for example, a 3 KB record is billed as 5 KB), with additional charges for data transfer. Dynamic partitioning, which enables continuous grouping of data by keys, is billed per GB delivered to S3, per object, and optionally per jq processing hour.
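The effect of the 5 KB billing increment can be worked through with a little arithmetic. The sketch below computes the billable volume only and deliberately contains no prices, since rates vary by region and tier. It assumes that "5 KB" means 5,120 bytes and that a GB is 1024³ bytes. Both assumptions should be checked against the current pricing page.

```python
INCREMENT_BYTES = 5 * 1024  # assumption: "5 KB" means 5,120 bytes
BYTES_PER_GB = 1024 ** 3    # assumption: binary gigabytes


def billed_bytes(record_size: int) -> int:
    """Round one record's size up to the next 5 KB increment."""
    increments = (record_size + INCREMENT_BYTES - 1) // INCREMENT_BYTES
    return increments * INCREMENT_BYTES


def billed_gb(record_sizes) -> float:
    """Total billable ingestion volume in GB for the given record sizes."""
    return sum(billed_bytes(size) for size in record_sizes) / BYTES_PER_GB
```

Under these assumptions a 3 KB record is billed as 5 KB and a 12 KB record as 15 KB, so streams made of many small records are billed for noticeably more than their raw byte count.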
-
Tagging for delivery streams
Delivery streams can be tagged so that you can categorize and manage them. Tags help you allocate costs, control access, and track resource usage and expenses across delivery streams.
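Tags can also be applied through the API. The sketch below converts a plain dictionary into the Key/Value list used by AWS tagging calls and passes it to a client exposing `tag_delivery_stream`, as the boto3 Firehose client does. The helper names, the stream name, and the tag keys in the usage note are hypothetical.

```python
def to_tag_list(tags: dict) -> list:
    """Convert {'team': 'data'} into [{'Key': 'team', 'Value': 'data'}], sorted by key."""
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def tag_stream(client, stream_name: str, tags: dict) -> None:
    """Apply tags to a delivery stream through the given client."""
    client.tag_delivery_stream(
        DeliveryStreamName=stream_name, Tags=to_tag_list(tags)
    )
```

For example, `tag_stream(boto3.client("firehose"), "example-stream", {"team": "data", "env": "prod"})` would attach two tags that can then be used for cost allocation reports.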
Use cases
-
Stream into data lakes and warehouses.
Stream data directly into Amazon S3 and convert it into the formats needed for analysis, without building complex processing pipelines.
-
Boost security.
Use supported Security Information and Event Management (SIEM) tools to monitor network security in real time and receive alerts when potential threats emerge.
-
Build ML streaming applications.
Enrich your data streams with machine learning (ML) models that analyze the data and make predictions on inference endpoints as the streams move to their destinations.
FAQ for Amazon Kinesis Data Firehose
-
Does Amazon Kinesis Data Firehose support data transformation before data reaches its destination storage?
Yes, it does. You can use predefined or custom Lambda functions to process streaming data before it reaches final storage.
-
Does Amazon Kinesis Data Firehose support data aggregation in Amazon S3?
Yes, it does. The service supports dynamic partitioning by key: Kinesis Data Firehose groups data by these keys and delivers each group to its own key-unique S3 prefix.
-
Does the service support tagging of delivery streams?
Yes, it does. The service provides tagging for delivery streams.