Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is a service that simplifies capturing, processing, and storing streaming data at any scale. It can continuously capture large volumes of data, up to gigabytes per second, from numerous sources, including website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The collected data is available to consumers almost immediately, which enables real-time analytics applications such as live dashboards, anomaly detection, and dynamic pricing. Low-latency access means the data can be processed promptly for immediate insights.
Key definitions for Amazon Kinesis Data Streams:
-
Low latency
With enhanced fan-out, the average latency between producers and consumers is about 70 ms.
-
Dedicated throughput per consumer
With enhanced fan-out, up to 20 consumers can be attached to a data stream, each with its own dedicated read throughput.
-
Secure and compliant
Access the service privately from a VPC. Data is encrypted in transit and at rest, using server-side encryption with AWS Key Management Service (AWS KMS) keys.
-
Highly available and durable
Streaming data is synchronously replicated across three Availability Zones (AZs) within an AWS Region to protect against data loss. Data can be retained for up to 365 days, which adds another layer of resilience.
-
Serverless
No server management is required. In on-demand mode, capacity scales automatically based on workload traffic.
-
Integrated with other AWS services
Integrates with other services in the Kinesis family as well as with other popular AWS services.
-
On-demand or provisioned capacity mode
Kinesis Data Streams offers two capacity modes. In provisioned mode, you specify the number of shards for the stream and adjust it yourself. In on-demand mode, the service manages capacity automatically, which significantly simplifies the management of data stream capacity for developers and businesses.
-
Streams consist of shards
- A shard is a sequence of records ordered by arrival time.
- One shard can ingest up to 1 MB/sec of data records.
- The shard count can be changed in response to throughput changes using the UpdateShardCount API, the AWS console, AWS Lambda, or an auto scaling utility.
- With enhanced fan-out, a shard provides 1 MB/sec of data input and 2 MB/sec of data output for each consumer.
- Without enhanced fan-out, a shard provides 1 MB/sec of data input and 2 MB/sec of data output. The output is shared among all consumers.
- In provisioned mode, the number of shards is specified when the stream is created and can be changed later.
- The service provides shard-level metrics for monitoring.
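The per-shard limits listed above allow a rough capacity estimate. The following is a minimal sketch, not an official sizing tool: the function name `estimate_shards` and its parameters are hypothetical, and it considers only the MB/sec figures quoted in this list.

```python
import math

# Per-shard limits as quoted in the list above.
WRITE_MB_PER_SEC_PER_SHARD = 1.0
READ_MB_PER_SEC_PER_SHARD = 2.0


def estimate_shards(write_mb_per_sec: float,
                    read_mb_per_sec_per_consumer: float,
                    consumers: int,
                    enhanced_fan_out: bool = False) -> int:
    """Return the smallest shard count that covers both write and read demand."""
    shards_for_writes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    if enhanced_fan_out:
        # Each consumer has its own 2 MB/sec per shard, so only one
        # consumer's read rate has to fit.
        read_demand = read_mb_per_sec_per_consumer
    else:
        # All consumers share the same 2 MB/sec per shard.
        read_demand = read_mb_per_sec_per_consumer * consumers
    shards_for_reads = math.ceil(read_demand / READ_MB_PER_SEC_PER_SHARD)
    return max(1, shards_for_writes, shards_for_reads)
```

For example, three consumers that each read the full 5 MB/sec of a 5 MB/sec stream would need more shards with shared throughput than with enhanced fan-out, because the shared 2 MB/sec per shard becomes the bottleneck.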
-
Data record
A record is the unit of data stored in a stream. It consists of a partition key, a data blob, and a sequence number. The maximum size of the data blob (after Base64 decoding) is 1 MB.
-
Partition key
The partition key, often a meaningful identifier such as a user ID or timestamp, serves two purposes. It is used to distribute data records across the shards of a stream, so that records with the same key are routed to the same shard. It also lets consumers replay or reconstruct the history associated with that key.
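Kinesis Data Streams assigns a record to a shard by hashing its partition key with MD5 into a 128-bit integer and matching that value against the hash key ranges of the shards. The sketch below illustrates the idea only. It assumes the shards split the hash key space evenly, which is not guaranteed after resharding, and the helper names `hash_key` and `shard_index` are hypothetical.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Pick a shard index, assuming an even split of the hash key space."""
    return hash_key(partition_key) * shard_count // HASH_KEY_SPACE
```

Because the mapping is deterministic, every record with the same partition key goes to the same shard, which is what preserves per-key ordering. A key with very high traffic can therefore overload a single shard.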
-
Sequence number
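A unique identifier that Kinesis Data Streams assigns to each record when it is written. Sequence numbers are unique per partition key within a shard.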
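The three parts of a record come together when writing to a stream. The sketch below assumes a boto3 Kinesis client (or any object with a compatible `put_record` method) is passed in. The function name `publish_event` is hypothetical, as is the choice to use a user ID as the partition key and JSON as the payload format.

```python
import json
from typing import Any, Dict


def publish_event(client: Any, stream_name: str, user_id: str,
                  event: Dict[str, Any]) -> str:
    """Write one record and return the sequence number assigned to it."""
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # the data blob
        PartitionKey=user_id,                    # determines the target shard
    )
    # The service assigns the sequence number; the producer does not choose it.
    return response["SequenceNumber"]
```

With boto3 installed and credentials configured, the client would typically be created with `boto3.client("kinesis")`. The stream name passed in must refer to an existing stream.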
Service integrates with:
-
Amazon DynamoDB
-
AWS Glue
-
AWS CloudTrail
-
Amazon Kinesis Data Firehose
-
Amazon Simple Queue Service (Amazon SQS)
-
AWS Lambda
-
Amazon Simple Notification Service (Amazon SNS)
-
AWS Key Management Service (AWS KMS)
-
Amazon CloudWatch
-
Amazon Redshift
-
Amazon RDS
-
Amazon Aurora
-
AWS Identity and Access Management (IAM)
-
Amazon Kinesis Data Analytics
Use cases
-
Stream log and event data.
Collect and ingest terabytes of data per day from clickstreams, sensors, application and service logs, and in-app user events to generate metrics, power live dashboards, and deliver data into data lakes.
-
Run real-time analytics.
Analyze high-frequency event data, such as clickstream data, and get insights in seconds rather than days, using Amazon Kinesis Data Analytics or AWS Lambda.
-
Power event-driven applications.
A data stream can be used as an event source that triggers AWS Lambda functions.
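When a data stream triggers a Lambda function, the function receives a batch of records in which each data blob is Base64-encoded. The handler below is a minimal sketch of decoding such a batch. It assumes producers send JSON payloads, and the shape of the returned list is an illustrative choice rather than anything Lambda requires.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed payloads."""
    payloads = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Lambda delivers the data blob Base64-encoded.
        raw = base64.b64decode(kinesis["data"])
        payloads.append({
            "partition_key": kinesis["partitionKey"],
            "sequence_number": kinesis["sequenceNumber"],
            "body": json.loads(raw),  # assumption: the payload is JSON
        })
    return payloads
```

A real handler would usually act on each payload, for example by updating a metric or writing to a data store, instead of returning the list.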
FAQ for Amazon Kinesis Data Streams
-
What is the throughput unit of a data stream in Amazon Kinesis Data Streams?
The throughput unit of a stream is the shard.
-
How are servers managed in Amazon Kinesis Data Streams?
There is no server management at all. The service is fully serverless.
-
How many consumers can be attached to a specific data stream?
Up to 20 consumers can be attached to a single data stream.