AWS Kinesis

Amazon Kinesis collects and processes large volumes of streaming data economically. It can ingest data from sources such as video, audio, application logs, website clickstreams, and IoT telemetry for machine learning, analytics, and other applications. Because the data can be monitored and analyzed as it streams, applications can respond quickly instead of waiting for a batch to complete. Customers benefit from accurate, current information by collecting, storing, and analyzing important data as it arrives. The service reduces workload and cost by removing the need for expensive software and the infrastructure to run it. Kinesis makes it possible to set up high-capacity pipelines that collect and analyze data very quickly: systems only need to stream their data to Kinesis to have it analyzed. Kinesis integrates easily with storage services such as DynamoDB, Redshift, and S3.

Benefits

  • Rapid Processing

    AWS Kinesis helps to ingest, buffer, and process streaming data as needed. Insights can be derived very quickly.

  • Ease of Management

    The infrastructure is fully managed, so the service is easy to use and maintain.

  • Scalability

    Amazon Kinesis can ingest and process large volumes of data from many sources with low latency.

    Minimize the number of disconnected management tools in use, implement common processes, and fully automate error-prone manual processes.

Capabilities

  • Video Stream

    A video stream securely streams video from connected devices to AWS for analytics, machine learning, and other processing.

  • Data Stream

    Users can build applications that process data streams in real time using popular stream processing frameworks.

    In the high-level architecture of Kinesis Data Streams, producers continually push data to the stream and consumers process that data in real time. Consumers (such as a custom application running on Amazon EC2 or an Amazon Kinesis Data Firehose delivery stream) can store their results using an AWS service such as Amazon DynamoDB, Amazon Redshift, or Amazon S3.

    An Amazon Kinesis data stream can be used to reliably maintain a real-time audit trail of every financial transaction, generate real-time metrics and reports, optimize marketing spend, and improve responsiveness to clients. Data producers can publish their data within seconds.

    Source: https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html
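
A producer for the audit-trail scenario above might look like the following minimal Python sketch. It assumes the boto3 Kinesis client's `put_record` call, which takes `StreamName`, `Data`, and `PartitionKey`. The stream name, event fields, and helper names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import json
from typing import Any


def build_put_record_args(stream_name: str, event: dict[str, Any], key_field: str) -> dict[str, Any]:
    """Build the keyword arguments for one Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        # Data must be bytes; JSON is one common encoding, not a requirement.
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }


def publish(client: Any, stream_name: str, events: list[dict[str, Any]], key_field: str) -> int:
    """Send each event through client.put_record and return how many were sent."""
    sent = 0
    for event in events:
        client.put_record(**build_put_record_args(stream_name, event, key_field))
        sent += 1
    return sent
```

With AWS credentials configured, `client` would typically be `boto3.client("kinesis")`; any object exposing a compatible `put_record` method works for local testing.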

  • Data Firehose

    Data streams can be easily loaded into AWS data stores using Firehose. It can also be used to capture and transform data.
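
The transform step can be handled by a Lambda function attached to the delivery stream. The sketch below assumes the Firehose data-transformation record shape: incoming `records` carry a `recordId` and base64-encoded `data`, and each returned record carries `recordId`, `result`, and `data`. The transformation itself (upper-casing a `level` field in JSON log lines) is a hypothetical example.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it in the expected response shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Unparseable records are reported as failed, with the original data echoed back.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # Hypothetical transformation: normalise the log level to upper case.
        payload["level"] = str(payload.get("level", "info")).upper()
        # A trailing newline keeps delivered objects line-delimited.
        encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded.decode("ascii"),
        })
    return {"records": output}
```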

  • Data Analytics

    Data Analytics offers a simple way to process streaming data. With knowledge of SQL alone, users can perform the task without having to learn a new programming language or framework.

Use cases

  • Video Analysis Applications

    Video data from homes, offices, factories, and public places can be easily streamed to AWS. It can be used for playback, security monitoring, face detection, machine learning, and other analytics.

  • Real-Time Applications

    Time-period analytics can be performed on historical data using batch processing in data warehouses or using distributed processing frameworks. Other common use cases for storing and processing real-time data are data lakes, data science, and machine learning. Large amounts of streaming data can also be loaded into S3 data lakes using the Firehose service. As new data flows through the streams, machine learning models can be refreshed so that their output stays accurate and consistent.

  • IoT Data analytics

    Kinesis can process streaming data from IoT devices such as embedded sensors, television set-top boxes, and security cameras. The data can then be used to send alerts on a time-period basis, or further actions can be taken programmatically when any element exceeds its operating thresholds.
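
The threshold check described above can be sketched in a few lines of Python. This is an illustration only: the reading fields (`device_id`, `metric`, `value`), the `Threshold` type, and the limits are hypothetical, and in practice this logic would run inside a stream consumer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


def find_alerts(readings, thresholds):
    """Return (device_id, metric, value) for every reading outside its allowed range."""
    alerts = []
    for reading in readings:
        limit = thresholds.get(reading["metric"])
        if limit is None:
            continue  # no threshold configured for this metric
        if not (limit.low <= reading["value"] <= limit.high):
            alerts.append((reading["device_id"], reading["metric"], reading["value"]))
    return alerts
```

For example, with `thresholds = {"temperature_c": Threshold(0.0, 70.0)}`, a reading of 82.5 from a sensor is returned as an alert, while a reading of 21.0 is not.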

Case Studies

  • A Real Estate Listing Company: Find a Realtor
    • Problem Statement:

      Ensure that data integrity is maintained and that the latest changes to agent and profile information are reflected within a short time.

      Because of the on-premises setup, performance was a challenge, and the response time was ~100 ms.

    • Solution/Approach:

      Provided an API (FAR-BACKEND API) that offers search for agent/team/office by city/state/postal-code/name.

      Data collected from various APIs is stored in DynamoDB and Elasticsearch to make search faster. The components and their purposes are:

      • Created a DynamoDB table to store Agent/Team/Office records and their property details.
      • Set up Elasticsearch to store and query the data, making search faster.
      • Created a Lambda function that is invoked by the DynamoDB stream whenever data is added, updated, or deleted in DynamoDB. This Lambda adds, updates, or deletes the corresponding data in Elasticsearch.
      • Created a Lambda function that listens to the Kinesis stream and calls the Document-Builder (an API that performs CRUD operations on Agent/Team/Office in DynamoDB) to add, update, or delete data in DynamoDB.
      • Created an S3 bucket to store the code for the Lambda that updates Elasticsearch.
      • Created an ECS service (Scheduler) that runs continuously to update the agent/team/office information in DynamoDB and Elasticsearch.
      • Created an ECS service (Final FAR-Backend API) that provides search for agent/team/office by city/state/postal-code/name.
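
The DynamoDB-stream-to-Elasticsearch step in the list above could be sketched as follows. This is not the project's actual code: it assumes the standard DynamoDB Streams event shape (`Records`, `eventName`, `dynamodb.Keys`, `dynamodb.NewImage`), a stream configured to include new images, and a hypothetical key attribute named `id`. It only derives the index/delete actions; sending them to the search cluster is left out.

```python
def from_attribute(value):
    """Convert one DynamoDB-typed attribute such as {"S": "x"} to a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return float(raw) if "." in raw else int(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attribute(item) for item in raw]
    if type_tag == "M":
        return {key: from_attribute(item) for key, item in raw.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_index_actions(event, key_name="id"):
    """Map stream records to ("index", id, document) or ("delete", id, None) actions."""
    actions = []
    for record in event["Records"]:
        change = record["dynamodb"]
        doc_id = str(from_attribute(change["Keys"][key_name]))
        if record["eventName"] == "REMOVE":
            actions.append(("delete", doc_id, None))
        else:  # INSERT or MODIFY: index the latest version of the item
            document = {k: from_attribute(v) for k, v in change["NewImage"].items()}
            actions.append(("index", doc_id, document))
    return actions
```

Keeping the mapping separate from the search client makes the logic easy to test locally with sample events.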
    • Outcome:

      With this solution, all APIs have been migrated from the on-premises data center to the cloud.

      The existing API response time was ~100 ms; the new API response time on AWS is ~25 ms. This significant improvement has sped up Search page results for end users.

      The solution leverages AWS services, with a primary focus on autoscaling and maintainability.

  • A Major Pharmaceutical Company: Log Standardization
    • Problem Statement:

      In an enterprise handling sensitive clinical information with disparate sources spread across different divisions, enforcing measures of security and governance is a challenge.

      In the current landscape:

      • A unified view of the security and utilization of resources is lacking.
      • Logs are hard to access because they are spread across multiple systems, including on-premises VMs and AWS components.
      • Separate monitoring and alerting systems are required.
      • The capability to store a large number of logs is limited.
      • Getting information in real time is difficult.
    • Solution/Approach:

      A framework to collect, organize, and enable analysis of logs.

      Data Streams Ingestion
      • Fluentd agents stream data from on-premises and cloud VMs
      • Event logs are subscribed to for a real-time feed of log events
      • Kinesis Streams ingests the data, processes it in shards, and triggers a Lambda function
      • Kinesis Analytics processes streaming data in real time with standard SQL
      • The pipeline scales automatically to match the volume and throughput rate of the incoming data
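
The Lambda function triggered by the stream might be shaped like the sketch below. It assumes the standard Kinesis event that Lambda receives, where each record's payload is base64-encoded under `kinesis.data`. That the log lines are JSON objects with a `level` field is an assumption made for illustration.

```python
import base64
import json
from collections import Counter


def lambda_handler(event, context):
    """Decode each Kinesis record and count log entries per level."""
    by_level = Counter()
    skipped = 0
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            entry = json.loads(raw)
        except ValueError:
            entry = None
        if not isinstance(entry, dict):
            skipped += 1  # not a JSON object; ignore it rather than failing the batch
            continue
        by_level[str(entry.get("level", "unknown")).lower()] += 1
    return {"by_level": dict(by_level), "skipped": skipped}
```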
      Log Standardization
      • Gather data from various sources and drop it in the S3 data lake
      • Create tables in the Glue Catalog with metadata describing the data
      • Convert the data to formats such as Parquet for faster query results
      • Create partitions and organize the data so it can be analyzed efficiently
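
One common way to organize such data is a Hive-style `name=value` key layout in S3, which query engines can map to table partitions. The sketch below shows how an object key could be built. The prefix, the partition columns (`source`, `year`, `month`, `day`), and the file name are hypothetical, and the case study does not state which layout was actually used.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, source: str, timestamp: datetime, filename: str) -> str:
    """Build an S3 object key partitioned by log source and UTC date.

    `timestamp` is expected to be timezone-aware.
    """
    ts = timestamp.astimezone(timezone.utc)
    return (
        f"{prefix}/source={source}"
        f"/year={ts:%Y}/month={ts:%m}/day={ts:%d}/{filename}"
    )
```

Queries that filter on `source` or on a date range then only need to scan the matching prefixes.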
      Analytics & Monitoring
      • CloudWatch collects and tracks all performance metrics and generates alerts
      • It takes in data from Lambda and Firehose to enable the customer to go from raw data to actionable insights quickly
      • Business users can get analytical insights from Amazon Athena and can also query Elasticsearch
    • Outcome:

      With logs available in real time in a single location, the client can improve its analysis in the following areas:

      • Real-time utilization of resources
      • Optimizing infrastructure cost and managing it efficiently
      • Reporting security breaches quickly
      • Troubleshooting and resolving issues