Stream Processing
Category: Architecture Component
Stream Processing is the technique of applying mathematical algorithms to event data as it is transformed and joined in real time, for purposes such as system control and status visualization. Predictive Modeling can also be applied to the input data to forecast probable outcomes.
Component Overview
Low-latency Event Handling
Stream Processing is a technology that makes it possible to query continuous data streams and detect boundary conditions shortly after the data is received, i.e. without significant lag behind the input data feed.
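As a minimal sketch of boundary-condition detection, the Python generator below inspects each reading as it arrives and emits an alert immediately, rather than waiting for the feed to end. The function name `detect_boundary`, the `(timestamp, value)` tuple layout, the sample readings and the limit of 80.0 are all illustrative assumptions, not part of any particular streaming engine.

```python
from typing import Iterable, Iterator, Tuple

Reading = Tuple[float, float]  # (timestamp, value); hypothetical layout


def detect_boundary(readings: Iterable[Reading], limit: float) -> Iterator[Reading]:
    """Yield each reading whose value exceeds the limit, as soon as it is seen."""
    for timestamp, value in readings:
        if value > limit:
            # The alert is produced per event, without buffering the feed.
            yield timestamp, value


# Hypothetical sample feed; a real feed would be unbounded.
sample_feed = [(0.0, 71.2), (1.0, 79.9), (2.0, 85.4), (3.0, 78.0), (4.0, 91.3)]
alerts = list(detect_boundary(sample_feed, limit=80.0))
print(alerts)
```

Because the function is a generator, it works equally well on an unbounded iterator: the caller receives each alert as soon as the offending reading has been consumed.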
More traditional techniques of Data Mining and Batch Processing let data build up, and then aggregate and transform it in batches into much smaller, meaningful results. In contrast, Stream Processing handles never-ending incoming data feeds in real time. It applies where waiting until a sufficient amount of data has accumulated would take too long, or where storing the data is not even possible given its volume or the time window available for reacting to events.
Modern analytical applications targeted at Big Data provide a Batch Processing capability that executes scripted logic against large batches, a pattern widely adopted in the Data Warehousing space. On the other hand, the low-latency nature of a Stream Processing engine makes this pattern applicable to core applications that directly power the operations of the business and handle live customer communications.
When implementing stream processor logic, one can write a query that matches a certain data pattern. The processor continually searches the stream for series of events that match the query, and generates a notification whenever a match is found. Streaming queries are similar to database SQL queries, except that they are executed against a never-ending event feed and their results are generated on a continuous basis. One example of stream querying is full-text search on a stream, whereby a search query is registered in advance and the streaming engine triggers a notification whenever an event matches the pre-defined search criteria.
Stream Processing can be interpreted as a real-time version of Event Analytics, where streams of events are stored and continuously processed. It adds a powerful abstraction layer on top of the traditional queuing capability provided by a Message Broker, and makes it possible to parallelize processing across multiple compute nodes. In this case, a stream is considered to be a single topic, regardless of the number of partitions, continuously moving messages from producers to consumers. In Event Stream Processing (ESP) specifically, such messages carry metadata about the type of activity, when the activity occurred, and its source and location.
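The contrast between the two approaches can be illustrated with a simple average. The sketch below is an assumption-laden example: the class name `RunningMean` and the sample values are hypothetical. It computes the mean once over a stored batch, and then incrementally, updating the result after every event without retaining the events themselves.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Incremental mean that keeps only a count and the current mean."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        # Standard incremental update: new_mean = mean + (x - mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


values = [4.0, 8.0, 6.0, 2.0]  # hypothetical sample events

# Batch style: all data must be available before the result is computed.
batch_mean = sum(values) / len(values)

# Streaming style: an up-to-date result exists after every event.
stream = RunningMean()
running = [stream.update(v) for v in values]
print(batch_mean, running)
```

Both approaches agree once all values have been seen, but the streaming version needs constant memory and can report an intermediate result at any point in the feed.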
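The idea of registering a query in advance can be sketched in a few lines of Python. This is a simplified illustration rather than the API of any real streaming engine: the class `StandingQueries`, its methods, the query names and the sample events are all hypothetical, and matching is reduced to a check that every query term appears as a whitespace-separated word in the event text.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


class StandingQueries:
    """Holds pre-registered search queries and matches them against a stream."""

    def __init__(self) -> None:
        self._queries: Dict[str, List[str]] = {}

    def register(self, name: str, terms: List[str]) -> None:
        # Queries are registered before the events arrive.
        self._queries[name] = [term.lower() for term in terms]

    def process(self, events: Iterable[str]) -> Iterator[Tuple[str, str]]:
        """Yield (query_name, event) whenever an event contains all query terms."""
        for event in events:
            words = set(event.lower().split())
            for name, terms in self._queries.items():
                if all(term in words for term in terms):
                    yield name, event


engine = StandingQueries()
engine.register("failed-payments", ["payment", "failed"])
engine.register("orders", ["order"])

feed = [
    "Payment failed for order 17",
    "Order 18 shipped",
    "payment accepted for order 19",
]
notifications = list(engine.process(feed))
for query_name, event in notifications:
    print(query_name, "->", event)
```

Note the inversion compared with a database: the queries are stored and the data flows past them, so each notification is produced as soon as the matching event is processed.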
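A rough sketch of how one logical stream can be split into partitions for parallel processing is shown below. The `Event` fields mirror the metadata listed above (activity type, time, source, location), but the field names, the use of the source as the partitioning key and the CRC32-based assignment are assumptions made for illustration; real brokers use their own partitioning schemes.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    activity_type: str  # what happened
    occurred_at: str    # when it happened (ISO 8601 string, by assumption)
    source: str         # which system or device emitted it
    location: str       # where it happened


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 chosen for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(events: Iterable[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Group events by partition; each partition could go to a separate node."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        partitions[partition_for(event.source, num_partitions)].append(event)
    return partitions


events = [
    Event("login", "2024-01-01T10:00:00Z", "web-01", "eu-west"),
    Event("purchase", "2024-01-01T10:00:05Z", "web-02", "us-east"),
    Event("logout", "2024-01-01T10:01:00Z", "web-01", "eu-west"),
]
partitions = assign(events, num_partitions=3)
for partition_id, items in sorted(partitions.items()):
    print(partition_id, [e.activity_type for e in items])
```

Keying on the source keeps all events from one source in the same partition and in their original order, while different partitions can be consumed independently. The topic as a whole is still treated as a single stream.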
Some examples of modern Stream Processing systems under the umbrella of the Apache Software Foundation are Kafka Streams, Spark Structured Streaming, Samza, Storm, Flink, Apex and Beam.
Stream Processing is a method for analyzing data as it flows from one device to another, almost immediately after the data has been generated.
A stream is a constant and continuous flow of data entities between sources and destinations, referred to as publishers and subscribers in the messaging discipline.
Processing is the final act of analyzing data entities in order to identify a pattern match or generate business-meaningful indicators.
Stream Processing can be used in any industry and business scenario where a large volume of data is generated, whether by people, computer systems or sensors.