There are many reasons companies or individuals would want to analyze data streams. The overriding theme in this book is the analysis of brain wave measurements collected through a BCI while performing different activities, referred to as scenarios. The primary objective is to use exploratory data analysis (EDA) to find patterns that are specific to each scenario, and then to use those patterns to identify a scenario from brain wave readings in real time.

Beyond that objective, the collected data is useful for other purposes, such as testing new devices to determine whether readings are recorded consistently. For example, in a meditation scenario, the ALPHA reading should fall somewhere between 4.3924 and 5.0287. If that is not the case on a different device, or after a software update on the current device, there is a problem: once the device changes or is updated, the collected data is no longer reliable for determining the scenario in real time, because the BCI captures the brain waves differently.

Anomaly detection is another scenario where the collection, real-time streaming, and analysis of data are beneficial. From a brain wave perspective, once enough EDA has been performed to establish the measurements that define a healthy brain, readings that fall outside that range might highlight some damage. Learning this might prompt actions to bring the brain back into an expected state. Monitoring stock prices or network traffic are other areas where real-time anomaly detection can be worthwhile: the former may earn some money, and the latter may identify and halt malicious activities.
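The anomaly detection just described boils down to comparing a reading against an expected band. The following is a minimal sketch of that idea: the 4.3924 to 5.0287 ALPHA band comes from the meditation example above, but the function name and reading format are hypothetical, not part of any real device API.

```python
# Hypothetical range check for a meditation-scenario ALPHA reading.
# The expected band (4.3924 to 5.0287) is taken from the text; the
# function and its interface are illustrative only.
EXPECTED_ALPHA_RANGE = (4.3924, 5.0287)

def is_anomalous(alpha_reading: float,
                 expected: tuple[float, float] = EXPECTED_ALPHA_RANGE) -> bool:
    """Flag a reading that falls outside the expected band."""
    low, high = expected
    return not (low <= alpha_reading <= high)

print(is_anomalous(4.75))  # inside the band -> False
print(is_anomalous(5.6))   # above the band  -> True
```

In practice the band itself would be derived from EDA over historical readings rather than hard-coded, and a flagged reading might indicate either a genuine anomaly or a device that records measurements differently.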
Regardless of which scenario your data stream processing solution is expected to support, some common considerations apply. They are common because the requirements of a data stream processing solution are mostly the same. In every scenario you need an ingestion point specifically designed to handle a high velocity of incoming data. You need a tool to transform, filter, and aggregate that data in real time. Finally, you need a place to store and view the insights found in the data stream. The following section provides the details required to design and develop such a solution, along with explanations of concepts that apply specifically to data streaming solutions and can help you better ingest, transform, analyze, and deliver the data for consumption.
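The three stages just listed (ingest, transform/aggregate, store) can be sketched in a few lines. This is only an in-process illustration under assumed names; a real solution would use managed services for each stage rather than local functions, and the tumbling-window averaging shown here stands in for whatever real-time aggregation the scenario requires.

```python
# Illustrative sketch of the three common stages of a stream
# processing solution. All function names are hypothetical.
from statistics import mean

def ingest(readings):
    """Ingestion point: accept the incoming stream of raw readings."""
    yield from readings

def transform(stream, window_size=3):
    """Transform/aggregate: average readings over a tumbling window."""
    window = []
    for value in stream:
        window.append(value)
        if len(window) == window_size:
            yield mean(window)
            window.clear()

def store(aggregates, sink):
    """Store the insights somewhere they can be viewed later."""
    sink.extend(aggregates)

sink = []
store(transform(ingest([4.4, 4.6, 4.8, 5.0, 5.1, 4.9])), sink)
print(sink)  # one average per three-reading window
```

The generator-based stages mirror how a streaming pipeline processes data as it arrives instead of waiting for a complete data set, which is the defining constraint of every scenario discussed above.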
Design a Stream Processing Solution
Before you begin piecing together the different Azure products for your data stream processing solution, spend some time identifying your requirements. Choosing the products to run your data stream processing solution comes toward the end of the design phase: after you have identified the constraints on your solution, the list of pertinent products will be much smaller. Defining those constraints may also uncover products you had not considered that turn out to be a better fit. For example, Azure Functions, Azure Service Bus, and Azure App Service WebJobs have not yet come up in the data streaming context, but they do provide some streaming capabilities.