Apache Spark Streaming is an extension of the open source Apache Spark platform that makes it easy to build scalable, fault-tolerant
streaming applications. SiteWhere support includes a custom receiver that streams events
from a SiteWhere instance via Hazelcast. The event stream can then be manipulated via the standard
Spark Streaming APIs and used as input for the machine learning
and graph processing modules available in Spark.
Create a Spark Project
To deploy code for execution on Spark via spark-submit, an Uber JAR must be created
that contains the dependencies needed at runtime. The pom.xml used by Maven
to build the project should declare dependencies on the SiteWhere Spark module and
the Apache Spark libraries:
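A minimal sketch of the dependencies section is shown below. The SiteWhere artifact coordinates and all version numbers are illustrative placeholders; substitute the versions that match your SiteWhere and Spark installations.

```xml
<dependencies>
    <!-- SiteWhere Spark integration (coordinates are assumed; check your
         SiteWhere release for the exact groupId/artifactId/version) -->
    <dependency>
        <groupId>com.sitewhere</groupId>
        <artifactId>sitewhere-spark-core</artifactId>
        <version>1.0.0</version>
    </dependency>
    <!-- Apache Spark libraries, supplied by the Spark engine at runtime -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.3.1</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```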
Note that the Spark libraries are marked as provided since the Spark engine will make them available.
To create the Uber JAR, an extra plugin needs to be added to the Maven build as shown below:
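The sketch below uses the maven-shade-plugin, one common way to produce an Uber JAR; the specific excludes are illustrative and should be adjusted to match the dependencies your project actually pulls in.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <executions>
                <execution>
                    <!-- Build the shaded (Uber) JAR during the package phase -->
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <artifactSet>
                            <!-- Leave out libraries the Spark engine already provides -->
                            <excludes>
                                <exclude>org.apache.spark:*</exclude>
                                <exclude>org.scala-lang:*</exclude>
                            </excludes>
                        </artifactSet>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```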
The excludes block prevents unneeded libraries from being included in the JAR.
Add Stream Processing Logic
A Java class with a main method should be created to supply the logic that will be
executed in Spark. The line needed to stream SiteWhere events into Spark is given below:
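A sketch of that line follows; the receiver and event class names (SiteWhereReceiver, ISiteWhereEvent) are assumptions made for illustration, so check the SiteWhere Spark module for the exact types.

```java
// jssc is a JavaStreamingContext. SiteWhereReceiver is an assumed name for the
// module's custom Hazelcast receiver, and ISiteWhereEvent for the event type.
JavaDStream<ISiteWhereEvent> events = jssc.receiverStream(new SiteWhereReceiver());
```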
The receiver will connect to SiteWhere via Hazelcast and stream all measurements, locations, and
alerts for use by the other Spark APIs. An example that counts the number of events processed for each
device assignment token is shown below:
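The following is a minimal sketch of such a class, again assuming the hypothetical SiteWhereReceiver and ISiteWhereEvent names along with a getAssignmentToken() accessor on the event; the Spark Streaming calls themselves are the standard Java API.

```java
package com.example; // hypothetical package; match it to your project

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class EventCounter {

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("SiteWhereEventCounter");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Stream SiteWhere events via the custom Hazelcast receiver.
        // SiteWhereReceiver and ISiteWhereEvent are assumed names.
        JavaDStream<ISiteWhereEvent> events = jssc.receiverStream(new SiteWhereReceiver());

        // Count the events seen for each device assignment token in each batch.
        JavaPairDStream<String, Integer> counts = events
                .mapToPair(event -> new Tuple2<>(event.getAssignmentToken(), 1))
                .reduceByKey((a, b) -> a + b);

        // Print a sample of the per-assignment counts for each batch interval.
        counts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```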
Once the logic has been created, run the Maven build by executing:
mvn clean install
The output will be a JAR containing everything needed for Spark to execute the logic.
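As a usage sketch, the JAR can then be deployed to a running Spark cluster with spark-submit; the class name, master URL, and JAR path below are placeholders to be replaced with your own values.

```bash
spark-submit --class com.example.EventCounter \
  --master spark://your-spark-master:7077 \
  target/your-project-1.0.0.jar
```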