Spark Streaming¶
Core Classes¶
| 
 | Main entry point for Spark Streaming functionality. | 
| 
 | A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see  | 
Streaming Management¶
| Add a [[org.apache.spark.streaming.scheduler.StreamingListener]] object for receiving system events related to streaming. | |
| 
 | Wait for the execution to stop. | 
| Wait for the execution to stop. | |
| 
 | Sets the context to periodically checkpoint the DStream operations for master fault-tolerance. | 
| Return either the currently active StreamingContext (i.e., if there is a context started but not stopped) or None. | |
| Either return the active StreamingContext (i.e. | |
| 
 | Either recreate a StreamingContext from checkpoint data or create a new StreamingContext. | 
| 
 | Set each DStreams in this context to remember RDDs it generated in the last given duration. | 
| Return SparkContext which is associated with this StreamingContext. | |
| Start the execution of the streams. | |
| 
 | Stop the execution of the streams, with option of ensuring all received data has been processed. | 
| 
 | Create a new DStream in which each RDD is generated by applying a function on RDDs of the DStreams. | 
| 
 | Create a unified DStream from multiple DStreams of the same type and same slide duration. | 
Input and Output¶
| Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as flat binary files with records of fixed length. | |
| 
 | Create an input stream from a queue of RDDs or list. | 
| 
 | Create an input from TCP source hostname:port. | 
| 
 | Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files. | 
| 
 | Print the first num elements of each RDD generated in this DStream. | 
| 
 | Save each RDD in this DStream as at text file, using string representation of elements. | 
Transformations and Actions¶
| Persist the RDDs of this DStream with the default storage level (MEMORY_ONLY). | |
| 
 | Enable periodic checkpointing of RDDs of this DStream | 
| 
 | Return a new DStream by applying ‘cogroup’ between RDDs of this DStream and other DStream. | 
| 
 | Return a new DStream by applying combineByKey to each RDD. | 
| Return the StreamingContext associated with this DStream | |
| Return a new DStream in which each RDD has a single element generated by counting each RDD of this DStream. | |
| Return a new DStream in which each RDD contains the counts of each distinct value in each RDD of this DStream. | |
| 
 | Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream. | 
| 
 | Return a new DStream in which each RDD has a single element generated by counting the number of elements in a window over this DStream. | 
| Return a new DStream containing only the elements that satisfy predicate. | |
| 
 | Return a new DStream by applying a function to all elements of this DStream, and then flattening the results | 
| Return a new DStream by applying a flatmap function to the value of each key-value pairs in this DStream without changing the key. | |
| 
 | Apply a function to each RDD in this DStream. | 
| 
 | Return a new DStream by applying ‘full outer join’ between RDDs of this DStream and other DStream. | 
| Return a new DStream in which RDD is generated by applying glom() to RDD of this DStream. | |
| 
 | Return a new DStream by applying groupByKey on each RDD. | 
| 
 | Return a new DStream by applying groupByKey over a sliding window. | 
| 
 | Return a new DStream by applying ‘join’ between RDDs of this DStream and other DStream. | 
| 
 | Return a new DStream by applying ‘left outer join’ between RDDs of this DStream and other DStream. | 
| 
 | Return a new DStream by applying a function to each element of DStream. | 
| 
 | Return a new DStream in which each RDD is generated by applying mapPartitions() to each RDDs of this DStream. | 
| 
 | Return a new DStream in which each RDD is generated by applying mapPartitionsWithIndex() to each RDDs of this DStream. | 
| Return a new DStream by applying a map function to the value of each key-value pairs in this DStream without changing the key. | |
| 
 | Return a copy of the DStream in which each RDD are partitioned using the specified partitioner. | 
| 
 | Persist the RDDs of this DStream with the given storage level | 
| 
 | Return a new DStream in which each RDD has a single element generated by reducing each RDD of this DStream. | 
| 
 | Return a new DStream by applying reduceByKey to each RDD. | 
| 
 | Return a new DStream by applying incremental reduceByKey over a sliding window. | 
| 
 | Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream. | 
| 
 | Return a new DStream with an increased or decreased level of parallelism. | 
| 
 | Return a new DStream by applying ‘right outer join’ between RDDs of this DStream and other DStream. | 
| 
 | Return all the RDDs between ‘begin’ to ‘end’ (both included) | 
| 
 | Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream. | 
| 
 | Return a new DStream in which each RDD is generated by applying a function on each RDD of this DStream and ‘other’ DStream. | 
| 
 | Return a new DStream by unifying data of another DStream with this DStream. | 
| 
 | Return a new “state” DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key. | 
| 
 | Return a new DStream in which each RDD contains all the elements in seen in a sliding window of time over this DStream. | 
Kinesis¶
| 
 | Create an input stream that pulls messages from a Kinesis stream. |