org.apache.spark.rdd
Class JdbcRDD<T>

Object
  org.apache.spark.rdd.RDD<T>
    org.apache.spark.rdd.JdbcRDD<T>

public class JdbcRDD<T>
extends RDD<T>
implements Logging
An RDD that executes an SQL query on a JDBC connection and reads results. For usage example, see test case JdbcRDDSuite.
param: getConnection a function that returns an open Connection. The RDD takes care of closing the connection.
param: sql the text of the query. The query must contain two ? placeholders for parameters used to partition the results. E.g. "select title, author from books where ? <= id and id <= ?"
param: lowerBound the minimum value of the first placeholder
param: upperBound the maximum value of the second placeholder. The lower and upper bounds are inclusive.
param: numPartitions the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
param: mapRow a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc; the RDD takes care of calling next. The default maps a ResultSet to an array of Object.
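As a minimal usage sketch (not from the official docs; the H2 in-memory URL, the books table, and its columns are assumptions for illustration), the constructor can be called from Scala like this:

```scala
import java.sql.DriverManager

import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

val sc = new SparkContext("local[2]", "JdbcRDDExample")

// Assumed: an H2 in-memory database with a `books` table whose `id`
// column spans 1..20. The two ? placeholders partition on `id`.
val rdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:h2:mem:testdb"), // getConnection
  "SELECT id, title FROM books WHERE ? <= id AND id <= ?", // sql
  1,  // lowerBound (inclusive)
  20, // upperBound (inclusive)
  2,  // numPartitions: one task runs (1, 10), the other (11, 20)
  rs => (rs.getInt(1), rs.getString(2))) // mapRow: only read columns; the RDD calls next()

rdd.collect().foreach(println)
sc.stop()
```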
| Nested Class Summary | |
|---|---|
| static interface | JdbcRDD.ConnectionFactory |
| Constructor Summary | |
|---|---|
| JdbcRDD(SparkContext sc, scala.Function0<java.sql.Connection> getConnection, String sql, long lowerBound, long upperBound, int numPartitions, scala.Function1<java.sql.ResultSet,T> mapRow, scala.reflect.ClassTag<T> evidence$1) | |
| Method Summary | | |
|---|---|---|
| scala.collection.Iterator<T> | compute(Partition thePart, TaskContext context) | :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
| static JavaRDD<Object[]> | create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, String sql, long lowerBound, long upperBound, int numPartitions) | Create an RDD that executes an SQL query on a JDBC connection and reads results. |
| static <T> JavaRDD<T> | create(JavaSparkContext sc, JdbcRDD.ConnectionFactory connectionFactory, String sql, long lowerBound, long upperBound, int numPartitions, Function<java.sql.ResultSet,T> mapRow) | Create an RDD that executes an SQL query on a JDBC connection and reads results. |
| Partition[] | getPartitions() | Implemented by subclasses to return the set of partitions in this RDD. |
| static Object[] | resultSetToObjectArray(java.sql.ResultSet rs) | |
| Methods inherited from class Object |
|---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface org.apache.spark.Logging |
|---|
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning |
| Constructor Detail |
|---|
public JdbcRDD(SparkContext sc,
scala.Function0<java.sql.Connection> getConnection,
String sql,
long lowerBound,
long upperBound,
int numPartitions,
scala.Function1<java.sql.ResultSet,T> mapRow,
scala.reflect.ClassTag<T> evidence$1)
| Method Detail |
|---|
public static Object[] resultSetToObjectArray(java.sql.ResultSet rs)
public static <T> JavaRDD<T> create(JavaSparkContext sc,
JdbcRDD.ConnectionFactory connectionFactory,
String sql,
long lowerBound,
long upperBound,
int numPartitions,
Function<java.sql.ResultSet,T> mapRow)
Create an RDD that executes an SQL query on a JDBC connection and reads results.

Parameters:
sc - (undocumented)
connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. E.g. "select title, author from books where ? <= id and id <= ?"
lowerBound - the minimum value of the first placeholder
upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
mapRow - a function from a ResultSet to a single row of the desired result type(s). This should only call getInt, getString, etc; the RDD takes care of calling next.
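A hedged sketch of calling this overload (written in Scala against the Java-facing API; the JDBC URL and books schema are assumptions, and anonymous classes are used so no SAM conversion is required):

```scala
import java.sql.{Connection, DriverManager, ResultSet}

import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.{Function => JFunction}
import org.apache.spark.rdd.JdbcRDD

val jsc = new JavaSparkContext("local[2]", "JdbcRDDCreateExample")

// Assumed H2 in-memory database; the RDD closes each connection itself.
val connectionFactory = new JdbcRDD.ConnectionFactory {
  override def getConnection(): Connection =
    DriverManager.getConnection("jdbc:h2:mem:testdb")
}

// mapRow reads columns from the current row; the RDD calls next().
val mapRow = new JFunction[ResultSet, String] {
  override def call(rs: ResultSet): String = rs.getString(1)
}

val titles = JdbcRDD.create(
  jsc,
  connectionFactory,
  "SELECT title FROM books WHERE ? <= id AND id <= ?", // assumed schema
  1, 20, 2,
  mapRow)
```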
public static JavaRDD<Object[]> create(JavaSparkContext sc,
JdbcRDD.ConnectionFactory connectionFactory,
String sql,
long lowerBound,
long upperBound,
int numPartitions)
Create an RDD that executes an SQL query on a JDBC connection and reads results. Each row is converted into an Object array. For usage example, see test case JavaAPISuite.testJavaJdbcRDD.

Parameters:
sc - (undocumented)
connectionFactory - a factory that returns an open Connection. The RDD takes care of closing the connection.
sql - the text of the query. The query must contain two ? placeholders for parameters used to partition the results. E.g. "select title, author from books where ? <= id and id <= ?"
lowerBound - the minimum value of the first placeholder
upperBound - the maximum value of the second placeholder. The lower and upper bounds are inclusive.
numPartitions - the number of partitions. Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2, the query would be executed twice, once with (1, 10) and once with (11, 20).
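And a companion sketch of this no-mapRow overload (same assumed database and schema as above), where each row comes back as the Object array produced by resultSetToObjectArray:

```scala
import java.sql.{Connection, DriverManager}

import scala.collection.JavaConverters._

import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.rdd.JdbcRDD

val jsc = new JavaSparkContext("local[2]", "JdbcRDDDefaultMapRow")

val connectionFactory = new JdbcRDD.ConnectionFactory {
  override def getConnection(): Connection =
    DriverManager.getConnection("jdbc:h2:mem:testdb") // assumed URL
}

// No mapRow argument: each row becomes an Object[] with one entry per
// selected column, built by resultSetToObjectArray.
val rows = JdbcRDD.create(
  jsc,
  connectionFactory,
  "SELECT title, author FROM books WHERE ? <= id AND id <= ?", // assumed schema
  1, 20, 2)

rows.collect().asScala.foreach(arr => println(arr.mkString(", ")))
```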
public Partition[] getPartitions()

Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.

public scala.collection.Iterator<T> compute(Partition thePart,
                                            TaskContext context)

Description copied from class: RDD
:: DeveloperApi :: Implemented by subclasses to compute a given partition.
Specified by:
compute in class RDD<T>
Parameters:
thePart - (undocumented)
context - (undocumented)