The underlying JVM object is a SchemaRDD, not a PythonRDD, so we can
utilize the relational query API exposed by Spark SQL.

This class receives raw tuples from Java but assigns a class to them in
all of its data-collection methods (mapPartitionsWithIndex, collect, take,
etc.) so that PySpark sees them as Row objects with named fields.
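The Row-wrapping behavior described above can be sketched in plain Python with a namedtuple (illustrative only; the `Row` stand-in and the field names `name` and `age` are hypothetical, not PySpark internals):

```python
from collections import namedtuple

# Hypothetical schema: PySpark derives the field names from the SchemaRDD's schema.
Row = namedtuple("Row", ["name", "age"])

# Raw tuples as they would arrive from the JVM side.
raw_tuples = [("Alice", 1), ("Bob", 2)]

# Each data-collection method wraps the raw tuples so callers see named fields.
rows = [Row(*t) for t in raw_tuples]
```

Because the namedtuple is still a tuple, positional access keeps working while `rows[0].name` becomes available by field name.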
__init__(self, jschema_rdd, sql_ctx)
    x.__init__(...) initializes x; see help(type(x)) for signature
insertInto(self, tableName, overwrite=False)
    Inserts the contents of this SchemaRDD into the specified table.
saveAsTable(self, tableName)
    Creates a new table with the contents of this SchemaRDD.
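The difference between saveAsTable and insertInto can be sketched with a plain-Python stand-in for a table catalog (illustrative only; `catalog`, `save_as_table`, and `insert_into` are hypothetical helpers, not PySpark API):

```python
# A toy catalog mapping table names to lists of rows.
catalog = {}

def save_as_table(rows, table_name):
    # saveAsTable: creates a new table holding the given contents.
    catalog[table_name] = list(rows)

def insert_into(rows, table_name, overwrite=False):
    # insertInto: appends to an existing table, or replaces its
    # contents when overwrite=True.
    if overwrite:
        catalog[table_name] = list(rows)
    else:
        catalog[table_name].extend(rows)

save_as_table([("Alice", 1)], "people")
insert_into([("Bob", 2)], "people")                     # append
insert_into([("Carol", 3)], "people", overwrite=True)   # replace
```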
schemaString(self)
    Returns the output schema in the tree format.
printSchema(self)
    Prints out the schema in the tree format.
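A sketch of what the tree format looks like, using a plain-Python printer over a made-up schema (the `root` / `|--` layout mirrors Spark's tree output; the fields and the `schema_string` helper are hypothetical):

```python
# Hypothetical schema as (field_name, type, nullable) triples.
schema = [("name", "string", True), ("age", "integer", True)]

def schema_string(fields):
    # Mimics the tree format: a "root" line, then one "|--" line per field.
    lines = ["root"]
    for name, dtype, nullable in fields:
        lines.append(" |-- %s: %s (nullable = %s)"
                     % (name, dtype, str(nullable).lower()))
    return "\n".join(lines)

print(schema_string(schema))
```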
mapPartitionsWithIndex(self, f, preservesPartitioning=False)
    Return a new RDD by applying a function to each partition of this
    RDD, while tracking the index of the original partition.
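The semantics of mapPartitionsWithIndex can be sketched in plain Python over a list of partitions (illustrative; `partitions` and `map_partitions_with_index` stand in for the RDD's partitioned data and method):

```python
# Stand-in for an RDD split into three partitions.
partitions = [[1, 2], [3, 4], [5]]

def map_partitions_with_index(parts, f):
    # Apply f(index, iterator) to each partition, keeping partition order.
    return [list(f(i, iter(p))) for i, p in enumerate(parts)]

# Tag every element with the index of the partition it came from.
result = map_partitions_with_index(
    partitions, lambda i, it: ((i, x) for x in it))
```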
cache(self)
    Persist this RDD with the default storage level
    (MEMORY_ONLY_SER).
persist(self, storageLevel)
    Set this RDD's storage level to persist its values across operations
    after the first time it is computed.
unpersist(self, blocking=True)
    Mark the RDD as non-persistent, and remove all blocks for it from
    memory and disk.
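The compute-once behavior that persist/unpersist controls can be sketched with a toy wrapper in plain Python (illustrative only; real persistence is handled by Spark's block manager, and `CachedRDD` is a hypothetical stand-in):

```python
class CachedRDD:
    """Toy stand-in: the dataset is computed once and reused until unpersisted."""
    def __init__(self, compute_fn):
        self._compute_fn = compute_fn
        self._cached = None

    def collect(self):
        # First action materializes the data; later actions reuse the cached copy.
        if self._cached is None:
            self._cached = self._compute_fn()
        return self._cached

    def unpersist(self):
        # Drop the cached copy; the next action recomputes from scratch.
        self._cached = None

compute_calls = []

def expensive_compute():
    compute_calls.append(1)   # track how many times the dataset is built
    return [1, 2, 3]

rdd = CachedRDD(expensive_compute)
rdd.collect()
rdd.collect()                 # served from cache; no recomputation
```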
coalesce(self, numPartitions, shuffle=False)
    Return a new RDD that is reduced into `numPartitions` partitions.
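A plain-Python sketch of reducing the partition count (illustrative; with `shuffle=False` Spark merges existing partitions rather than redistributing individual elements, and the grouping scheme below is a simplification):

```python
# Stand-in for an RDD with four partitions.
partitions = [[1], [2, 3], [4], [5, 6]]

def coalesce(parts, num_partitions):
    # Merge whole input partitions into num_partitions output partitions;
    # here each output partition collects every num_partitions-th input.
    out = [[] for _ in range(num_partitions)]
    for i, part in enumerate(parts):
        out[i % num_partitions].extend(part)
    return out

result = coalesce(partitions, 2)
```

Note that no element moves between the values themselves; only whole partitions are combined, which is why `shuffle=False` coalescing is cheap.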
subtract(self, other, numPartitions=None)
    Return each value in self that is not contained in other.
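The value-based semantics of subtract can be sketched in plain Python (illustrative; `subtract` here is a hypothetical stand-in operating on lists rather than distributed data):

```python
def subtract(self_values, other_values):
    # Keep every value in self that does not appear in other. Unlike a
    # set difference, surviving duplicates in self are preserved.
    other_set = set(other_values)
    return [v for v in self_values if v not in other_set]

result = subtract([1, 1, 2, 3], [2])
```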
Inherited from rdd.RDD:
    __add__, __repr__, aggregate, aggregateByKey, cartesian, cogroup,
    collectAsMap, combineByKey, context, countByKey, countByValue, filter,
    first, flatMap, flatMapValues, fold, foldByKey, foreach,
    foreachPartition, getNumPartitions, getStorageLevel, glom, groupBy,
    groupByKey, groupWith, histogram, id, join, keyBy, keys, leftOuterJoin,
    map, mapPartitions, mapPartitionsWithSplit, mapValues, max, mean, min,
    name, partitionBy, pipe, reduce, reduceByKey, reduceByKeyLocally,
    rightOuterJoin, sample, sampleByKey, sampleStdev, sampleVariance,
    saveAsHadoopDataset, saveAsHadoopFile, saveAsNewAPIHadoopDataset,
    saveAsNewAPIHadoopFile, saveAsPickleFile, saveAsSequenceFile,
    saveAsTextFile, setName, sortBy, sortByKey, stats, stdev, subtractByKey,
    sum, take, takeOrdered, takeSample, toDebugString, top, union, values,
    variance, zip, zipWithIndex, zipWithUniqueId
Inherited from object:
    __delattr__, __format__, __getattribute__, __hash__, __new__,
    __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__,
    __subclasshook__