R, may be different from the element type being added, T.
Accumulable shared variable of the given type, to which tasks
can "add" values with add.
Accumulable shared variable of the given type, to which tasks
can "add" values with add.
Accumulable shared variable, to which tasks can add values
with +=.
Accumulable shared variable, with a name for display in the
Spark UI.
Accumulable modified during a task or stage.
Accumulable where the result type being accumulated is the same
as the types of elements being merged, i.e.
Accumulator integer variable, which tasks can "add" values
to using the add method.
Accumulator integer variable, which tasks can "add" values
to using the add method.
Accumulator double variable, which tasks can "add" values
to using the add method.
Accumulator double variable, which tasks can "add" values
to using the add method.
Accumulator variable of a given type, which tasks can "add"
values to using the add method.
Accumulator variable of a given type, which tasks can "add"
values to using the add method.
Accumulator variable of a given type, which tasks can "add"
values to using the += method.
Accumulator variable of a given type, with a name for display
in the Spark UI.
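All of the accumulator entries above share one usage pattern; a minimal sketch in Scala (Spark 1.x API; the master, app name, and data are placeholder assumptions):

    import org.apache.spark.{SparkConf, SparkContext}

    // The master and app name are placeholders.
    val sc = new SparkContext(new SparkConf().setAppName("acc-demo").setMaster("local[2]"))
    // Named accumulator, displayed in the Spark UI.
    val evens = sc.accumulator(0, "evens")
    sc.parallelize(1 to 100).foreach { n => if (n % 2 == 0) evens += 1 }
    println(evens.value) // 50; read .value only on the driver
    sc.stop()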
AccumulableParam where the only data type you can add
in is the same type as the accumulated value.
StreamingListener object for
receiving system events related to streaming.
StreamingListener object for
receiving system events related to streaming.
DataFrame without groups.
DataFrame without groups.
DataFrame without groups.
DataFrame without groups.
DataFrame without groups.
messages that have the same ids using reduceFunc, returning a
VertexRDD co-indexed with this.
true iff both left and right evaluate to true.
1.0 (bias) appended to the input vector.
VertexRDD (one that is not set up for efficient joins with an
EdgeRDD) from an RDD of vertex-attribute pairs.
VertexRDD from an RDD of vertex-attribute pairs.
VertexRDD from an RDD of vertex-attribute pairs.
Column.
ArrayType object with the given element type.
MapType object with the given key type and value type.
StructField of the given name.
StructType containing StructFields of the given names, preserving the
original order of fields.
BlockManagerId for the given configuration.
StructField of type array.
DataFrame with an alias set.
DataFrame with an alias set.
AttributeType$.Numeric, AttributeType$.Nominal,
and AttributeType$.Binary.
awaitTerminationOrTimeout(Long).
awaitTerminationOrTimeout(Long).
StructField of type binary.
numBins to 0.
Array[Byte] values.
Param[Boolean] for Java.
Boolean values.
GradientBoostedTrees.
Broadcast object for reading it in distributed functions.
Broadcast object for reading it in distributed functions.
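A minimal sketch of the broadcast pattern these entries describe, assuming an existing SparkContext sc:

    // Broadcast a read-only lookup table once; tasks read it via .value.
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
    val tagged = sc.parallelize(Seq(1, 2, 1)).map(k => lookup.value.getOrElse(k, "?"))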
Bucketizer maps a column of continuous features to a column of feature buckets.
Metadata instance.
RDD[Row] containing all rows within
this relation.
RDD[Row] containing all rows within
this relation.
RDD[Row] containing all rows within
this relation.
Byte values.
MEMORY_ONLY.
this and b is in other.
this and b is in other.
1 / observed.size.
addFile so that they do not get downloaded to
any new nodes.
addFile so that they do not get downloaded to
any new nodes.
addJar so that they do not get downloaded to
any new nodes.
addJar so that they do not get downloaded to
any new nodes.
predict will output raw prediction scores.
predict will output raw prediction scores.
OutputWriter.
numPartitions partitions.
numPartitions partitions.
numPartitions partitions.
numPartitions partitions.
numPartitions partitions.
numPartitions partitions.
numPartitions partitions.
DataFrame that has exactly numPartitions partitions.
this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
this or other, return a resulting RDD that contains a tuple with the
list of values for that key in this as well as other.
this or other1 or other2, return a resulting RDD that contains a
tuple with the list of values for that key in this, other1 and other2.
this or other1 or other2 or other3,
return a resulting RDD that contains a tuple with the list of values
for that key in this, other1, other2 and other3.
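A minimal sketch of cogroup on pair RDDs, assuming an existing SparkContext sc; the keys and values are made up for illustration:

    // For each key present in either RDD, a tuple of the value
    // collections from both sides.
    val a = sc.parallelize(Seq(1 -> "x", 2 -> "y"))
    val b = sc.parallelize(Seq(1 -> "u", 3 -> "v"))
    a.cogroup(b).collect().foreach { case (k, (left, right)) =>
      println(s"$k -> ${left.toList} / ${right.toList}")
    }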
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
Column.
Column based on the given column name.
f.
Rows in this DataFrame.
Rows in this DataFrame.
collect, which returns a future for
retrieving an array containing all of the elements in this RDD.
DataFrame.
Column based on the given column name.
FutureAction for actions that could trigger multiple Spark jobs.
RDD[(VertexId, VD)] equivalent output.
A^T A.
SparkContext that this RDD was created on.
SparkContext that this RDD was created on.
StreamingContext associated with this DStream
Row object.
corr()
corr()
DataFrame.
count, which returns a
future for counting the number of elements in this RDD.
Row from the given arguments.
elementType).
elementType) and
whether the array contains null values (containsNull).
write().jdbc().
keyType) and values
(valueType).
keyType), the data type of
values (valueType), and whether values contain any null value
(valueContainsNull).
name), data type (dataType) and
whether values of this field can be null values (nullable).
fields).
fields).
DataFrame using the specified columns,
so we can run aggregation on them.
DataFrame using the specified columns,
so we can run aggregation on them.
DataFrame using the specified columns,
so we can run aggregation on them.
DataFrame using the specified columns,
so we can run aggregation on them.
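A minimal sketch of grouping a DataFrame by columns and running aggregations on the groups, assuming a Spark 1.x sqlContext; the column names are assumptions:

    import org.apache.spark.sql.functions.{avg, count}

    val df = sqlContext.createDataFrame(Seq(("a", 1), ("a", 2), ("b", 3))).toDF("key", "value")
    df.groupBy("key").agg(count("value"), avg("value")).show()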
DataFrames.
DataFrame from external storage systems (e.g.
DataFrames.
DataFrame to external storage systems (e.g.
StructField of type date.
java.sql.Date values.
StructField of type decimal.
StructField of type decimal.
Decision tree model for classification.
Decision tree learning algorithm for classification.
Decision tree model for regression.
Decision tree learning algorithm for regression.
JavaSparkContext.defaultMinPartitions() instead
DecisionTree
DecisionTree
DenseMatrix format from the supplied values.
Matrix format from the supplied values.
this and other, diff returns only those vertices with
differing values; for values that are different, keeps the values from other.
this and other, diff returns only those vertices with
differing values; for values that are different, keeps the values from other.
DataFrame that contains only the unique rows from this DataFrame.
Accumulator double variable, which tasks can "add" values
to using the add method.
Accumulator double variable, which tasks can "add" values
to using the add method.
Param[Array[Double]] for Java.
Param[Double] for Java.
Double values.
DataFrame with a column dropped.
DataFrame with a column dropped.
DataFrame that drops rows containing any null values.
DataFrame that drops rows containing null values.
DataFrame that drops rows containing any null values
in the specified columns.
DataFrame that drops rows containing any null values
in the specified columns.
DataFrame that drops rows containing null values
in the specified columns.
DataFrame that drops rows containing null values
in the specified columns.
DataFrame that drops rows containing less than minNonNulls non-null values.
DataFrame that drops rows containing less than minNonNulls non-null
values in the specified columns.
DataFrame that drops rows containing less than
minNonNulls non-null values in the specified columns.
DataFrame that contains only the unique rows from this DataFrame.
DataFrame with duplicate rows removed, considering only
the subset of columns.
DataFrame with duplicate rows removed, considering only
the subset of columns.
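A minimal sketch of the null-dropping and de-duplication variants above, assuming an existing DataFrame df; the column names are assumptions:

    df.na.drop()                         // drop rows containing any null value
    df.na.drop(Seq("age", "name"))       // consider only these columns
    df.na.drop(minNonNulls = 2)          // keep rows with at least 2 non-null values
    df.dropDuplicates(Seq("age"))        // de-duplicate on a subset of columns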
EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each
partition for performance.
DataFrame with no rows or columns.
entropy during
binary classification.
true iff the attribute evaluates to a value
equal to value.
DataFrame containing rows in this frame but not in another frame.
DataFrame where each row has been expanded to zero or more
rows by the provided function.
DataFrame where a single column has been expanded to zero
or more rows by the provided function.
RandomRDDs.exponentialRDD(org.apache.spark.SparkContext, double, long, int, long).
RandomRDDs.exponentialJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default seed.
RandomRDDs.exponentialJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default number of partitions and the default seed.
RandomRDDs.exponentialVectorRDD(org.apache.spark.SparkContext, double, long, int, int, long).
RandomRDDs.exponentialJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default seed.
RandomRDDs.exponentialJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default number of partitions
and the default seed.
i.i.d. samples from the exponential distribution with
the input mean.
i.i.d. samples drawn from the
exponential distribution with the input mean.
extractParamMap with no extra values.
DenseMatrix format.
Matrix format.
DataFrame that replaces null values in numeric columns with value.
DataFrame that replaces null values in string columns with value.
DataFrame that replaces null values in specified numeric columns.
DataFrame that replaces null values in specified
numeric columns.
DataFrame that replaces null values in specified string columns.
DataFrame that replaces null values in
specified string columns.
DataFrame that replaces null values.
DataFrame that replaces null values.
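A minimal sketch of the null-replacement variants, assuming an existing DataFrame df; the column names and values are assumptions:

    df.na.fill(0.0)                              // numeric columns
    df.na.fill("unknown", Seq("name"))           // specific string columns
    df.na.replace("color", Map("red" -> "RED"))  // per-column replacement map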
lower to upper.
PCAModel that contains the principal components of the input vectors.
fit()
DataFrame,
and then flattening the results.
Param[Float] for Java.
Float values.
f to all rows.
foreachRDD.
foreachRDD.
f to all the active elements of dense and sparse matrix.
f to all the active elements of dense and sparse vector.
foreach action, which
applies a function f to all the elements of this RDD.
DataFrame.
foreachPartition action, which
applies a function f to each partition of this RDD.
0.3, numPartitions: same
as the input data}.
DataType.fromJson()
SparseMatrix from Coordinate List (COO) format.
DStream to a Java-friendly
JavaDStream.
VertexRDD containing all vertices referred to in edges.
InputDStream to a Java-friendly
JavaInputDStream.
InputDStream of pairs to a
Java-friendly JavaPairInputDStream.
AttributeType object from its name.
ReceiverInputDStream to a Java-friendly
JavaReceiverInputDStream.
ReceiverInputDStream to a Java-friendly
JavaReceiverInputDStream.
StructField instance.
this and other.
this and other.
this and other.
this and other.
this and other.
this and other.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
RandomRDDs.gammaRDD(org.apache.spark.SparkContext, double, double, long, int, long).
RandomRDDs.gammaJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default seed.
RandomRDDs.gammaJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default number of partitions and the default seed.
RandomRDDs.gammaVectorRDD(org.apache.spark.SparkContext, double, double, long, int, int, long).
RandomRDDs.gammaJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default seed.
RandomRDDs.gammaJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default number of partitions and the default seed.
i.i.d. samples from the gamma distribution with the input
shape and scale.
i.i.d. samples drawn from the
gamma distribution with the input shape and scale.
Gradient-Boosted Trees (GBTs) model for classification.
Gradient-Boosted Trees (GBTs) learning algorithm for classification.
Gradient-Boosted Trees (GBTs) learning algorithm for regression.
SparkContext.addFile().
getDocConcentration
getTopicConcentration
StructType.
ordinal out of an array,
or gets a value by key key in a MapType.
Map.
null if the job info could not be found or was garbage collected.
None if the job info could not be found or was garbage collected.
List.
numValues or from values.
getOrCreate without JavaStreamingContextFactory.
getOrCreate without JavaStreamingContextFactory.
getOrCreate without JavaStreamingContextFactory.
SparkContext.addFile().
null if the stage info could not be found or was
garbage collected.
None if the stage info could not be found or was
garbage collected.
Row object.
Gini impurity
during binary classification.
Stochastic Gradient Boosting
for regression and binary classification.
Graph to support computation on graphs.
Graphs from files.
Graph.
GraphOps member from a graph.
true iff the attribute evaluates to a value
greater than value.
true iff the attribute evaluates to a value
greater than or equal to value.
rows by cols grid graph with each vertex connected to its
row+1 and col+1 neighbors.
DataFrame using the specified columns, so we can run aggregation on them.
DataFrame using the specified columns, so we can run aggregation on them.
DataFrame using the specified columns, so we can run aggregation on them.
DataFrame using the specified columns, so we can run aggregation on them.
groupByKey to each RDD.
groupByKey to each RDD.
groupByKey on each RDD of this DStream.
groupByKey to each RDD.
groupByKey to each RDD.
groupByKey on each RDD.
groupByKey over a sliding window.
groupByKey over a sliding window.
groupByKey over a sliding window on this DStream.
groupByKey over a sliding window on this DStream.
groupByKey over a sliding window.
groupByKey over a sliding window.
groupByKey over a sliding window on this DStream.
groupByKey over a sliding window on this DStream.
DataFrame, created by DataFrame.groupBy.
BaseRelation that provides much of the common code required for formats that store their
data to an HDFS compatible filesystem.
org.apache.hadoop.mapred).
Partitioner that implements hash-based partitioning using
Java's Object.hashCode.
OffsetRanges.
Model has a corresponding parent.
n rows.
BroadcastFactory implementation that uses an
HTTP server as the broadcast mechanism.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
sqrt(a^2^ + b^2^) without intermediate overflow or underflow.
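A minimal sketch of the hypot column function, assuming a DataFrame df with numeric columns "a" and "b":

    import org.apache.spark.sql.functions.hypot

    df.select(hypot(df("a"), df("b"))).show()  // sqrt(a^2 + b^2), overflow-safe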
true iff the attribute evaluates to one of the values in the array.
IndexedRowMatrix.
PartitionStrategy.
inRange() which is inclusive by default: [lowerBound, upperBound]
write().mode(SaveMode.Append|SaveMode.Overwrite).saveAsTable(tableName).
write().mode(SaveMode.Append).saveAsTable(tableName).
DataFrame to the specified table.
write().jdbc().
Accumulator integer variable, which tasks can "add" values
to using the add method.
Accumulator integer variable, which tasks can "add" values
to using the add method.
Int values.
DataFrame containing rows only in both this frame and another frame.
Param[Int] for Java.
InformationGainStats object to
denote that the current split doesn't satisfy minimum info gain or
minimum number of instances per node.
collect and take methods can be run locally
(without any Spark executors).
NominalAttribute and BinaryAttribute.
true iff the attribute evaluates to a non-null value.
true iff the attribute evaluates to null.
NumericAttribute and BinaryAttribute.
spark.*.port or spark.port.*.
categoryMaps
DStream, the basic
abstraction in Spark Streaming that represents a continuous stream of data.
InputDStream.
reduceByKey and join.
InputDStream of
key-value pairs.
ReceiverInputDStream, the
abstract class for defining any input stream that receives data over the network.
Params.
DataFrame as a JavaRDD of Rows.
ReceiverInputDStream, the
abstract class for defining any input stream that receives data over the network.
SparkContext that returns
JavaRDDs and works with Java collections instead of Scala ones.
StreamingContext which is the main
entry point for Spark Streaming functionality.
topicDistributions
DataFrame representing the database table accessible via JDBC URL
url named table and connection properties.
DataFrame representing the database table accessible via JDBC URL
url named table.
DataFrame representing the database table accessible via JDBC URL
url named table using connection properties.
DataFrame to an external database table via JDBC.
read().jdbc().
read().jdbc().
read().jdbc().
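A minimal sketch of reading a table over JDBC into a DataFrame (Spark 1.4 DataFrameReader API); the URL, table name, and credentials are placeholders:

    import java.util.Properties

    val props = new Properties()
    props.setProperty("user", "spark")
    val people = sqlContext.read.jdbc("jdbc:postgresql://host/db", "people", props)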
DataFrame.
this and other.
this and other.
this and other.
this and other.
this and other.
DataFrame.
DataFrame using the given column.
DataFrame, using the given join expression.
DataFrame, using the given join expression.
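A minimal sketch of the join variants, assuming two DataFrames left and right sharing an id column:

    val byColumn = left.join(right, "id")                        // using the given column
    val byExpr   = left.join(right, left("id") === right("id"))  // join expression
    val outer    = left.join(right, left("id") === right("id"), "left_outer")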
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
this DStream and other DStream.
DataFrame.
JavaRDD[String] storing JSON objects (one object per record) and
returns the result as a DataFrame.
RDD[String] storing JSON objects (one object per record) and
returns the result as a DataFrame.
DataFrame in JSON format at the specified path.
read().json().
read().json().
read().json().
read().json().
read().json().
read().json().
read().json().
read().json().
read().json().
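A minimal sketch of loading JSON (one object per record) into a DataFrame, assuming a sqlContext; the path is a placeholder:

    val people = sqlContext.read.json("examples/src/main/resources/people.json")
    people.printSchema()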
f.
f.
Kryo serialization library.
offset rows before the current row, and
null if there are fewer than offset rows before the current row.
offset rows before the current row, and
null if there are fewer than offset rows before the current row.
offset rows before the current row, and
defaultValue if there are fewer than offset rows before the current row.
offset rows before the current row, and
defaultValue if there are fewer than offset rows before the current row.
offset rows after the current row, and
null if there are fewer than offset rows after the current row.
offset rows after the current row, and
null if there are fewer than offset rows after the current row.
offset rows after the current row, and
defaultValue if there are fewer than offset rows after the current row.
offset rows after the current row, and
defaultValue if there are fewer than offset rows after the current row.
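A minimal sketch of lag and lead over a window (Spark 1.4+ window functions), assuming a DataFrame df; the column names are assumptions:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{lag, lead}

    val w = Window.partitionBy("user").orderBy("time")
    df.select(df("user"), df("time"),
      lag("value", 1).over(w),   // previous row's value, null at the partition start
      lead("value", 1).over(w))  // next row's value, null at the partition end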
this and other.
this and other.
this and other.
this and other.
this and other.
this and other.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
true iff the attribute evaluates to a value
less than value.
true iff the attribute evaluates to a value
less than or equal to value.
DataFrame by taking the first n rows.
LinearRegression.
Column of literal value.
DataFrame, for data sources that require a path (e.g.
DataFrame, for data sources that don't require a path (e.g.
read().load(path).
read().format(source).load(path).
read().format(source).options(options).load().
read().format(source).options(options).load().
read().format(source).schema(schema).options(options).load().
read().format(source).schema(schema).options(options).load().
RDD.saveAsTextFile(java.lang.String) for saving and
MLUtils.loadLabeledPoints(org.apache.spark.SparkContext, java.lang.String, int) for loading.
RDD[LabeledPoint].saveAsTextFile.
RDD[LabeledPoint].saveAsTextFile with the default number of
partitions.
RDD[Vector].saveAsTextFile.
RDD[Vector].saveAsTextFile with the default number of partitions.
LogisticRegression.
LogisticRegressionModel with weights and intercept for binary classification.
RandomRDDs.logNormalRDD(org.apache.spark.SparkContext, double, double, long, int, long).
RandomRDDs.logNormalJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default seed.
RandomRDDs.logNormalJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default number of partitions and the default seed.
RandomRDDs.logNormalVectorRDD(org.apache.spark.SparkContext, double, double, long, int, int, long).
RandomRDDs.logNormalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default seed.
RandomRDDs.logNormalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default number of partitions and
the default seed.
i.i.d. samples from the log normal distribution with the input
mean and standard deviation.
i.i.d. samples drawn from a
log normal distribution.
Param[Long] for Java.
Long values.
key.
key.
CompressionCodec.
CompressionCodec.
RpcEndpointRef which is located in the driver via its name.
StructField of type map.
other, but keeps the
attributes from this graph.
Matrix."rmse" (default), "mse", "r2", and "mae")
Duration representing
a given number of milliseconds.
this and other, minus will act as a set difference
operation returning only those unique VertexId's present in this.
this and other, minus will act as a set difference
operation returning only those unique VertexId's present in this.
Duration representing
a given number of minutes.
Transformer produced by an Estimator.
BlockMatrix to other, another BlockMatrix.
MultivariateStatisticalSummary to compute the mean,
variance, minimum, maximum, counts, and nonzero counts for samples in sparse or dense vector
format in an online fashion.
DataFrameNaFunctions for working with missing data.
receiverStream.
org.apache.hadoop.mapreduce).
SerializerInstance.
HadoopFsRelation, this method gets called by each task on executor side
to instantiate new OutputWriters.
RandomRDDs.normalRDD(org.apache.spark.SparkContext, long, int, long).
RandomRDDs.normalJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default seed.
RandomRDDs.normalJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default number of partitions and the default seed.
RandomRDDs.normalVectorRDD(org.apache.spark.SparkContext, long, int, int, long).
RandomRDDs.normalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default seed.
RandomRDDs.normalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default number of partitions and the default seed.
i.i.d. samples from the standard normal distribution.
i.i.d. samples drawn from the
standard normal distribution.
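A minimal sketch of the random-RDD generators, assuming an existing SparkContext sc; the size, partition count, and transformation are arbitrary:

    import org.apache.spark.mllib.random.RandomRDDs

    val normals = RandomRDDs.normalRDD(sc, 10000L, 4)  // 10000 i.i.d. N(0, 1) samples
    val shifted = normals.map(x => 5.0 + 2.0 * x)      // transform to N(5, 4)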
true iff child is evaluated to false.
n inclusive) in an ordered window
partition.
NULL values.
RankingMetrics instance (for Java users).
DenseMatrix consisting of ones.
DenseMatrix consisting of ones.
OneVsRest.
GraphOps object.
true iff at least one of left or right evaluates to true.
DataFrame sorted by the given expressions.
DataFrame sorted by the given expressions.
DataFrame sorted by the given expressions.
DataFrame sorted by the given expressions.
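A minimal sketch of sorting by expressions, assuming a DataFrame df; the column names are assumptions:

    df.orderBy(df("age").desc, df("name").asc).show()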
WindowSpec with the ordering defined.
WindowSpec with the ordering defined.
WindowSpec with the ordering defined.
WindowSpec with the ordering defined.
WindowSpec.
WindowSpec.
WindowSpec.
WindowSpec.
table RDD and merges the results using mapFunc.
OutputWriter is used together with HadoopFsRelation for persisting rows to the
underlying file system.
OutputWriters.
Param.isValid.
DataFrame.
DataFrame.
DataFrame in Parquet format at the specified path.
read().parquet().
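A minimal sketch of the Parquet round trip (Spark 1.4 writer/reader API), assuming a DataFrame df; the path is a placeholder:

    df.write.mode("overwrite").parquet("/tmp/df.parquet")
    val back = sqlContext.read.parquet("/tmp/df.parquet")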
Vector.toString into a Vector.
partitionStrategy.
partitionStrategy.
WindowSpec with the partitioning defined.
WindowSpec with the partitioning defined.
WindowSpec with the partitioning defined.
WindowSpec with the partitioning defined.
WindowSpec.
WindowSpec.
WindowSpec.
WindowSpec.
prev) into fewer partitions, so that each partition of
this RDD computes one or more of the parent ones.
partitionsRDD already has a partitioner, use it.
2 * sqrt(numParts) - 1 bound on vertex replication.
gaps is true or tokens if gaps is false.
PCA that can project vectors to a low-dimensional space using PCA.
Estimator or a Transformer.
RandomRDDs.poissonRDD(org.apache.spark.SparkContext, double, long, int, long).
RandomRDDs.poissonJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default seed.
RandomRDDs.poissonJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default number of partitions and the default seed.
RandomRDDs.poissonVectorRDD(org.apache.spark.SparkContext, double, long, int, int, long).
RandomRDDs.poissonJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default seed.
RandomRDDs.poissonJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default number of partitions and the default seed.
i.i.d. samples from the Poisson distribution with the input
mean.
i.i.d. samples drawn from the
Poisson distribution with the input mean.
predict()
MatrixFactorizationModel.predict.
OutputWriterFactory.
Metadata.
Metadata array.
DenseMatrix consisting of i.i.d. uniform random numbers.
DenseMatrix consisting of i.i.d. uniform random numbers.
DenseMatrix consisting of i.i.d. gaussian random numbers.
DenseMatrix consisting of i.i.d. gaussian random numbers.
Vector of given length containing random numbers
between 0.0 and 1.0.
Random Forest
learning algorithm for classification and regression.
Random Forest model for classification.
Random Forest learning algorithm for classification.
Random Forest model for regression.
Random Forest learning algorithm for regression.
i.i.d. samples produced by the input RandomDataGenerator.
i.i.d. samples from some distribution.
DataFrame with the provided weights.
DataFrame with the provided weights.
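A minimal sketch of splitting a DataFrame by the provided weights, assuming a DataFrame df; the 70/30 weights and seed are arbitrary:

    val Array(train, test) = df.randomSplit(Array(0.7, 0.3), seed = 42L)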
i.i.d. samples produced by the
input RandomDataGenerator.
start to end (exclusive), increased by
step every element.
start (inclusive) to end (inclusive).
Partitioner that partitions sortable records by range into roughly
equal ranges.
DataFrame as an RDD of Rows.
InputDStream
that has to start a receiver on worker nodes to receive external data.
reduceByKey to each RDD.
reduceByKey to each RDD.
reduceByKey to each RDD.
reduceByKey to each RDD.
reduceByKey to each RDD.
reduceByKey to each RDD.
reduceByKey over a sliding window on this DStream.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
reduceByKey over a sliding window on this DStream.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
reduceByKey over a sliding window.
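A minimal sketch of reduceByKey over a sliding window, assuming a StreamingContext with a compatible batch interval and a DStream pairs of (String, Int); the durations are arbitrary:

    import org.apache.spark.streaming.Seconds

    // Counts per key over the last 30 seconds, recomputed every 10 seconds.
    val windowed = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    windowed.print()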
gaps is true).
DataFrame.
DataFrame as a temporary table using the given name.
DataFrame that has exactly numPartitions partitions.
replacement map with the corresponding values.
replacement map with the corresponding values.
replacement map.
replacement map.
ShuffleMapTask that completed successfully earlier, but we
lost the executor before the stage completed.
VertexRDD reflecting a reversal of all edge directions in the corresponding
EdgeRDD.
this and other.
this and other.
this and other.
this and other.
this and other.
this and other.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
this DStream and
other DStream.
DataFrame using the specified columns,
so we can run aggregation on them.
DataFrame using the specified columns,
so we can run aggregation on them.
DataFrame using the specified columns,
so we can run aggregation on them.
DataFrame using the specified columns,
so we can run aggregation on them.
Row objects.
start (inclusive) to end (inclusive).
http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf.
run()
data should be cached for high
performance, because this is an iterative algorithm.
run()
PowerIterationClustering.run.
org.apache.spark.mllib.tree.GradientBoostedTrees#run.
Iterator[T] => U instead of (TaskContext, Iterator[T]) => U.
run() and returns exactly
the same result.
org.apache.spark.mllib.tree.GradientBoostedTrees#runWithValidation.
DataFrame by sampling a fraction of rows.
DataFrame by sampling a fraction of rows, using a random seed.
write().save(path).
write().mode(mode).save(path).
write().format(source).save(path).
write().format(source).mode(mode).save(path).
write().format(source).mode(mode).options(options).save(path).
write().format(source).mode(mode).options(options).save(path).
DataFrame at the specified path.
DataFrame as the specified table.
OutputFormat class
supporting the key and value types K and V in this RDD.
OutputFormat class
supporting the key and value types K and V in this RDD.
OutputFormat class
supporting the key and value types K and V in this RDD.
OutputFormat class
supporting the key and value types K and V in this RDD.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
OutputFormat
(mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
this DStream as a Hadoop file.
write().parquet().
write().saveAsTable(tableName).
write().mode(mode).saveAsTable(tableName).
write().format(source).saveAsTable(tableName).
write().mode(mode).saveAsTable(tableName).
write().format(source).mode(mode).options(options).saveAsTable(tableName).
write().format(source).mode(mode).options(options).saveAsTable(tableName).
DataFrame as the specified table.
RDD.saveAsTextFile(java.lang.String) for saving and
MLUtils.loadLabeledPoints(org.apache.spark.SparkContext, java.lang.String, int) for loading.
sparkContext
DataFrame.
Duration representing
a given number of seconds.
setDocConcentration()
1.0).
setTopicConcentration()
LBFGS.setNumIterations(int) instead
0.3).
Short values.
DataFrame in a tabular form.
DataFrame in a tabular form.
FutureAction holding the result of an action that triggers a single job.
CompressionCodec.
SnappyOutputStream which guards against write-after-close and double-close
issues.
DataFrame sorted by the specified column, all in ascending order.
DataFrame sorted by the given expressions.
DataFrame sorted by the specified column, all in ascending order.
DataFrame sorted by the given expressions.
SparkContext.addFile().
SparseMatrix format from the supplied values.
Matrix format.
SparseMatrix format.
SparseMatrix consisting of i.i.d. gaussian random numbers.
SparseMatrix consisting of i.i.d. uniform random numbers.
SparseMatrix consisting of i.i.d. gaussian random numbers.
SparseMatrix consisting of i.i.d. uniform random numbers.
DataFrames.
Column.
DataFrameStatFunctions for working with statistic functions.
StatCounter object that captures the mean, variance and
count of the RDD's elements in one operation.
StatCounter object that captures the mean, variance and
count of the RDD's elements in one operation.
Strategy
StructField of type string.
Param[Array[String]] for Java.
true iff the attribute evaluates to
a string that contains the string value.
true iff the attribute evaluates to
a string that ends with value.
StringIndexer.
true iff the attribute evaluates to
a string that starts with value.
String values.
StructField of type struct.
StructField of type struct.
StructType object can be constructed by
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this that are not in other.
this whose keys are not in other.
this whose keys are not in other.
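A minimal sketch of the subtract variants, assuming an existing SparkContext sc:

    val xs = sc.parallelize(Seq(1, 2, 3, 4))
    val ys = sc.parallelize(Seq(2, 4))
    println(xs.subtract(ys).collect().toList)  // elements of xs not in ys
    val pairs = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
    println(pairs.subtractByKey(sc.parallelize(Seq(2 -> 0))).collect().toList) // keys not in the other RDD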
DataFrame.
n rows in the DataFrame.
take action, which returns a
future for retrieving the first num elements of this RDD.
StructField of type timestamp.
java.sql.Timestamp values.
JavaRDDLike.collect() instead
DenseMatrix from the given SparseMatrix.
DataFrame with columns renamed.
DataFrame with columns renamed.
EdgeTriplet for convenience.
DataFrame as a JavaRDD of Rows.
DataFrame as a RDD of JSON strings.
Broadcast implementation that uses a BitTorrent-like
protocol to do a distributed transfer of the broadcasted data to the executors.
toDF().
SparseMatrix from the given DenseMatrix.
StructField with some existing metadata.
StructField.
(label, features) pairs.
(label, features) pairs.
(label, features) pairs.
GradientBoostedTrees$.train(org.apache.spark.rdd.RDD, org.apache.spark.mllib.tree.configuration.BoostingStrategy)
DecisionTree$.trainClassifier(org.apache.spark.rdd.RDD, int, scala.collection.immutable.Map, java.lang.String, int, int)
RandomForest$.trainClassifier(org.apache.spark.rdd.RDD, org.apache.spark.mllib.tree.configuration.Strategy, int, java.lang.String, int)
DecisionTree$.trainRegressor(org.apache.spark.rdd.RDD, scala.collection.immutable.Map, java.lang.String, int, int)
RandomForest$.trainRegressor(org.apache.spark.rdd.RDD, org.apache.spark.mllib.tree.configuration.Strategy, int, java.lang.String, int)
featuresCol, and appending new columns as specified by
parameters:
- predicted labels as predictionCol of type Double
- raw predictions (confidences) as rawPredictionCol of type Vector.
featuresCol, calling predict(), and storing
the predictions as a new column predictionCol.
BlockMatrix.
JavaRDDLike.treeAggregate(U, org.apache.spark.api.java.function.Function2, org.apache.spark.api.java.function.Function2, int) with suggested depth 2.
RDD.treeAggregate(U, scala.Function2, scala.Function2, int, scala.reflect.ClassTag) instead.
JavaRDDLike.treeReduce(org.apache.spark.api.java.function.Function2, int) with suggested depth 2.
RDD.treeReduce(scala.Function2, int) instead.
RandomRDDs.uniformRDD(org.apache.spark.SparkContext, long, int, long).
RandomRDDs.uniformJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default seed.
RandomRDDs.uniformJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default number of partitions and the default seed.
RandomRDDs.uniformVectorRDD(org.apache.spark.SparkContext, long, int, int, long).
RandomRDDs.uniformJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default seed.
RandomRDDs.uniformJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default number of partitions and the default seed.
i.i.d. samples from the uniform distribution U(0.0, 1.0).
i.i.d. samples drawn from the
uniform distribution on U(0.0, 1.0).
DataFrame containing union of rows in this frame and another frame.
blocks) and throws an exception if
any error is found.
Vector.
RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by
pre-indexing the entries for fast, efficient joins.
List of values (for Java and Python).
List of values (for Java and Python).
DataFrame by adding a column.
DataFrame with a column renamed.
Metadata instance.
Map(String, Vector), i.e.
Word2Vec.
DataFrame out into external storage.
WriteAheadLog.
DenseMatrix consisting of zeros.
Matrix consisting of zeros.