NewHadoopRDD (Spark 1.4.1 JavaDoc)

org.apache.spark.rdd
Class NewHadoopRDD<K,V>

Object
  org.apache.spark.rdd.RDD<scala.Tuple2<K,V>>
      org.apache.spark.rdd.NewHadoopRDD<K,V>

All Implemented Interfaces: java.io.Serializable, Logging

public class NewHadoopRDD<K,V>
extends RDD<scala.Tuple2<K,V>>
implements Logging

:: DeveloperApi :: An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).

Note: Instantiating this class directly is not recommended. Use org.apache.spark.SparkContext.newAPIHadoopRDD() instead.

Parameters:
sc - The SparkContext to associate the RDD with.
inputFormatClass - Storage format of the data to be read.
keyClass - Class of the key associated with the inputFormatClass.
valueClass - Class of the value associated with the inputFormatClass.
conf - The Hadoop configuration.

See Also: Serialized Form

Constructor Summary
`NewHadoopRDD(SparkContext sc, Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass, Class<K> keyClass, Class<V> valueClass, org.apache.hadoop.conf.Configuration conf)`

Method Summary

InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit, TaskContext context)
:: DeveloperApi :: Implemented by subclasses to compute a given partition.

org.apache.hadoop.conf.Configuration getConf()

Partition[] getPartitions()
Implemented by subclasses to return the set of partitions in this RDD.

scala.collection.Seq<String> getPreferredLocations(Partition hsplit)
Optionally overridden by subclasses to specify placement preferences.

<U> RDD<U> mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f, boolean preservesPartitioning, scala.reflect.ClassTag<U> evidence$1)
Maps over a partition, providing the InputSplit that was used as the base of the partition.

NewHadoopRDD<K,V> persist(StorageLevel storageLevel)
Set this RDD's storage level to persist its values across operations after the first time it is computed.

Methods inherited from class org.apache.spark.rdd.RDD
aggregate, cache, cartesian, checkpoint, checkpointData, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, numericRDDToDoubleRDDFunctions, partitioner, partitions, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId

Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, wait, wait, wait`

Methods inherited from interface org.apache.spark.Logging
`initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning`

Constructor Detail

NewHadoopRDD

public NewHadoopRDD(SparkContext sc,
                    Class<? extends org.apache.hadoop.mapreduce.InputFormat<K,V>> inputFormatClass,
                    Class<K> keyClass,
                    Class<V> valueClass,
                    org.apache.hadoop.conf.Configuration conf)

Method Detail

getPartitions

public Partition[] getPartitions()

Description copied from class: RDD

Implemented by subclasses to return the set of partitions in this RDD. This method will only be called once, so it is safe to implement a time-consuming computation in it.

Returns: (undocumented)
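One way a framework can uphold the call-once guarantee described above is to cache the result behind an accessor. The sketch below models that idea in plain Java. `CachingRddSketch` and `CountingRddSketch` are hypothetical stand-ins, not Spark classes, and the sketch does not claim to mirror Spark's internals.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an RDD base class; not part of Spark.
abstract class CachingRddSketch {
    private int[] cachedPartitions; // stays null until partitions() is first called

    // Subclasses may do expensive work here, e.g. enumerating input files.
    protected abstract int[] getPartitions();

    // Callers use this accessor, which invokes getPartitions() at most once.
    public final synchronized int[] partitions() {
        if (cachedPartitions == null) {
            cachedPartitions = getPartitions();
        }
        return cachedPartitions;
    }
}

// Hypothetical subclass that counts how often getPartitions() runs.
final class CountingRddSketch extends CachingRddSketch {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    protected int[] getPartitions() {
        calls.incrementAndGet();    // record each invocation
        return new int[] {0, 1, 2}; // pretend this was slow to compute
    }
}
```

Calling `partitions()` repeatedly on a `CountingRddSketch` leaves `calls` at 1, which is why a slow `getPartitions()` is paid for only once.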

compute

public InterruptibleIterator<scala.Tuple2<K,V>> compute(Partition theSplit,
                                                        TaskContext context)

Description copied from class: RDD

:: DeveloperApi :: Implemented by subclasses to compute a given partition.

Specified by: compute in class RDD<scala.Tuple2<K,V>>

Parameters: theSplit - (undocumented); context - (undocumented)
Returns: (undocumented)

mapPartitionsWithInputSplit

public <U> RDD<U> mapPartitionsWithInputSplit(scala.Function2<org.apache.hadoop.mapreduce.InputSplit,scala.collection.Iterator<scala.Tuple2<K,V>>,scala.collection.Iterator<U>> f,
                                              boolean preservesPartitioning,
                                              scala.reflect.ClassTag<U> evidence$1)

Maps over a partition, providing the InputSplit that was used as the base of the partition.
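The shape of the function argument can be illustrated without Spark: it receives the split backing a partition plus an iterator over that partition's records, and returns an iterator of results. The following plain-Java sketch is a toy model only. `ToySplit`, `SplitMapSketch`, and `tagWithPath` are hypothetical names, and `java.util.function.BiFunction` stands in for `scala.Function2`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.InputSplit.
final class ToySplit {
    final String path;
    ToySplit(String path) { this.path = path; }
}

final class SplitMapSketch {
    // Applies f once per partition, passing the split that backs that partition.
    static <T, U> List<U> mapPartitionsWithSplit(
            List<ToySplit> splits,
            List<List<T>> partitions,
            BiFunction<ToySplit, Iterator<T>, Iterator<U>> f) {
        List<U> out = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            Iterator<U> mapped = f.apply(splits.get(i), partitions.get(i).iterator());
            mapped.forEachRemaining(out::add);
        }
        return out;
    }

    // Example f: tag every record with the path of the split it came from.
    static List<String> tagWithPath() {
        List<ToySplit> splits = Arrays.asList(new ToySplit("part-0"), new ToySplit("part-1"));
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"));
        return mapPartitionsWithSplit(splits, partitions, (split, records) -> {
            List<String> tagged = new ArrayList<>();
            records.forEachRemaining(r -> tagged.add(split.path + ":" + r));
            return tagged.iterator();
        });
    }
}
```

Access to the split is useful when per-record logic depends on where the data came from, such as the source file of each record.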

getPreferredLocations

public scala.collection.Seq<String> getPreferredLocations(Partition hsplit)

Description copied from class: RDD

Optionally overridden by subclasses to specify placement preferences.

Parameters: hsplit - (undocumented)
Returns: (undocumented)

persist

public NewHadoopRDD<K,V> persist(StorageLevel storageLevel)

Description copied from class: RDD

Set this RDD's storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet.

Overrides: persist in class RDD<scala.Tuple2<K,V>>

Parameters: storageLevel - (undocumented)
Returns: (undocumented)
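The storage-level rule described for persist can be modelled in a few lines of plain Java. This is a toy sketch under stated assumptions: `ToyLevel` and `PersistSketch` are hypothetical stand-ins for StorageLevel and the RDD, and both the exception type and the decision to accept a repeat of the same level are assumptions rather than documented behavior.

```java
// Hypothetical stand-in for StorageLevel; not a Spark class.
enum ToyLevel { NONE, MEMORY_ONLY, DISK_ONLY }

// Hypothetical stand-in for an RDD that tracks its storage level.
final class PersistSketch {
    private ToyLevel level = ToyLevel.NONE;

    // Accepts a level only if none is set yet; repeating the current level is tolerated (assumption).
    PersistSketch persist(ToyLevel newLevel) {
        if (level != ToyLevel.NONE && level != newLevel) {
            throw new UnsupportedOperationException(
                    "storage level already assigned: " + level);
        }
        level = newLevel;
        return this; // returning this allows call chaining
    }

    ToyLevel getStorageLevel() { return level; }
}
```

In this model, `new PersistSketch().persist(ToyLevel.MEMORY_ONLY)` succeeds, while a later `persist(ToyLevel.DISK_ONLY)` on the same object is rejected and leaves the level unchanged.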

getConf

public org.apache.hadoop.conf.Configuration getConf()
