public class RandomForestClassifier extends ProbabilisticClassifier<Vector,RandomForestClassifier,RandomForestClassificationModel> implements RandomForestClassifierParams, DefaultParamsWritable
| Constructor and Description |
|---|
| RandomForestClassifier() |
| RandomForestClassifier(String uid) |
| Modifier and Type | Method and Description |
|---|---|
| BooleanParam | bootstrap() Whether bootstrap samples are used when building trees. |
| BooleanParam | cacheNodeIds() If false, the algorithm will pass trees to executors to match instances with nodes. |
| IntParam | checkpointInterval() Param for setting the checkpoint interval (>= 1) or disabling checkpointing (-1). |
| RandomForestClassifier | copy(ParamMap extra) Creates a copy of this instance with the same UID and some extra params. |
| Param<String> | featureSubsetStrategy() The number of features to consider for splits at each tree node. |
| Param<String> | impurity() Criterion used for information gain calculation (case-insensitive). |
| Param<String> | leafCol() Leaf indices column name. |
| static RandomForestClassifier | load(String path) |
| IntParam | maxBins() Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. |
| IntParam | maxDepth() Maximum depth of the tree (nonnegative). |
| IntParam | maxMemoryInMB() Maximum memory in MB allocated to histogram aggregation. |
| DoubleParam | minInfoGain() Minimum information gain for a split to be considered at a tree node. |
| IntParam | minInstancesPerNode() Minimum number of instances each child must have after split. |
| DoubleParam | minWeightFractionPerNode() Minimum fraction of the weighted sample count that each child must have after split. |
| IntParam | numTrees() Number of trees to train (at least 1). |
| static MLReader<T> | read() |
| LongParam | seed() Param for random seed. |
| RandomForestClassifier | setBootstrap(boolean value) |
| RandomForestClassifier | setCacheNodeIds(boolean value) |
| RandomForestClassifier | setCheckpointInterval(int value) Specifies how often to checkpoint the cached node IDs. |
| RandomForestClassifier | setFeatureSubsetStrategy(String value) |
| RandomForestClassifier | setImpurity(String value) |
| RandomForestClassifier | setMaxBins(int value) |
| RandomForestClassifier | setMaxDepth(int value) |
| RandomForestClassifier | setMaxMemoryInMB(int value) |
| RandomForestClassifier | setMinInfoGain(double value) |
| RandomForestClassifier | setMinInstancesPerNode(int value) |
| RandomForestClassifier | setMinWeightFractionPerNode(double value) |
| RandomForestClassifier | setNumTrees(int value) |
| RandomForestClassifier | setSeed(long value) |
| RandomForestClassifier | setSubsamplingRate(double value) |
| RandomForestClassifier | setWeightCol(String value) Sets the value of param weightCol. |
| DoubleParam | subsamplingRate() Fraction of the training data used for learning each decision tree, in range (0, 1]. |
| static String[] | supportedFeatureSubsetStrategies() Accessor for supported featureSubsetStrategy settings: auto, all, onethird, sqrt, log2 |
| static String[] | supportedImpurities() Accessor for supported impurity settings: entropy, gini |
| String | uid() An immutable unique ID for the object and its derivatives. |
| Param<String> | weightCol() Param for weight column name. |
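
The per-node limits in the method summary (minInstancesPerNode, minWeightFractionPerNode, minInfoGain) can be read as a combined filter on candidate splits. The following is a minimal plain-Java sketch of that reading, not Spark's implementation. The class `SplitConstraintSketch`, the method `isSplitAllowed`, and the choice to accept a gain exactly equal to the threshold are all assumptions made for illustration.

```java
/** Hypothetical helper illustrating how the per-node limits could filter a candidate split. */
final class SplitConstraintSketch {

    /**
     * Returns true when a candidate split satisfies all three limits.
     * Assumption: a gain exactly equal to minInfoGain is accepted.
     */
    static boolean isSplitAllowed(long leftCount, long rightCount,
                                  double leftWeight, double rightWeight,
                                  double totalWeight, double infoGain,
                                  int minInstancesPerNode,
                                  double minWeightFractionPerNode,
                                  double minInfoGain) {
        // Each child must keep at least minInstancesPerNode training instances.
        if (leftCount < minInstancesPerNode || rightCount < minInstancesPerNode) {
            return false;
        }
        // Each child must keep at least the given fraction of the total weighted sample count.
        double minWeight = minWeightFractionPerNode * totalWeight;
        if (leftWeight < minWeight || rightWeight < minWeight) {
            return false;
        }
        // The split must reduce impurity by at least minInfoGain.
        return infoGain >= minInfoGain;
    }

    public static void main(String[] args) {
        // 100 unit-weight instances split 97 / 3, with a minimum of 5 instances per child.
        System.out.println(isSplitAllowed(97, 3, 97.0, 3.0, 100.0, 0.2, 5, 0.0, 0.0));
        // The same instances split 60 / 40 under the same limits.
        System.out.println(isSplitAllowed(60, 40, 60.0, 40.0, 100.0, 0.2, 5, 0.0, 0.0));
    }
}
```

Raising any of the three limits makes the filter stricter, which tends to produce shallower trees.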
Methods inherited from superclasses and interfaces:
probabilityCol, setProbabilityCol, setThresholds, thresholds
rawPredictionCol, setRawPredictionCol
featuresCol, fit, labelCol, predictionCol, setFeaturesCol, setLabelCol, setPredictionCol, transformSchema
params
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getBootstrap, getNumTrees
validateAndTransformSchema
getFeatureSubsetStrategy, getOldStrategy, getSubsamplingRate
getCacheNodeIds, getLeafCol, getMaxBins, getMaxDepth, getMaxMemoryInMB, getMinInfoGain, getMinInstancesPerNode, getMinWeightFractionPerNode, getOldStrategy, setLeafCol
getCheckpointInterval
getWeightCol
extractInstances
extractInstances, extractInstances
getLabelCol, labelCol
featuresCol, getFeaturesCol
getPredictionCol, predictionCol
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
toString
getRawPredictionCol, rawPredictionCol
getProbabilityCol, probabilityCol
getThresholds, thresholds
getImpurity, getOldImpurity
write
save
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public RandomForestClassifier(String uid)
public RandomForestClassifier()
public static final String[] supportedImpurities()
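
The method summary lists entropy and gini as the supported impurity settings. As a hedged illustration of what the two names conventionally mean, the plain-Java sketch below computes both measures from per-class counts. It is not taken from Spark: the class `ImpuritySketch` and its methods are hypothetical, and the use of log base 2 for entropy is an assumption of this sketch.

```java
/** Hypothetical helper: impurity of a node computed from its per-class (weighted) counts. */
final class ImpuritySketch {

    /** Gini impurity: 1 minus the sum over classes of p^2, where p is the class fraction. */
    static double gini(double[] classCounts) {
        double total = sum(classCounts);
        if (total == 0.0) {
            return 0.0; // an empty node is treated as pure in this sketch
        }
        double sumOfSquares = 0.0;
        for (double count : classCounts) {
            double p = count / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    /** Entropy: minus the sum over classes of p * log2(p); empty classes contribute 0. */
    static double entropy(double[] classCounts) {
        double total = sum(classCounts);
        if (total == 0.0) {
            return 0.0;
        }
        double result = 0.0;
        for (double count : classCounts) {
            if (count > 0.0) {
                double p = count / total;
                result -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return result;
    }

    private static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] balanced = {5.0, 5.0}; // two classes, evenly mixed
        double[] pure = {10.0, 0.0};    // a single class only
        System.out.println("gini(balanced)    = " + gini(balanced));
        System.out.println("entropy(balanced) = " + entropy(balanced));
        System.out.println("gini(pure)        = " + gini(pure));
        System.out.println("entropy(pure)     = " + entropy(pure));
    }
}
```

Both measures are zero for a node containing a single class and largest when the classes are evenly mixed.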
public static final String[] supportedFeatureSubsetStrategies()
public static RandomForestClassifier load(String path)
public static MLReader<T> read()
public final Param<String> impurity()
Description copied from interface: TreeClassifierParams
Specified by: impurity in interface TreeClassifierParams
public final IntParam numTrees()
Description copied from interface: RandomForestParams
Note: numTrees cannot be shared by both GBT and RF (i.e. placed in TreeEnsembleParams) because the param maxIter already controls how many trees a GBT has, and the semantics differ slightly between the two algorithms.
Specified by: numTrees in interface RandomForestParams
public final BooleanParam bootstrap()
Description copied from interface: RandomForestParams
Specified by: bootstrap in interface RandomForestParams
public final DoubleParam subsamplingRate()
Description copied from interface: TreeEnsembleParams
Specified by: subsamplingRate in interface TreeEnsembleParams
public final Param<String> featureSubsetStrategy()
Description copied from interface: TreeEnsembleParams
These various settings are based on the following references:
- log2: tested in Breiman (2001)
- sqrt: recommended by Breiman manual for random forests
- The defaults of sqrt (classification) and onethird (regression) match the R randomForest package.
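
To make the named strategies concrete, the plain-Java sketch below maps a strategy name to a number of features considered per node. It is an illustration under stated assumptions, not Spark's code: the class `FeatureSubsetSketch` is hypothetical, rounding up is an assumption, and `auto` is deliberately left out because how it resolves depends on settings this page does not spell out.

```java
import java.util.Locale;

/** Hypothetical helper: number of features considered at each node for a named strategy. */
final class FeatureSubsetSketch {

    /**
     * Maps a strategy name to a feature count.
     * Assumptions: fractional results are rounded up, and log2 never returns less than 1.
     */
    static int featuresPerNode(String strategy, int numFeatures) {
        switch (strategy.toLowerCase(Locale.ROOT)) {
            case "all":
                return numFeatures;
            case "sqrt":
                return (int) Math.ceil(Math.sqrt(numFeatures));
            case "log2":
                return Math.max(1, (int) Math.ceil(Math.log(numFeatures) / Math.log(2.0)));
            case "onethird":
                return (int) Math.ceil(numFeatures / 3.0);
            default:
                // "auto" and any other value are not covered by this sketch.
                throw new IllegalArgumentException("Not covered by this sketch: " + strategy);
        }
    }

    public static void main(String[] args) {
        int numFeatures = 100;
        for (String strategy : new String[] {"all", "sqrt", "log2", "onethird"}) {
            System.out.println(strategy + " -> " + featuresPerNode(strategy, numFeatures));
        }
    }
}
```

Smaller subsets decorrelate the trees more strongly but give each individual split fewer features to choose from.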
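Specified by: featureSubsetStrategy in interface TreeEnsembleParams
public final Param<String> leafCol()
Description copied from interface: DecisionTreeParams
Specified by: leafCol in interface DecisionTreeParams
public final IntParam maxDepth()
Description copied from interface: DecisionTreeParams
Specified by: maxDepth in interface DecisionTreeParams
public final IntParam maxBins()
Description copied from interface: DecisionTreeParams
Specified by: maxBins in interface DecisionTreeParams
public final IntParam minInstancesPerNode()
Description copied from interface: DecisionTreeParams
Specified by: minInstancesPerNode in interface DecisionTreeParams
public final DoubleParam minWeightFractionPerNode()
Description copied from interface: DecisionTreeParams
Specified by: minWeightFractionPerNode in interface DecisionTreeParams
public final DoubleParam minInfoGain()
Description copied from interface: DecisionTreeParams
Specified by: minInfoGain in interface DecisionTreeParams
public final IntParam maxMemoryInMB()
Description copied from interface: DecisionTreeParams
Specified by: maxMemoryInMB in interface DecisionTreeParams
public final BooleanParam cacheNodeIds()
Description copied from interface: DecisionTreeParams
Specified by: cacheNodeIds in interface DecisionTreeParams
public final Param<String> weightCol()
Description copied from interface: HasWeightCol
Specified by: weightCol in interface HasWeightCol
public final LongParam seed()
Description copied from interface: HasSeed
public final IntParam checkpointInterval()
Description copied from interface: HasCheckpointInterval
Specified by: checkpointInterval in interface HasCheckpointInterval
public String uid()
Description copied from interface: Identifiable
Specified by: uid in interface Identifiable
public RandomForestClassifier setMaxDepth(int value)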
public RandomForestClassifier setMaxBins(int value)
public RandomForestClassifier setMinInstancesPerNode(int value)
public RandomForestClassifier setMinWeightFractionPerNode(double value)
public RandomForestClassifier setMinInfoGain(double value)
public RandomForestClassifier setMaxMemoryInMB(int value)
public RandomForestClassifier setCacheNodeIds(boolean value)
public RandomForestClassifier setCheckpointInterval(int value)
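Specifies how often to checkpoint the cached node IDs. E.g. 10 means that the cache will get checkpointed every 10 iterations. This is only used if cacheNodeIds is true and if the checkpoint directory is set in SparkContext.
Must be at least 1.
(default = 10)
Parameters: value - (undocumented)
public RandomForestClassifier setImpurity(String value)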
public RandomForestClassifier setSubsamplingRate(double value)
public RandomForestClassifier setSeed(long value)
public RandomForestClassifier setNumTrees(int value)
public RandomForestClassifier setBootstrap(boolean value)
public RandomForestClassifier setFeatureSubsetStrategy(String value)
public RandomForestClassifier setWeightCol(String value)
Sets the value of param weightCol.
If this is not set or empty, all instance weights are treated as 1.0.
By default the weightCol is not set, so all instances have weight 1.0.
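
The default-weight rule above can be sketched in plain Java as follows. This is an illustration only, not Spark's implementation: the class `InstanceWeightSketch` is hypothetical, and passing `null` for the weights array is this sketch's stand-in for an unset or empty weightCol.

```java
import java.util.Arrays;

/** Hypothetical helper: per-class weighted counts with a default instance weight of 1.0. */
final class InstanceWeightSketch {

    /**
     * Sums instance weights per class label.
     * Assumption: a null weights array stands for an unset weightCol, so every weight is 1.0.
     */
    static double[] weightedClassCounts(int[] labels, double[] weights, int numClasses) {
        double[] counts = new double[numClasses];
        for (int i = 0; i < labels.length; i++) {
            double weight = (weights == null) ? 1.0 : weights[i];
            counts[labels[i]] += weight;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 1};
        // No weights supplied: each instance counts once.
        System.out.println(Arrays.toString(weightedClassCounts(labels, null, 2)));
        // Explicit weights shift the balance between the classes.
        System.out.println(Arrays.toString(
                weightedClassCounts(labels, new double[] {2.0, 0.5, 0.5}, 2)));
    }
}
```

With explicit weights the class balance seen by the impurity calculation changes even though the number of rows does not.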
Parameters: value - (undocumented)
public RandomForestClassifier copy(ParamMap extra)
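Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Predictor<Vector,RandomForestClassifier,RandomForestClassificationModel>
Parameters: extra - (undocumented)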