public final class RandomForestClassificationModel extends ProbabilisticClassificationModel<Vector,RandomForestClassificationModel> implements scala.Serializable
Random Forest model for classification.
It supports both binary and multiclass labels, as well as both continuous and categorical
features.
param: _trees Decision trees in the ensemble.
Warning: These have null parents.
param: numFeatures Number of features used by this model

| Modifier and Type | Method and Description |
|---|---|
| RandomForestClassificationModel | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
| Vector | featureImportances(): Estimate of the importance of each feature. |
| static RandomForestClassificationModel | fromOld(RandomForestModel oldModel, RandomForestClassifier parent, scala.collection.immutable.Map<java.lang.Object,java.lang.Object> categoricalFeatures, int numClasses): (private[ml]) Convert a model from the old API |
| int | numClasses(): Number of classes (values which the label can take). |
| int | numFeatures() |
| protected Vector | predictRaw(Vector features): Raw prediction for each possible label. |
| protected Vector | raw2probabilityInPlace(Vector rawPrediction): Estimate the probability of each class given the raw prediction, doing the computation in-place. |
| java.lang.String | toString() |
| protected DataFrame | transformImpl(DataFrame dataset) |
| org.apache.spark.ml.tree.DecisionTreeModel[] | trees() |
| double[] | treeWeights() |
| java.lang.String | uid(): An immutable unique ID for the object and its derivatives. |
| StructType | validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType): Validates and transforms the input schema with the provided param map. |
Inherited methods, one line per declaring type:
normalizeToProbabilitiesInPlace, predictProbability, probability2prediction, raw2prediction, raw2probability, setProbabilityCol, setThresholds, transform
predict, setRawPredictionCol
featuresDataType, setFeaturesCol, setPredictionCol, transformSchema
transform, transform, transform
transformSchema
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn, validateParams
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

public static RandomForestClassificationModel fromOld(RandomForestModel oldModel, RandomForestClassifier parent, scala.collection.immutable.Map<java.lang.Object,java.lang.Object> categoricalFeatures, int numClasses)

public java.lang.String uid()
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
uid in interface Identifiable

public int numFeatures()

public int numClasses()
Description copied from class: ClassificationModel
Number of classes (values which the label can take).
numClasses in class ClassificationModel<Vector,RandomForestClassificationModel>

public org.apache.spark.ml.tree.DecisionTreeModel[] trees()

public double[] treeWeights()

protected DataFrame transformImpl(DataFrame dataset)
transformImpl in class PredictionModel<Vector,RandomForestClassificationModel>

protected Vector predictRaw(Vector features)
Description copied from class: ClassificationModel
Raw prediction for each possible label.
This internal method is used to implement transform() and output rawPredictionCol.
predictRaw in class ClassificationModel<Vector,RandomForestClassificationModel>
features - (undocumented)

protected Vector raw2probabilityInPlace(Vector rawPrediction)
Description copied from class: ProbabilisticClassificationModel
Estimate the probability of each class given the raw prediction, doing the computation in-place.
This internal method is used to implement transform() and output probabilityCol.
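
To make the in-place conversion concrete, here is a minimal sketch in plain Java. It is not the Spark implementation. It assumes the raw prediction is an array of non-negative per-class scores (for example, accumulated tree votes), which this page does not state, and it turns those scores into probabilities by dividing each entry by the total. The class and method names are hypothetical.

```java
/** Hypothetical helper, not part of Spark: normalizes per-class scores in place. */
final class RawToProbabilitySketch {
    /**
     * Divides every entry by the sum of all entries, modifying the input array.
     * Assumption: entries are non-negative scores, one per class.
     * An all-zero array is left unchanged to avoid dividing by zero.
     */
    static double[] normalizeInPlace(double[] rawPrediction) {
        double sum = 0.0;
        for (double score : rawPrediction) {
            sum += score;
        }
        if (sum != 0.0) {
            for (int i = 0; i < rawPrediction.length; i++) {
                rawPrediction[i] /= sum;
            }
        }
        // Return the same array instance to signal that no copy was made.
        return rawPrediction;
    }
}
```

Returning the input array itself mirrors the "in-place" wording: callers that still need the raw scores would have to copy them first.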
raw2probabilityInPlace in class ProbabilisticClassificationModel<Vector,RandomForestClassificationModel>
rawPrediction - (undocumented)

public RandomForestClassificationModel copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params.
copy in interface Params
copy in class Model<RandomForestClassificationModel>
extra - (undocumented)
See also: defaultCopy()

public java.lang.String toString()
toString in interface Identifiable
toString in class java.lang.Object

public Vector featureImportances()
Estimate of the importance of each feature.
This generalizes the idea of "Gini" importance to other losses, following the explanation of Gini importance from "Random Forests" documentation by Leo Breiman and Adele Cutler, and following the implementation from scikit-learn.
This feature importance is calculated as follows:
- Average over trees:
  - importance(feature j) = sum (over nodes which split on feature j) of the gain, where gain is scaled by the number of instances passing through node
  - Normalize importances for tree based on total number of training instances used to build tree.
- Normalize feature importance vector to sum to 1.
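
The calculation described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Spark code: `SplitNodeSketch` and `FeatureImportanceSketch` are hypothetical names, each tree is reduced to a flat array of its split nodes, and "normalize importances for tree based on total number of training instances" is read as dividing by the tree's training instance count.

```java
/** Hypothetical stand-in for one split node of a fitted tree; not a Spark class. */
final class SplitNodeSketch {
    final int featureIndex;   // index j of the feature this node splits on
    final double gain;        // impurity gain achieved by the split
    final long instanceCount; // training instances that passed through this node

    SplitNodeSketch(int featureIndex, double gain, long instanceCount) {
        this.featureIndex = featureIndex;
        this.gain = gain;
        this.instanceCount = instanceCount;
    }
}

final class FeatureImportanceSketch {
    /**
     * Per-tree importances: for each feature, sum the instance-scaled gains of the
     * nodes that split on it, then divide by the tree's training instance count.
     * Assumption: treeInstanceCount is positive.
     */
    static double[] treeImportances(SplitNodeSketch[] splits, long treeInstanceCount, int numFeatures) {
        double[] importances = new double[numFeatures];
        for (SplitNodeSketch node : splits) {
            importances[node.featureIndex] += node.gain * node.instanceCount;
        }
        for (int j = 0; j < numFeatures; j++) {
            importances[j] /= treeInstanceCount;
        }
        return importances;
    }

    /** Averages the per-tree importances, then rescales the result to sum to 1. */
    static double[] forestImportances(SplitNodeSketch[][] trees, long[] treeInstanceCounts, int numFeatures) {
        double[] total = new double[numFeatures];
        for (int t = 0; t < trees.length; t++) {
            double[] perTree = treeImportances(trees[t], treeInstanceCounts[t], numFeatures);
            for (int j = 0; j < numFeatures; j++) {
                total[j] += perTree[j] / trees.length;
            }
        }
        double sum = 0.0;
        for (double value : total) {
            sum += value;
        }
        if (sum > 0.0) { // leave an all-zero vector unchanged
            for (int j = 0; j < numFeatures; j++) {
                total[j] /= sum;
            }
        }
        return total;
    }
}
```

Because the last step rescales the vector to sum to 1, averaging over trees and summing over trees give the same final result; the average is kept only to follow the wording of the description.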
public StructType validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType)
Validates and transforms the input schema with the provided param map.
schema - input schema
fitting - whether the schema is being validated during fitting
featuresDataType - SQL DataType for FeaturesType. E.g., VectorUDT for vector features.