public class NaiveBayes extends java.lang.Object implements scala.Serializable, Logging
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all kinds of
discrete data. For example, by converting documents into TF-IDF vectors, it can be used for
document classification. By making every vector a 0-1 vector, it can also be used as
Bernoulli NB (http://tinyurl.com/p7c96j6). The input feature values must be nonnegative.
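The two input requirements above (nonnegative feature values, and 0-1 vectors for Bernoulli use) can be checked before training. The following is a minimal plain-Java sketch of that preprocessing. The class `FeaturePrep` and its methods are hypothetical helpers written for illustration and are not part of the Spark API.

```java
import java.util.Arrays;

/** Illustrative helper (hypothetical, not part of Spark). */
final class FeaturePrep {

    /** Rejects vectors containing a negative value, per the stated input requirement. */
    static void requireNonnegative(double[] features) {
        for (double v : features) {
            if (v < 0.0) {
                throw new IllegalArgumentException("Negative feature value: " + v);
            }
        }
    }

    /** Maps a count or TF-IDF vector to a 0-1 presence vector for Bernoulli-style use. */
    static double[] toBinary(double[] features) {
        requireNonnegative(features);
        double[] out = new double[features.length];
        for (int i = 0; i < features.length; i++) {
            out[i] = features[i] > 0.0 ? 1.0 : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] counts = {3.0, 0.0, 1.0, 0.0};
        System.out.println(Arrays.toString(toBinary(counts)));
    }
}
```

The resulting arrays would still need to be wrapped as feature vectors in LabeledPoint entries before being passed to train.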
| Constructor and Description |
|---|
| NaiveBayes() |
| NaiveBayes(double lambda) |
| Modifier and Type | Method and Description |
|---|---|
| static java.lang.String | Bernoulli()<br>String name for Bernoulli model type. |
| double | getLambda()<br>Get the smoothing parameter. |
| java.lang.String | getModelType()<br>Get the model type. |
| static java.lang.String | Multinomial()<br>String name for multinomial model type. |
| NaiveBayesModel | run(RDD<LabeledPoint> data)<br>Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries. |
| NaiveBayes | setLambda(double lambda)<br>Set the smoothing parameter. |
| NaiveBayes | setModelType(java.lang.String modelType)<br>Set the model type using a string (case-sensitive). |
| static scala.collection.immutable.Set<java.lang.String> | supportedModelTypes() |
| static NaiveBayesModel | train(RDD<LabeledPoint> input)<br>Trains a Naive Bayes model given an RDD of (label, features) pairs. |
| static NaiveBayesModel | train(RDD<LabeledPoint> input, double lambda)<br>Trains a Naive Bayes model given an RDD of (label, features) pairs. |
| static NaiveBayesModel | train(RDD<LabeledPoint> input, double lambda, java.lang.String modelType)<br>Trains a Naive Bayes model given an RDD of (label, features) pairs. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public static java.lang.String Multinomial()
public static java.lang.String Bernoulli()
public static scala.collection.immutable.Set<java.lang.String> supportedModelTypes()
public static NaiveBayesModel train(RDD<LabeledPoint> input)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the default Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all
kinds of discrete data. For example, by converting documents into TF-IDF vectors, it
can be used for document classification.
This version of the method uses a default smoothing parameter of 1.0.
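To make the role of the smoothing parameter concrete, here is a minimal plain-Java sketch of textbook additive (Laplace) smoothing for the per-class feature likelihoods of a multinomial model. It assumes the usual formula theta[j] = (count[j] + lambda) / (total + lambda * numFeatures). This page does not spell out the library's internal computation, so treat the sketch as an illustration of the general technique. The class name `SmoothingSketch` is hypothetical.

```java
/** Illustrative sketch of additive smoothing (hypothetical, not Spark's code). */
final class SmoothingSketch {

    /**
     * Returns log(theta[j]) for one class, where
     * theta[j] = (featureCounts[j] + lambda) / (sum(featureCounts) + lambda * numFeatures).
     */
    static double[] smoothedLogLikelihoods(double[] featureCounts, double lambda) {
        double total = 0.0;
        for (double c : featureCounts) {
            total += c;
        }
        double logDenominator = Math.log(total + lambda * featureCounts.length);
        double[] logTheta = new double[featureCounts.length];
        for (int j = 0; j < featureCounts.length; j++) {
            logTheta[j] = Math.log(featureCounts[j] + lambda) - logDenominator;
        }
        return logTheta;
    }

    public static void main(String[] args) {
        // Aggregated counts of three features within one class, smoothed with lambda = 1.0.
        double[] logTheta = smoothedLogLikelihoods(new double[]{2.0, 0.0, 1.0}, 1.0);
        for (double v : logTheta) {
            System.out.println(Math.exp(v));
        }
    }
}
```

With lambda greater than zero, a feature never seen in a class still receives a nonzero probability. With lambda equal to zero, such a feature would get a log-likelihood of negative infinity.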
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
public static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the default Multinomial NB (http://tinyurl.com/lsdw6p) which can handle all
kinds of discrete data. For example, by converting documents into TF-IDF vectors, it
can be used for document classification.
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
lambda - The smoothing parameter
public static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda, java.lang.String modelType)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
The model type can be set to either Multinomial NB (http://tinyurl.com/lsdw6p)
or Bernoulli NB (http://tinyurl.com/p7c96j6). The Multinomial NB can handle
discrete count data and can be called by setting the model type to "multinomial".
For example, it can be used with word counts or TF-IDF vectors of documents.
The Bernoulli model fits presence or absence (0-1) counts. By making every vector a
0-1 vector and setting the model type to "bernoulli", the model fits and predicts as
Bernoulli NB.
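The difference between the two model types shows up in how a feature vector is scored against a class. The plain-Java sketch below uses the standard textbook scoring rules: the multinomial score weights each log-likelihood by the feature count, while the Bernoulli score also penalizes absent features. The class `ScoreSketch`, its method names, and its parameter layout are hypothetical and are not taken from the library.

```java
/** Illustrative per-class scoring rules (hypothetical, not Spark's code). */
final class ScoreSketch {

    /** Multinomial: logPrior + sum over j of x[j] * logTheta[j]. */
    static double multinomialScore(double logPrior, double[] logTheta, double[] x) {
        double score = logPrior;
        for (int j = 0; j < x.length; j++) {
            if (x[j] != 0.0) { // skip zeros so that 0 * -Infinity cannot produce NaN
                score += x[j] * logTheta[j];
            }
        }
        return score;
    }

    /** Bernoulli: logPrior + sum over j of log(p[j]) if present, else log(1 - p[j]). */
    static double bernoulliScore(double logPrior, double[] p, double[] x) {
        double score = logPrior;
        for (int j = 0; j < x.length; j++) {
            score += x[j] > 0.0 ? Math.log(p[j]) : Math.log(1.0 - p[j]);
        }
        return score;
    }

    public static void main(String[] args) {
        double logPrior = Math.log(0.5);
        System.out.println(multinomialScore(logPrior,
                new double[]{Math.log(0.5), Math.log(0.5)}, new double[]{2.0, 1.0}));
        System.out.println(bernoulliScore(logPrior,
                new double[]{0.8, 0.1}, new double[]{1.0, 0.0}));
    }
}
```

In both cases the predicted label would be the class with the highest score.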
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
lambda - The smoothing parameter
modelType - The type of NB model to fit from the enumeration NaiveBayesModels, can be multinomial or bernoulli
public NaiveBayes setLambda(double lambda)
public double getLambda()
public NaiveBayes setModelType(java.lang.String modelType)
modelType - (undocumented)
public java.lang.String getModelType()
public NaiveBayesModel run(RDD<LabeledPoint> data)
data - RDD of LabeledPoint.
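Because setModelType matches its argument case-sensitively, it can help to validate the string before configuring the trainer. The following plain-Java sketch assumes the supported names are exactly "multinomial" and "bernoulli", as quoted in the train description. The class `ModelTypeCheck` is a hypothetical helper and not part of the library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative validation of model type strings (hypothetical, not part of Spark). */
final class ModelTypeCheck {

    /** Assumed supported names, taken from the train documentation. */
    static final Set<String> SUPPORTED =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("multinomial", "bernoulli")));

    /** Returns modelType unchanged if it is supported; comparison is case-sensitive. */
    static String requireSupported(String modelType) {
        if (!SUPPORTED.contains(modelType)) {
            throw new IllegalArgumentException(
                    "Unknown model type: " + modelType + ". Supported: " + SUPPORTED);
        }
        return modelType;
    }

    public static void main(String[] args) {
        System.out.println(requireSupported("bernoulli"));
    }
}
```

Under this check, a value such as "Bernoulli" with a capital B is rejected rather than silently normalized.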