public final class PrefixSpan extends Object implements Params
Use the findFrequentSequentialPatterns method to run the PrefixSpan algorithm.
| Constructor and Description |
|---|
| `PrefixSpan()` |
| `PrefixSpan(String uid)` |

| Modifier and Type | Method and Description |
|---|---|
| `PrefixSpan` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `Dataset<Row>` | `findFrequentSequentialPatterns(Dataset<?> dataset)` Finds the complete set of frequent sequential patterns in the input sequences of itemsets. |
| `long` | `getMaxLocalProjDBSize()` |
| `int` | `getMaxPatternLength()` |
| `double` | `getMinSupport()` |
| `String` | `getSequenceCol()` |
| `LongParam` | `maxLocalProjDBSize()` Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). |
| `IntParam` | `maxPatternLength()` Param for the maximal pattern length (default: 10). |
| `DoubleParam` | `minSupport()` Param for the minimal support level (default: 0.1). |
| `Param<?>[]` | `params()` Returns all params sorted by their names. |
| `Param<String>` | `sequenceCol()` Param for the name of the sequence column in dataset (default: "sequence"). Rows with nulls in this column are ignored. |
| `PrefixSpan` | `setMaxLocalProjDBSize(long value)` |
| `PrefixSpan` | `setMaxPatternLength(int value)` |
| `PrefixSpan` | `setMinSupport(double value)` |
| `PrefixSpan` | `setSequenceCol(String value)` |
| `String` | `uid()` An immutable unique ID for the object and its derivatives. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Params: clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface Identifiable: toString

public PrefixSpan copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. Specified by copy in interface Params. See defaultCopy().

public Dataset<Row> findFrequentSequentialPatterns(Dataset<?> dataset)
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
Parameters: dataset - A Dataset or DataFrame containing a sequence column of type ArrayType(ArrayType(T)), where T is the item type of the input dataset.
Returns: A `DataFrame` with one column for each frequent sequence and one for its frequency. Its schema is:
- `sequence: ArrayType(ArrayType(T))` (T is the item type)
- `freq: Long`

public long getMaxLocalProjDBSize()
public int getMaxPatternLength()
public double getMinSupport()
public String getSequenceCol()
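public LongParam maxLocalProjDBSize()
Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). If a projected database exceeds this size, another iteration of distributed prefix growth is run.

public IntParam maxPatternLength()
Param for the maximal pattern length (default: 10).

public DoubleParam minSupport()
Param for the minimal support level (default: 0.1). Sequential patterns that appear more than (minSupport * size-of-the-dataset) times are identified as frequent sequential patterns.

public Param<?>[] params()
Returns all params sorted by their names. Specified by params in interface Params.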
public Param<String> sequenceCol()
public PrefixSpan setMaxLocalProjDBSize(long value)
public PrefixSpan setMaxPatternLength(int value)
public PrefixSpan setMinSupport(double value)
public PrefixSpan setSequenceCol(String value)
public String uid()
An immutable unique ID for the object and its derivatives. Specified by uid in interface Identifiable.
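The support rule described under minSupport can be illustrated without Spark. The following is a minimal plain-Java sketch, not the library's implementation: the class `SupportSketch`, its methods, and the sample data are hypothetical names introduced here for illustration. It assumes a pattern occurs in a sequence when each pattern itemset is a subset of some sequence itemset, in order, and it applies the threshold exactly as worded above (strictly more than minSupport * size-of-the-dataset). How the real implementation treats a count exactly equal to that product is not covered here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark's API.
final class SupportSketch {

    // True if `pattern` occurs in `sequence`: every pattern itemset must be a
    // subset of a distinct sequence itemset, in the same left-to-right order.
    static boolean contains(List<List<Integer>> sequence, List<List<Integer>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (List<Integer> itemset : sequence) {
            if (next < pattern.size() && itemset.containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // Number of sequences in `data` that contain `pattern`.
    static long support(List<List<List<Integer>>> data, List<List<Integer>> pattern) {
        return data.stream().filter(s -> contains(s, pattern)).count();
    }

    // Literal reading of the documented rule: more than minSupport * size.
    static boolean isFrequent(long count, double minSupport, long size) {
        return count > minSupport * size;
    }

    // Four toy sequences of itemsets (made-up sample data).
    static List<List<List<Integer>>> sample() {
        return Arrays.asList(
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
            Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
            Arrays.asList(Arrays.asList(6)));
    }

    public static void main(String[] args) {
        List<List<List<Integer>>> data = sample();
        List<List<Integer>> pattern = Arrays.asList(Arrays.asList(1, 2));
        long count = support(data, pattern);
        System.out.println("support = " + count + ", frequent at minSupport 0.5: "
            + isFrequent(count, 0.5, data.size()));
    }
}
```

With the default sequenceCol, each element of `sample()` corresponds to one row's "sequence" value, and the count returned by `support` plays the role of the `freq` column in the result schema.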