public class GroupedData
extends java.lang.Object
DataFrame, created by DataFrame.groupBy.
 | Modifier | Constructor and Description | 
|---|---|
| protected  | GroupedData(DataFrame df,
           scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs,
           org.apache.spark.sql.GroupedData.GroupType groupType) | 
| Modifier and Type | Method and Description | 
|---|---|
| DataFrame | agg(Column expr,
   Column... exprs)Compute aggregates by specifying a series of aggregate columns. | 
| DataFrame | agg(Column expr,
   scala.collection.Seq<Column> exprs)Compute aggregates by specifying a series of aggregate columns. | 
| DataFrame | agg(scala.collection.immutable.Map<java.lang.String,java.lang.String> exprs)(Scala-specific) Compute aggregates by specifying a map from column name to
 aggregate methods. | 
| DataFrame | agg(java.util.Map<java.lang.String,java.lang.String> exprs)(Java-specific) Compute aggregates by specifying a map from column name to
 aggregate methods. | 
| DataFrame | agg(scala.Tuple2<java.lang.String,java.lang.String> aggExpr,
   scala.collection.Seq<scala.Tuple2<java.lang.String,java.lang.String>> aggExprs)(Scala-specific) Compute aggregates by specifying a map from column name to
 aggregate methods. | 
| static GroupedData | apply(DataFrame df,
     scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs,
     org.apache.spark.sql.GroupedData.GroupType groupType) | 
| DataFrame | avg(scala.collection.Seq<java.lang.String> colNames)Compute the mean value for each numeric columns for each group. | 
| DataFrame | avg(java.lang.String... colNames)Compute the mean value for each numeric columns for each group. | 
| DataFrame | count()Count the number of rows for each group. | 
| DataFrame | max(scala.collection.Seq<java.lang.String> colNames)Compute the max value for each numeric columns for each group. | 
| DataFrame | max(java.lang.String... colNames)Compute the max value for each numeric columns for each group. | 
| DataFrame | mean(scala.collection.Seq<java.lang.String> colNames)Compute the average value for each numeric columns for each group. | 
| DataFrame | mean(java.lang.String... colNames)Compute the average value for each numeric columns for each group. | 
| DataFrame | min(scala.collection.Seq<java.lang.String> colNames)Compute the min value for each numeric column for each group. | 
| DataFrame | min(java.lang.String... colNames)Compute the min value for each numeric column for each group. | 
| DataFrame | sum(scala.collection.Seq<java.lang.String> colNames)Compute the sum for each numeric columns for each group. | 
| DataFrame | sum(java.lang.String... colNames)Compute the sum for each numeric columns for each group. | 
protected GroupedData(DataFrame df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, org.apache.spark.sql.GroupedData.GroupType groupType)
public static GroupedData apply(DataFrame df, scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Expression> groupingExprs, org.apache.spark.sql.GroupedData.GroupType groupType)
public DataFrame agg(Column expr, Column... exprs)
spark.sql.retainGroupColumns to false.
 
 The available aggregate methods are defined in functions.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   // Scala:
   import org.apache.spark.sql.functions._
   df.groupBy("department").agg(max("age"), sum("expense"))
   // Java:
   import static org.apache.spark.sql.functions.*;
   df.groupBy("department").agg(max("age"), sum("expense"));
 
 Note that before Spark 1.4, the default behavior is to NOT retain grouping columns. To change
 to that behavior, set config variable spark.sql.retainGroupColumns to false.
 
   // Scala, 1.3.x:
   df.groupBy("department").agg($"department", max("age"), sum("expense"))
   // Java, 1.3.x:
   df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
 expr - (undocumented)exprs - (undocumented)public DataFrame mean(java.lang.String... colNames)
avg.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the average values for them.
 colNames - (undocumented)public DataFrame max(java.lang.String... colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the max values for them.
 colNames - (undocumented)public DataFrame avg(java.lang.String... colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the mean values for them.
 colNames - (undocumented)public DataFrame min(java.lang.String... colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the min values for them.
 colNames - (undocumented)public DataFrame sum(java.lang.String... colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the sum for them.
 colNames - (undocumented)public DataFrame agg(scala.Tuple2<java.lang.String,java.lang.String> aggExpr, scala.collection.Seq<scala.Tuple2<java.lang.String,java.lang.String>> aggExprs)
DataFrame will also contain the grouping columns.
 
 The available aggregate methods are avg, max, min, sum, count.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   df.groupBy("department").agg(
     "age" -> "max",
     "expense" -> "sum"
   )
 aggExpr - (undocumented)aggExprs - (undocumented)public DataFrame agg(scala.collection.immutable.Map<java.lang.String,java.lang.String> exprs)
DataFrame will also contain the grouping columns.
 
 The available aggregate methods are avg, max, min, sum, count.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   df.groupBy("department").agg(Map(
     "age" -> "max",
     "expense" -> "sum"
   ))
 exprs - (undocumented)public DataFrame agg(java.util.Map<java.lang.String,java.lang.String> exprs)
DataFrame will also contain the grouping columns.
 
 The available aggregate methods are avg, max, min, sum, count.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   import com.google.common.collect.ImmutableMap;
   df.groupBy("department").agg(ImmutableMap.of("age", "max", "expense", "sum"));
 exprs - (undocumented)public DataFrame agg(Column expr, scala.collection.Seq<Column> exprs)
spark.sql.retainGroupColumns to false.
 
 The available aggregate methods are defined in functions.
 
   // Selects the age of the oldest employee and the aggregate expense for each department
   // Scala:
   import org.apache.spark.sql.functions._
   df.groupBy("department").agg(max("age"), sum("expense"))
   // Java:
   import static org.apache.spark.sql.functions.*;
   df.groupBy("department").agg(max("age"), sum("expense"));
 
 Note that before Spark 1.4, the default behavior is to NOT retain grouping columns. To change
 to that behavior, set config variable spark.sql.retainGroupColumns to false.
 
   // Scala, 1.3.x:
   df.groupBy("department").agg($"department", max("age"), sum("expense"))
   // Java, 1.3.x:
   df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
 expr - (undocumented)exprs - (undocumented)public DataFrame count()
DataFrame will also contain the grouping columns.
 public DataFrame mean(scala.collection.Seq<java.lang.String> colNames)
avg.
 The resulting DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the average values for them.
 colNames - (undocumented)public DataFrame max(scala.collection.Seq<java.lang.String> colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the max values for them.
 colNames - (undocumented)public DataFrame avg(scala.collection.Seq<java.lang.String> colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the mean values for them.
 colNames - (undocumented)public DataFrame min(scala.collection.Seq<java.lang.String> colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the min values for them.
 colNames - (undocumented)public DataFrame sum(scala.collection.Seq<java.lang.String> colNames)
DataFrame will also contain the grouping columns.
 When specified columns are given, only compute the sum for them.
 colNames - (undocumented)