Class SparkDataFrameConversions

java.lang.Object
    cern.nxcals.api.backport.pytimber.utils.SparkDataFrameConversions

public class SparkDataFrameConversions extends java.lang.Object
Method Summary

static java.util.SortedMap<java.lang.String,java.lang.Object>
    extractAllColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset)
    Converts a Spark Dataset to arrays of primitive data, or arrays of arrays, for efficient consumption with numpy.

static java.lang.Object[]
    extractArrayColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

static boolean[]
    extractBooleanColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

static java.lang.Object[]
    extractColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

static double[]
    extractDoubleColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

static long[]
    extractLongColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

static java.lang.String[]
    extractStringColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
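Each typed extractor pulls one named column out of a Spark Dataset into a primitive Java array. A minimal sketch of that idea, using plain Java collections in place of a real Dataset (no Spark runtime here; `toDoubleArray` and the Map-based "rows" are hypothetical stand-ins for extractDoubleColumn and Spark Row, not part of the API):

```java
import java.util.List;
import java.util.Map;

public class ExtractSketch {
    // Stand-in for extractDoubleColumn: each "row" is a Map from
    // column name to value, mirroring a Spark Row's named fields.
    static double[] toDoubleArray(List<Map<String, Object>> rows, String columnName) {
        double[] out = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            out[i] = ((Number) rows.get(i).get(columnName)).doubleValue();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.of("BU_INTENS_AV", 1.5),
            Map.of("BU_INTENS_AV", 2.5));
        double[] values = toDoubleArray(rows, "BU_INTENS_AV");
        System.out.println(values[0] + " " + values[1]);  // prints "1.5 2.5"
    }
}
```

The primitive return types (double[], long[], boolean[]) avoid per-element boxing, which is what makes the result cheap to hand to numpy on the Python side.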
Method Detail

extractColumn

public static java.lang.Object[] extractColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

extractStringColumn

public static java.lang.String[] extractStringColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

extractDoubleColumn

public static double[] extractDoubleColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

extractLongColumn

public static long[] extractLongColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

extractBooleanColumn

public static boolean[] extractBooleanColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

extractArrayColumn

public static java.lang.Object[] extractArrayColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)

extractAllColumns

public static java.util.SortedMap<java.lang.String,java.lang.Object> extractAllColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset)
Converts a Spark Dataset to arrays of primitive data, or arrays of primitive arrays, for efficient consumption with numpy. The Spark Dataset can have one or several columns, and the columns can have names with special characters as they appear in (NX)CALS variables, e.g. "SPS.BWS.51995.H_ROT.APP.IN:BU_INTENS_AV".
A column with Scalar data is converted to an array of primitive values; a column with VectorNumeric data is converted to an array of primitive arrays.
Parameters:
    dataset - a dataset with one or several columns, where the column names can contain special characters
Returns:
    a SortedMap<String, Object> with key = column name and value = primitive array
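A minimal sketch of that scalar-vs-vector convention, using plain Java collections in place of a real Dataset (the `extract` helper, its parameters, and the "SOME.VECTOR.VAR" name are illustrative assumptions, not part of the API):

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class AllColumnsSketch {
    // Illustrative stand-in for extractAllColumns: a scalar column is
    // flattened to double[], a vector-numeric column to double[][].
    // Column names (special characters included) become the map keys.
    static SortedMap<String, Object> extract(
            String scalarName, List<Double> scalarColumn,
            String vectorName, List<double[]> vectorColumn) {
        SortedMap<String, Object> out = new TreeMap<>();
        double[] scalars = new double[scalarColumn.size()];
        for (int i = 0; i < scalarColumn.size(); i++) {
            scalars[i] = scalarColumn.get(i);
        }
        out.put(scalarName, scalars);                               // array of primitives
        out.put(vectorName, vectorColumn.toArray(new double[0][])); // array of arrays
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, Object> m = extract(
            "SPS.BWS.51995.H_ROT.APP.IN:BU_INTENS_AV", List.of(1.0, 2.0),
            "SOME.VECTOR.VAR", List.of(new double[]{1, 2}, new double[]{3, 4}));
        double[] scalar = (double[]) m.get("SPS.BWS.51995.H_ROT.APP.IN:BU_INTENS_AV");
        double[][] vector = (double[][]) m.get("SOME.VECTOR.VAR");
        System.out.println(scalar.length + " " + vector.length);
    }
}
```

Returning a SortedMap keyed by the original column names lets a Python caller look up each variable by its (NX)CALS name and wrap the value directly in a numpy array.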