Class SparkDataFrameConversions


  • public class SparkDataFrameConversions
    extends java.lang.Object
    • Method Summary

      Modifier and Type Method Description
      static java.util.SortedMap<java.lang.String,java.lang.Object> extractAllColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset)
      Converts a Spark Dataset to arrays of primitive values, or arrays of primitive arrays, for efficient consumption with numpy.
      static java.lang.Object[] extractArrayColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      static boolean[] extractBooleanColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      static java.lang.Object[] extractColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      static double[] extractDoubleColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      static long[] extractLongColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      static java.lang.String[] extractStringColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
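
      Usage sketch for the typed single-column extractors (the SparkSession setup, file path, and column names below are illustrative assumptions, not part of this API):

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.SparkSession;

        public class ExtractColumnExample {
            public static void main(String[] args) {
                // Illustrative setup; any Dataset<Row> with suitably typed columns works.
                SparkSession spark = SparkSession.builder()
                        .appName("ExtractColumnExample")
                        .master("local[*]")
                        .getOrCreate();
                Dataset<Row> dataset = spark.read().parquet("/path/to/data.parquet");

                // Each call returns one column of the dataset as a primitive Java array.
                double[] values = SparkDataFrameConversions.extractDoubleColumn(dataset, "value");
                long[] stamps = SparkDataFrameConversions.extractLongColumn(dataset, "acqStamp");
                String[] devices = SparkDataFrameConversions.extractStringColumn(dataset, "device");

                System.out.println("Rows extracted: " + values.length);
                spark.stop();
            }
        }
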
    • Method Detail

      • extractColumn

        public static java.lang.Object[] extractColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      • extractStringColumn

        public static java.lang.String[] extractStringColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      • extractDoubleColumn

        public static double[] extractDoubleColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      • extractLongColumn

        public static long[] extractLongColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      • extractBooleanColumn

        public static boolean[] extractBooleanColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      • extractArrayColumn

        public static java.lang.Object[] extractArrayColumn(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset, java.lang.String columnName)
      • extractAllColumns

        public static java.util.SortedMap<java.lang.String,java.lang.Object> extractAllColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataset)
        Converts a Spark Dataset to arrays of primitive values, or arrays of primitive arrays, for efficient consumption with numpy.

        The Spark Dataset can have one or several columns, and the column names can contain special characters as they appear in (NX)CALS variables, e.g. "SPS.BWS.51995.H_ROT.APP.IN:BU_INTENS_AV".

        A column with Scalar data is converted to an array of primitive values; a column with VectorNumeric data is converted to an array of primitive arrays.

        Parameters:
        dataset - a dataset with one or several columns, where the column names can contain special characters
        Returns:
        a SortedMap<String, Object> with the column name as key and the corresponding primitive array (or array of arrays) as value
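
        Usage sketch for extractAllColumns (the SQL table name and the cast below are illustrative assumptions; any Dataset<Row> works, and the value type per column depends on that column's data type):

          import java.util.SortedMap;
          import org.apache.spark.sql.Dataset;
          import org.apache.spark.sql.Row;
          import org.apache.spark.sql.SparkSession;

          public class ExtractAllColumnsExample {
              public static void main(String[] args) {
                  SparkSession spark = SparkSession.builder()
                          .appName("ExtractAllColumnsExample")
                          .master("local[*]")
                          .getOrCreate();
                  // Hypothetical source table; column names may contain special characters.
                  Dataset<Row> dataset = spark.sql("SELECT * FROM measurements");

                  // Scalar columns become primitive arrays, VectorNumeric columns
                  // become arrays of primitive arrays, keyed by column name.
                  SortedMap<String, Object> columns = SparkDataFrameConversions.extractAllColumns(dataset);

                  // Assuming this column holds scalar doubles.
                  double[] intensities = (double[]) columns.get("SPS.BWS.51995.H_ROT.APP.IN:BU_INTENS_AV");
                  System.out.println("Samples: " + intensities.length);

                  spark.stop();
              }
          }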