Interface AggregationService

    • Method Summary

      Modifier and Type    Method    Description
      org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>    getData(Entity entity, DatasetAggregationProperties properties)
          Returns a dataset in which each row contains an entity data point aligned to a timestamp taken from a provided driving dataset.
      org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>    getData(Entity entity, WindowAggregationProperties properties)
          Returns a dataset in which each row contains an aggregated entity data point at a fixed, distinct timestamp.
      org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>    getData(Variable variable, DatasetAggregationProperties properties)
          Returns a dataset in which each row contains a variable data point aligned to a timestamp taken from a provided driving dataset.
      org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>    getData(Variable variable, WindowAggregationProperties properties)
          Returns a dataset in which each row contains an aggregated variable data point at a fixed, distinct timestamp.
    • Method Detail

      • getData

        org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getData(Variable variable,
                                                                       WindowAggregationProperties properties)
        Returns a dataset in which each row contains an aggregated variable data point at a fixed, distinct timestamp. The timestamp (in UTC nanoseconds) marks the start of an interval that belongs to a generated sequence of time windows. Each data point holds the result of the aggregation function specified by the provided properties. The function is applied to the values that fall within the right-open interval [startInclusive, endExclusive).
        Parameters:
        variable - the Variable that identifies the raw data to extract
        properties - the aggregation properties that control how the extraction is performed
        Returns:
        a dataset of aggregated rows
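
        The right-open window semantics described above can be illustrated without Spark. The following is a minimal sketch, not the service's implementation: the class WindowAggregationSketch, the method sumPerWindow, the fixed windowNanos parameter and the choice of a sum as the aggregation function are all assumptions made for illustration. It also emits no row for an empty window, which may differ from the behaviour of the actual service.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the documented API.
class WindowAggregationSketch {

    /**
     * Sums the values per fixed-size window. Result keys are window start
     * timestamps (UTC nanoseconds); each window covers the right-open
     * interval [windowStart, windowStart + windowNanos).
     */
    static TreeMap<Long, Double> sumPerWindow(Map<Long, Double> points,
                                              long startInclusive,
                                              long endExclusive,
                                              long windowNanos) {
        TreeMap<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, Double> point : points.entrySet()) {
            long ts = point.getKey();
            if (ts < startInclusive || ts >= endExclusive) {
                continue; // outside the overall right-open range
            }
            // Start of the window that contains ts.
            long windowStart =
                    startInclusive + ((ts - startInclusive) / windowNanos) * windowNanos;
            result.merge(windowStart, point.getValue(), Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Double> points = new TreeMap<>();
        points.put(0L, 1.0);
        points.put(5L, 2.0);
        points.put(10L, 4.0);  // falls into the second window, not the first
        points.put(19L, 8.0);
        points.put(20L, 16.0); // excluded: equal to endExclusive
        System.out.println(sumPerWindow(points, 0L, 20L, 10L)); // prints {0=3.0, 10=12.0}
    }
}
```

        A value whose timestamp equals a window boundary is counted in the window that starts at that boundary, which is what the right-open interval notation implies.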
      • getData

        org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getData(Entity entity,
                                                                       WindowAggregationProperties properties)
        Returns a dataset in which each row contains an aggregated entity data point at a fixed, distinct timestamp. The timestamp (in UTC nanoseconds) marks the start of an interval that belongs to a generated sequence of time windows. Each data point holds the result of the aggregation function specified by the provided properties. The function is applied to the values that fall within the right-open interval [startInclusive, endExclusive).
        Parameters:
        entity - the Entity that identifies the raw data to extract
        properties - the aggregation properties that control how the extraction is performed
        Returns:
        a dataset of aggregated rows
      • getData

        org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getData(Variable variable,
                                                                       DatasetAggregationProperties properties)
        Returns a dataset in which each row contains a variable data point aligned to a timestamp taken from a provided driving dataset. If a driving timestamp matches the timestamp of an actual data row, the value of that row is assigned to the aligned entry. Otherwise, the process tries to find the closest previous actual value and assigns it instead (repeat value).
        Parameters:
        variable - the Variable that identifies the raw data to extract
        properties - the aggregation properties that control how the extraction is performed
        Returns:
        a dataset containing the aligned rows
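
        The alignment rule above (exact match, otherwise repeat the closest previous value) can be sketched in plain Java with a sorted map. This is only an illustration under stated assumptions: the class RepeatAlignmentSketch and the method align are hypothetical names, and returning null when no earlier value exists is an assumption, since the behaviour for that case is not specified here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the documented API.
class RepeatAlignmentSketch {

    /**
     * For each driving timestamp, takes the raw value at that exact timestamp
     * or, if there is none, the closest previous raw value (repeat value).
     */
    static TreeMap<Long, Double> align(TreeMap<Long, Double> raw,
                                       List<Long> drivingTimestamps) {
        TreeMap<Long, Double> aligned = new TreeMap<>();
        for (long ts : drivingTimestamps) {
            // floorEntry returns the entry with the greatest key <= ts, or null.
            Map.Entry<Long, Double> match = raw.floorEntry(ts);
            // Assumption: null marks a driving timestamp with no earlier raw value.
            aligned.put(ts, match == null ? null : match.getValue());
        }
        return aligned;
    }

    public static void main(String[] args) {
        TreeMap<Long, Double> raw = new TreeMap<>();
        raw.put(10L, 1.5);
        raw.put(20L, 2.5);
        // 5 has no earlier value, 10 matches exactly, 15 and 25 repeat earlier values.
        System.out.println(align(raw, List.of(5L, 10L, 15L, 25L)));
        // prints {5=null, 10=1.5, 15=1.5, 25=2.5}
    }
}
```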
      • getData

        org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getData(Entity entity,
                                                                       DatasetAggregationProperties properties)
        Returns a dataset in which each row contains an entity data point aligned to a timestamp taken from a provided driving dataset. If a driving timestamp matches the timestamp of an actual data row, the value of that row is assigned to the aligned entry. Otherwise, the process tries to find the closest previous actual value and assigns it instead (repeat value).
        Parameters:
        entity - the Entity that identifies the raw data to extract
        properties - the aggregation properties that control how the extraction is performed
        Returns:
        a dataset containing the aligned rows