Building generic NXCALS queries

NXCALS is capable of handling systems that do not conform to the CMW device/property model. For use cases other than CMW, a more generic, dictionary-like approach should be favored when constructing queries for fetching data. The NXCALS Data Extraction API provides such functionality via its DataQuery builder.

Basic byEntities() queries

In this case, each key-value pair targets a specific entity key (as described by the entity/partition/schema rules that define a system).

By those rules, a CMW-specific query can be translated to DataQuery as follows:

Python:

cmwData = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-06-15 00:00:00.000').endTime('2018-06-17 00:00:00.000') \
    .entity() \
    .keyValue('device', 'CPS.TGM') \
    .keyValue('property', 'FULL-TELEGRAM.STRC').build()

cmwDataPoint = DataQuery.builder(spark).byEntities() \
    .system('CMW').atTime('2018-06-16 11:34:42.000') \
    .entity() \
    .keyValue('device', 'CPS.TGM') \
    .keyValue('property', 'FULL-TELEGRAM.STRC').build()

Java:

Dataset<Row> cmwData = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000")
        .entity()
        .keyValue("device", "CPS.TGM")
        .keyValue("property", "FULL-TELEGRAM.STRC").build();

Dataset<Row> cmwDataPoint = DataQuery.builder(spark).byEntities()
        .system("CMW").atTime("2018-06-16 11:34:42.000")
        .entity()
        .keyValue("device", "CPS.TGM")
        .keyValue("property", "FULL-TELEGRAM.STRC").build();

Scala:

val cmwData = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000").
    entity().
    keyValue("device", "CPS.TGM").
    keyValue("property", "FULL-TELEGRAM.STRC").build()

val cmwDataPoint = DataQuery.builder(spark).byEntities().
    system("CMW").atTime("2018-06-16 11:34:42.000").
    entity().
    keyValue("device", "CPS.TGM").
    keyValue("property", "FULL-TELEGRAM.STRC").build()

Note

The same generic builder can be used for a WINCCOA-specific query where, instead of the device and property keys, we use a variable_name key:

Python:

winccoaData = DataQuery.builder(spark).byEntities().system('WINCCOA') \
    .startTime('2018-06-15 00:00:00.000').endTime('2018-06-17 00:00:00.000') \
    .entity().keyValue('variable_name', 'MB.C16L2:U_HDS_3').build()

Java:

Dataset<Row> winccoaData = DataQuery.builder(spark).byEntities().system("WINCCOA")
    .startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000")
    .entity().keyValue("variable_name", "MB.C16L2:U_HDS_3").build();

Scala:

val winccoaData = DataQuery.builder(spark).byEntities().system("WINCCOA").
    startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000").
    entity().keyValue("variable_name", "MB.C16L2:U_HDS_3").build()

Basic byVariables() queries

So far, we have seen how to construct queries (even generic ones) targeting entities specific to a given system. Those queries are intended for cases when a user already knows the entity keys (system-specific fields) of a target entity.

Alternatively, we can use a functionality which originates from the CALS service: variables. Conceptually, variables should be considered as pointers to entities and can be used for querying data produced by different devices having different schemas (collections of fields).

Note

Existing CALS variables will be migrated to NXCALS and it will be possible to query them directly via the new system API.

Python:

# import the NXCALS query builders
from nxcals.api.extraction.data.builders import *

data = DataQuery.builder(spark).byVariables().system('CMW') \
    .startTime('2018-06-15 23:00:00.000').endTime('2018-06-16 00:00:00.000') \
    .variable('CPS.TGM:CYCLE').build()

dataPoint = DataQuery.builder(spark).byVariables().system('CMW') \
    .atTime('2018-06-15 21:01:01.400') \
    .variable('CPS.TGM:CYCLE').build()

Java:

// import the NXCALS query builders
import cern.nxcals.api.extraction.data.builders.*;

Dataset<Row> data = DataQuery.builder(spark).byVariables().system("CMW")
        .startTime("2018-06-15 23:00:00.000").endTime("2018-06-16 00:00:00.000")
        .variable("CPS.TGM:CYCLE").build();

Dataset<Row> dataPoint = DataQuery.builder(spark).byVariables().system("CMW")
        .atTime("2018-06-15 21:01:01.400").variable("CPS.TGM:CYCLE").build();

Scala:

// import the NXCALS query builders
import cern.nxcals.api.extraction.data.builders._

val data = DataQuery.builder(spark).byVariables().system("CMW").
    startTime("2018-06-15 23:00:00.000").endTime("2018-06-16 00:00:00.000").
    variable("CPS.TGM:CYCLE").build()

val dataPoint = DataQuery.builder(spark).byVariables().system("CMW").
    atTime("2018-06-15 21:01:01.400").variable("CPS.TGM:CYCLE").build()

It produces a dataset with a specific schema that includes the variable name:

data.printSchema()
root
 |-- nxcals_entity_id: long (nullable = true)
 |-- nxcals_timestamp: long (nullable = true)
 |-- nxcals_value: integer (nullable = true)
 |-- nxcals_variable_name: string (nullable = false)

as can be seen in the query output:

data.show(10)
+----------------+-------------------+------------+--------------------+
|nxcals_entity_id|   nxcals_timestamp|nxcals_value|nxcals_variable_name|
+----------------+-------------------+------------+--------------------+
|           46955|1529103628300000000|           8|       CPS.TGM:CYCLE|
|           46955|1529103809500000000|           5|       CPS.TGM:CYCLE|
|           46955|1529104249900000000|           1|       CPS.TGM:CYCLE|
|           46955|1529104325500000000|           2|       CPS.TGM:CYCLE|
|           46955|1529104411900000000|           9|       CPS.TGM:CYCLE|
|           46955|1529105637100000000|           8|       CPS.TGM:CYCLE|
|           46955|1529106096700000000|          12|       CPS.TGM:CYCLE|
|           46955|1529106246700000000|          18|       CPS.TGM:CYCLE|
|           46955|1529103623500000000|           5|       CPS.TGM:CYCLE|
|           46955|1529103702700000000|           8|       CPS.TGM:CYCLE|
+----------------+-------------------+------------+--------------------+
only showing top 10 rows
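
The nxcals_timestamp values above are nanoseconds since the Unix epoch (they fall in June 2018, matching the query window). A minimal Python sketch, using standard Spark functions, for rendering them as readable timestamps:

from pyspark.sql import functions as F

# dividing the nanosecond timestamp by 1e9 yields seconds,
# which Spark can cast to a readable timestamp
readable = data.withColumn('time', (F.col('nxcals_timestamp') / 1e9).cast('timestamp'))
readable.orderBy('nxcals_timestamp').select('time', 'nxcals_value').show(5)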

Multiple variables can be combined in a single query builder, as in the example below. It is also possible to use the variable() and variableLike() methods simultaneously:

Python:

data2 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-07-20 13:38:00.000').endTime('2018-07-20 13:39:00.000') \
    .variable('LHC.BOFSU:TUNE_B1_H') \
    .variable('LHC.BOFSU:TUNE_B1_V') \
    .build()

Java:

Dataset<Row> data2 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-07-20 13:38:00.000").endTime("2018-07-20 13:39:00.000")
        .variable("LHC.BOFSU:TUNE_B1_H")
        .variable("LHC.BOFSU:TUNE_B1_V")
        .build();

Scala:

val data2 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-07-20 13:38:00.000").endTime("2018-07-20 13:39:00.000").
    variable("LHC.BOFSU:TUNE_B1_H").
    variable("LHC.BOFSU:TUNE_B1_V").
    build()
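
Because the rows returned for these two (schema-compatible) variables are combined into a single dataset, the nxcals_variable_name column can be used to split or group the result again. A short Python sketch, assuming the data2 query above:

# split the combined dataset by variable name
tune_h = data2.filter(data2.nxcals_variable_name == 'LHC.BOFSU:TUNE_B1_H')
tune_v = data2.filter(data2.nxcals_variable_name == 'LHC.BOFSU:TUNE_B1_V')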

Combining different variables may lead to an IncompatibleSchemaPromotionException, as demonstrated in the snippet below, where scalars are extracted together with vectors:

Python:

data3 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2017-01-20 03:30:00.000').endTime('2020-08-20 13:45:00.000') \
    .variableLike('LHC.BOFSU:TUNE_B1_%') \
    .variable('LHC.BOFSU:OFC_DEFLECT_H') \
    .build()

Java:

Dataset<Row> data3 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-07-20 13:38:00.000").endTime("2018-07-20 13:39:00.000")
        .variableLike("LHC.BOFSU:TUNE_B1_%")
        .variable("LHC.BOFSU:OFC_DEFLECT_H")
        .build();

Scala:

val data3 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-07-20 13:38:00.000").endTime("2018-07-20 13:39:00.000").
    variableLike("LHC.BOFSU:TUNE_B1_%").
    variable("LHC.BOFSU:OFC_DEFLECT_H").
    build()

Please refer to multiple schemas for more information.
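
One way to avoid the schema clash is to extract the incompatible variables in separate queries and process the resulting datasets independently. A minimal Python sketch along those lines:

# query the scalar tune variables on their own...
scalars = DataQuery.builder(spark).byVariables().system('CMW') \
    .startTime('2018-07-20 13:38:00.000').endTime('2018-07-20 13:39:00.000') \
    .variableLike('LHC.BOFSU:TUNE_B1_%').build()

# ...and the vector variable separately, so that no schema promotion is attempted
vectors = DataQuery.builder(spark).byVariables().system('CMW') \
    .startTime('2018-07-20 13:38:00.000').endTime('2018-07-20 13:39:00.000') \
    .variable('LHC.BOFSU:OFC_DEFLECT_H').build()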

Usage of wildcard characters in query builders

This section describes special characters that can be used in different query builders.

For each of the query builder methods such as keyValue, keyValues, device, property, parameter and variable, the Data Access API provides an equivalent "Like" method allowing the usage of wildcard characters: keyValueLike, keyValuesLike, deviceLike, propertyLike, parameterLike and variableLike.

Only two types of wildcard characters are allowed, as illustrated in the sketch following this list:

  • % The percent wildcard matches any sequence of characters (including an empty one) at the position it occupies.
  • _ The underscore wildcard matches exactly one character at the position it occupies.
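
A minimal Python sketch illustrating both wildcards, reusing the device and property names from the CMW example above (the patterns themselves are illustrative assumptions):

# 'CPS.TG_'         : '_' stands for exactly one character, so it matches 'CPS.TGM'
# 'FULL-TELEGRAM.%' : '%' stands for any run of characters, so it matches 'FULL-TELEGRAM.STRC'
wild = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-06-15 00:00:00.000').endTime('2018-06-17 00:00:00.000') \
    .entity().keyValueLike('device', 'CPS.TG_') \
    .keyValueLike('property', 'FULL-TELEGRAM.%').build()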

"Non Like" and "Like" methods can be combined in the same query builder like in the example below:

Python:

df = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValue('device', 'LHC.LUMISCAN.DATA') \
    .keyValueLike('property', 'CrossingAngleIP%') \
    .build() 

Java:

Dataset<Row> df = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .entity().keyValue("device", "LHC.LUMISCAN.DATA")
        .keyValueLike("property", "CrossingAngleIP%")
        .build();

Scala:

val df = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    entity().keyValue("device", "LHC.LUMISCAN.DATA").
    keyValueLike("property", "CrossingAngleIP%").
    build()