Data Extraction API

To facilitate NXCALS data retrieval, a Data Access API has been implemented in Java. It should be considered the principal method of accessing logging data, and it serves as the base for the other data retrieval mechanisms offered to users.

Those include a Python 3 library written directly on top of the Java API as a thin set of native Python modules that internally use Py4J, a bridge between Python and Java. Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a JVM.
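
To illustrate the mechanism (not the NXCALS API itself), a minimal Py4J sketch is shown below; it assumes a JVM with a Py4J GatewayServer already listening on the default port:

from py4j.java_gateway import JavaGateway

gateway = JavaGateway()          # connect to the JVM's GatewayServer
jvm = gateway.jvm                # entry point to the classes inside the JVM
random = jvm.java.util.Random()  # instantiate a Java object from Python
print(random.nextInt(10))        # call its methods as if they were Python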

Python is considered a first-class citizen in the Spark world because of the multitude of available libraries, including ones for visualizing data. Being more analytically oriented, it is a great choice for building data science applications. Our Python interface is available directly from Python, via the PySpark shell (through the NXCALS bundle), and from the SWAN web interface.

There is yet another possibility of accessing the NXCALS API: through Scala. Since Scala operates on the same JVM and provides language interoperability with Java, the API automatically becomes available in that language as well (note, however, that the NXCALS team does not support Scala as a language).

It is worth underlining that, thanks to this approach (reusing objects from the same shared JVM), we have achieved homogeneous functionality across the Java, Python and Scala APIs.

The NXCALS Data Access API itself consists of two query builders, DataQuery and DevicePropertyDataQuery, each returning a result dataset as output. A query always expects a time window as input, specified either as a single point in time (atTime) or as a range (startTime combined with endTime or duration), together with information that identifies the data signal: the system, generic key/value pairs, a device/property pair (in the case of the CMW system) or a variable name (for backward compatibility). More details about the exact syntax, with examples, can be found below.
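
Whichever builder is used, the result is a regular Spark dataset (a DataFrame in Python), so all standard Spark operations apply to it. A short usage sketch, assuming a SparkSession named spark is available (as in SWAN or the PySpark shell) and reusing one of the variable queries detailed below:

from nxcals.api.extraction.data.builders import DataQuery

df = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variable('LTB.BCT60:INTENSITY') \
    .build()

df.printSchema()   # inspect the columns exposed for this signal
print(df.count())  # number of records in the requested time window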

Note

The reference presented below is language independent. Concrete examples are given in Python, Java and Scala for clarification.

DataQuery for key-values

Builder responsible for querying generic data using key/value pairs. The *Like variants accept wildcard patterns, with '%' matching any sequence of characters, as illustrated in the examples below.

DataQuery.builder(spark).byEntities()
    .system(systemString)                                  # "SYSTEM"
        # Obligatory time range block
        .atTime(timeUtcString)                             # "YYYY-MM-DD HH24:MI:SS.SSS"
        .startTime(startTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
            .duration(duration)                            # NUMBER
            .endTime(endTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
        # Optional data context block
        .fieldAliases(fieldAliasesMap)                     # {"FIELD-ALIAS": ["FIELD1", "FIELD2"]}
        # Obligatory entity block which can be repeated
        .entity()
            .keyValue(keyString, valueString)              # "KEY", "VALUE"
            .keyValueLike(keyString, valueString)          # "KEY", "VALUE-WITH-WILDCARDS"
            .keyValues(keyValuesMap)                       # {"KEY1": "VALUE1", "KEY2": "VALUE2"}
            .keyValuesLike(keyValuesMap)                   # {"KEY1": "VALUE1-WITH-WILDCARDS", "KEY2": "VALUE2-WITH-WILDCARDS"}
        .entity()
        .build()

Examples:

Python:

from nxcals.api.extraction.data.builders import *

df1 = DataQuery.builder(spark).byEntities().system('WINCCOA') \
    .startTime('2018-06-15 00:00:00.000').endTime('2018-06-17 00:00:00.000') \
    .entity().keyValue('variable_name', 'MB.C16L2:U_HDS_3') \
    .build()

df2 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValue('device', 'LHC.LUMISCAN.DATA').keyValue('property', 'CrossingAngleIP1') \
    .build()

df3 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValues({'device': 'LHC.LUMISCAN.DATA', 'property': 'CrossingAngleIP1'}) \
    .build()

df4 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValuesLike({'device': 'LHC.LUMISCAN.DATA', 'property': 'CrossingAngleIP%'}) \
    .build()
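
Since the entity block can be repeated, several signals can be retrieved in a single dataset. A sketch based on df2 (CrossingAngleIP2 is a hypothetical property name used purely for illustration):

df5 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValue('device', 'LHC.LUMISCAN.DATA').keyValue('property', 'CrossingAngleIP1') \
    .entity().keyValue('device', 'LHC.LUMISCAN.DATA').keyValue('property', 'CrossingAngleIP2') \
    .build()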

Java:

import cern.nxcals.api.extraction.data.builder.DataQuery;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import java.util.HashMap;
import java.util.Map;

Dataset<Row> df1 = DataQuery.builder(spark).byEntities().system("WINCCOA")
        .startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000")
        .entity().keyValue("variable_name", "MB.C16L2:U_HDS_3")
        .build();

Dataset<Row> df2 = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .entity().keyValue("device", "LHC.LUMISCAN.DATA").keyValue("property", "CrossingAngleIP1")
        .build();

Map<String, Object> keyValues = new HashMap<>();
keyValues.put("device", "LHC.LUMISCAN.DATA");
keyValues.put("property", "CrossingAngleIP1");

Dataset<Row> df3 = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .entity()
        .keyValues(keyValues)
        .build();

Map<String, Object> keyValuesLike = new HashMap<>();
keyValuesLike.put("device", "LHC.LUMISCAN.DATA");
keyValuesLike.put("property", "CrossingAngleIP%");

Dataset<Row> df4 = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .entity().keyValuesLike(keyValuesLike)
        .build();

Scala:

import cern.nxcals.api.extraction.data.builders._
import scala.collection.JavaConversions._

val df1 = DataQuery.builder(spark).byEntities().system("WINCCOA").
    startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000").
    entity().keyValue("variable_name", "MB.C16L2:U_HDS_3").
    build()

val df2 = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    entity().keyValue("device", "LHC.LUMISCAN.DATA").keyValue("property", "CrossingAngleIP1").
    build()

val df3 = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    entity().keyValues(mapAsJavaMap(Map("device" -> "LHC.LUMISCAN.DATA", "property" -> "CrossingAngleIP1"))).
    build()    

val df4 = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    entity().keyValuesLike(mapAsJavaMap(Map("device" -> "LHC.LUMISCAN.DATA", "property" -> "CrossingAngleIP%"))).
    build()
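
The returned datasets are evaluated lazily; data is only read from NXCALS when an action such as show() or count() is called. A brief follow-up sketch in Python using df1 from above (the column names are system dependent, so always check printSchema() first; nxcals_timestamp and nxcals_value below are assumptions for illustration):

df1.printSchema()                        # discover the actual column names
df1.select('nxcals_timestamp', 'nxcals_value') \
    .orderBy('nxcals_timestamp') \
    .show(5)                             # first records, oldest first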

DataQuery for variables

Builder responsible for querying using variable names.

DataQuery.builder(spark).byVariables()
    .system(systemString)                                  # "SYSTEM"
        # Obligatory time range block
        .atTime(timeUtcString)                             # "YYYY-MM-DD HH24:MI:SS.SSS"
        .startTime(startTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
            .duration(duration)                            # NUMBER
            .endTime(endTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
        # Obligatory variable block which can be repeated
        .variable(variableNameString)                      # "VARIABLE-NAME"
        .variableLike(variableNameString)                  # "VARIABLE-NAME-WITH-WILDCARDS"
        .build()

Examples:

Python:

from nxcals.api.extraction.data.builders import *

df1 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variable('LTB.BCT60:INTENSITY') \
    .build()

df2 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variableLike('LTB.BCT%:INTENSITY') \
    .build()

df3 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variableLike('LTB.BCT50%:INTENSITY') \
    .variable('LTB.BCT60:INTENSITY') \
    .build()

Java:

import cern.nxcals.api.extraction.data.builder.DataQuery;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> df1 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .variable("LTB.BCT60:INTENSITY")
        .build();

Dataset<Row> df2 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .variableLike("LTB.BCT%:INTENSITY")
        .build();

Dataset<Row> df3 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .variableLike("LTB.BCT%:INTENSITY")
        .variable("LTB.BCT60:INTENSITY")
        .build();

Scala:

import cern.nxcals.api.extraction.data.builders._

val df1 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    variable("LTB.BCT60:INTENSITY").
    build()

val df2 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    variableLike("LTB.BCT%:INTENSITY").
    build()

val df3 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    variableLike("LTB.BCT%:INTENSITY").
    variable("LTB.BCT60:INTENSITY").
    build()    
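
Note that df3 chains variableLike() and variable() in a single query, so the resulting dataset covers all matched variables. A hypothetical way to list which variables contributed records (this assumes the result exposes a nxcals_variable_name column; confirm with printSchema()):

df3.select('nxcals_variable_name').distinct().show()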

DevicePropertyDataQuery builder

Builder responsible for querying using device/property pairs.

DevicePropertyDataQuery.builder(spark)
    .system(systemString)                                  # "SYSTEM"
        # Obligatory time range block
        .atTime(timeUtcString)                             # "YYYY-MM-DD HH24:MI:SS.SSS"
        .startTime(startTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
            .duration(duration)                            # NUMBER
            .endTime(endTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
        # Optional data context block
        .fieldAliases(fieldAliasesMap)                     # {"FIELD-ALIAS": ["FIELD1", "FIELD2"]}
        # Obligatory entity block which can be repeated
        .entity()
            .device(deviceString)                          # "DEVICE-NAME"
            .deviceLike(deviceString)                      # "DEVICE-NAME-WITH-WILDCARDS"
                .property(propertyString)                  # "PROPERTY-NAME"
                .propertyLike(propertyString)              # "PROPERTY-NAME-WITH-WILDCARDS"
            .parameter(parameterString)                    # "DEVICE-NAME/PROPERTY-NAME"
            .parameterLike(parameterString)                # "DEVICE-NAME/PROPERTY-NAME-WITH-WILDCARDS"
        .entity()
        .build()

Examples:

Python:

from nxcals.api.extraction.data.builders import *

df1 = DevicePropertyDataQuery.builder(spark) \
    .system('CMW').startTime('2017-08-29 00:00:00.000').duration(10000000000) \
    .entity().parameter('RADMON.PS-10/ExpertMonitoringAcquisition') \
    .build()

df2 = DevicePropertyDataQuery.builder(spark) \
    .system('CMW').startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .fieldAliases({'CURRENT 18V': ['current_18V', 'voltage_18V']}) \
    .entity().device('RADMON.PS-1').property('ExpertMonitoringAcquisition') \
    .entity().parameter('RADMON.PS-10/ExpertMonitoringAcquisition') \
    .build()

df3 = DevicePropertyDataQuery.builder(spark) \
    .system('CMW').startTime('2017-08-29 00:00:00.000').duration(10000000000) \
    .entity().parameterLike('RADMON.PS-%/ExpertMonitoringAcquisition') \
    .build()

Java:

import cern.nxcals.api.extraction.data.builder.DevicePropertyDataQuery;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Dataset<Row> df1 = DevicePropertyDataQuery.builder(spark)
        .system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l)
        .entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition")
        .build();

List<String> fieldAliasesList = new ArrayList<>();
fieldAliasesList.add("current_18V");
fieldAliasesList.add("voltage_18V");

Map<String, List<String>> fieldAliases = new HashMap<>();
fieldAliases.put("CURRENT 18V", fieldAliasesList);

Dataset<Row> df2 = DevicePropertyDataQuery.builder(spark)
        .system("CMW").startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .fieldAliases(fieldAliases)
        .entity().device("RADMON.PS-1").property("ExpertMonitoringAcquisition")
        .entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition")
        .build();
df2.printSchema();

Dataset<Row> df3 = DevicePropertyDataQuery.builder(spark)
        .system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l)
        .entity().parameterLike("RADMON.PS-%/ExpertMonitoringAcquisition")
        .build();

Scala:

import cern.nxcals.api.extraction.data.builders._
import scala.collection.JavaConversions._

val df1 = DevicePropertyDataQuery.builder(spark).
    system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l).
    entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition").
    build()

val df2 = DevicePropertyDataQuery.builder(spark).
    system("CMW").startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    fieldAliases(mapAsJavaMap(Map("CURRENT 18V" -> seqAsJavaList(Seq("current_18V", "voltage_18V"))))).
    entity().device("RADMON.PS-1").property("ExpertMonitoringAcquisition").
    entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition").
    build()

val df3 = DevicePropertyDataQuery.builder(spark).
    system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l).
    entity().parameterLike("RADMON.PS-%/ExpertMonitoringAcquisition").
    build()
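
As the Java df2 example above already hints with its printSchema() call, printing the schema is the quickest way to check the effect of fieldAliases; presumably the differently named source fields (current_18V, voltage_18V) are unified under the single alias, but this should be confirmed against the actual output:

df2.printSchema()   # the alias from fieldAliases should appear in the schema
df2.show(5)         # preview the first records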