Data Extraction API

To facilitate NXCALS data retrieval, a Data Access API has been implemented in Java. It should be considered the principal method of accessing logging data, and it serves as the base for the other data retrieval mechanisms offered to users.

Those include a Python 3 library written directly on top of the Java API as a thin set of native Python modules that internally use Py4J, a bridge between Python and Java. Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a JVM.
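
To illustrate the mechanism (not the NXCALS API itself), a minimal Py4J sketch is shown below; it assumes a JVM with a Py4J GatewayServer already listening on the default port:

from py4j.java_gateway import JavaGateway

gateway = JavaGateway()          # connect to the JVM's GatewayServer
jvm = gateway.jvm                # entry point to the classes inside the JVM
random = jvm.java.util.Random()  # instantiate a Java object from Python
print(random.nextInt(10))        # call its methods as if they were Python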

Python is considered a first-class citizen in the Spark world because of the multitude of available libraries, including ones for visualizing data. Being more analytically oriented, it is a great choice for building data science applications. Our Python interface is available directly from Python, via the PySpark shell (through the NXCALS bundle), and from the SWAN web interface.

There is yet another possibility of accessing the NXCALS API: through Scala. Since Scala operates on the same JVM and provides language interoperability with Java, the API automatically becomes available in that language as well (note, however, that the NXCALS team does not support Scala as a language).

It is worth underlining that, thanks to this approach (reusing objects from the same shared JVM), we have achieved homogeneous functionality across the Java, Python and Scala APIs.

The NXCALS Data Access API itself consists of two query builders, DataQuery and DevicePropertyDataQuery, each returning a result dataset as output. A query always expects a time window as input, specified either as a single point in time (atTime) or as a range (startTime combined with endTime or duration), together with information that identifies the data signal: the system, generic key/value pairs, a device/property pair (in the case of the CMW system) or a variable name (for backward compatibility). More details about the exact syntax, with examples, can be found below.
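
Whichever builder is used, the result is a regular Spark dataset (a DataFrame in Python), so all standard Spark operations apply to it. A short usage sketch, assuming a SparkSession named spark is available (as in SWAN or the PySpark shell) and reusing one of the variable queries detailed below:

from nxcals.api.extraction.data.builders import DataQuery

df = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variable('LTB.BCT60:INTENSITY') \
    .build()

df.printSchema()   # inspect the columns exposed for this signal
print(df.count())  # number of records in the requested time window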

Note

The reference presented below is language independent. Concrete examples are given in Python, Java and Scala for clarification.

DataQuery for key-values

Builder responsible for querying generic data using key/value pairs. The *Like variants accept wildcard patterns, with '%' matching any sequence of characters, as illustrated in the examples below.

DataQuery.builder(spark).byEntities()
    .system(systemString)                                  # "SYSTEM"
        # Obligatory time range block
        .atTime(timeUtcString)                             # "YYYY-MM-DD HH24:MI:SS.SSS"
        .startTime(startTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
            .duration(duration)                            # NUMBER
            .endTime(endTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
        # Optional data context block
        .fieldAliases(fieldAliasesMap)                     # {"FIELD-ALIAS": ["FIELD1", "FIELD2"]}
        # Obligatory entity block which can be repeated
        .entity()
            .keyValue(keyString, valueString)              # "KEY", "VALUE"
            .keyValueLike(keyString, valueString)          # "KEY", "VALUE-WITH-WILDCARDS"
            .keyValues(keyValuesMap)                       # {"KEY1": "VALUE1", "KEY2": "VALUE2"}
            .keyValuesLike(keyValuesMap)                   # {"KEY1": "VALUE1-WITH-WILDCARDS", "KEY2": "VALUE2-WITH-WILDCARDS"}
        .entity()
        .build()

Examples:

Python:

from nxcals.api.extraction.data.builders import *

df1 = DataQuery.builder(spark).byEntities().system('WINCCOA') \
    .startTime('2018-06-15 00:00:00.000').endTime('2018-06-17 00:00:00.000') \
    .entity().keyValue('variable_name', 'MB.C16L2:U_HDS_3') \
    .build()

df2 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValue('device', 'LHC.LUMISCAN.DATA').keyValue('property', 'CrossingAngleIP1') \
    .build()

df3 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValues({'device': 'LHC.LUMISCAN.DATA', 'property': 'CrossingAngleIP1'}) \
    .build()

df4 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValuesLike({'device': 'LHC.LUMISCAN.DATA', 'property': 'CrossingAngleIP%'}) \
    .build()
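
Since the entity block can be repeated, several signals can be retrieved in a single dataset. A sketch based on df2 (CrossingAngleIP2 is a hypothetical property name used purely for illustration):

df5 = DataQuery.builder(spark).byEntities().system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .entity().keyValue('device', 'LHC.LUMISCAN.DATA').keyValue('property', 'CrossingAngleIP1') \
    .entity().keyValue('device', 'LHC.LUMISCAN.DATA').keyValue('property', 'CrossingAngleIP2') \
    .build()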

Java:

import cern.nxcals.api.extraction.data.builder.DataQuery;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import java.util.HashMap;
import java.util.Map;

Dataset<Row> df1 = DataQuery.builder(spark).byEntities().system("WINCCOA")
        .startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000")
        .entity().keyValue("variable_name", "MB.C16L2:U_HDS_3")
        .build();

Dataset<Row> df2 = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .entity().keyValue("device", "LHC.LUMISCAN.DATA").keyValue("property", "CrossingAngleIP1")
        .build();

Map<String, Object> keyValues = new HashMap<>();
keyValues.put("device", "LHC.LUMISCAN.DATA");
keyValues.put("property", "CrossingAngleIP1");

Dataset<Row> df3 = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .entity()
        .keyValues(keyValues)
        .build();

Map<String, Object> keyValuesLike = new HashMap<>();
keyValuesLike.put("device", "LHC.LUMISCAN.DATA");
keyValuesLike.put("property", "CrossingAngleIP%");

Dataset<Row> df4 = DataQuery.builder(spark).byEntities().system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .entity().keyValuesLike(keyValuesLike)
        .build();

Scala:

import cern.nxcals.api.extraction.data.builders._
import scala.collection.JavaConversions._

val df1 = DataQuery.builder(spark).byEntities().system("WINCCOA").
    startTime("2018-06-15 00:00:00.000").endTime("2018-06-17 00:00:00.000").
    entity().keyValue("variable_name", "MB.C16L2:U_HDS_3").
    build()

val df2 = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    entity().keyValue("device", "LHC.LUMISCAN.DATA").keyValue("property", "CrossingAngleIP1").
    build()

val df3 = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    entity().keyValues(mapAsJavaMap(Map("device" -> "LHC.LUMISCAN.DATA", "property" -> "CrossingAngleIP1"))).
    build()    

val df4 = DataQuery.builder(spark).byEntities().system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    entity().keyValuesLike(mapAsJavaMap(Map("device" -> "LHC.LUMISCAN.DATA", "property" -> "CrossingAngleIP%"))).
    build()
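
The returned datasets are evaluated lazily; data is only read from NXCALS when an action such as show() or count() is called. A brief follow-up sketch in Python using df1 from above (the column names are system dependent, so always check printSchema() first; nxcals_timestamp and nxcals_value below are assumptions for illustration):

df1.printSchema()                        # discover the actual column names
df1.select('nxcals_timestamp', 'nxcals_value') \
    .orderBy('nxcals_timestamp') \
    .show(5)                             # first records, oldest first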

DataQuery for variables

Builder responsible for querying using variable names.

DataQuery.builder(spark).byVariables()
    .system(systemString)                                  # "SYSTEM"
        # Obligatory time range block
        .atTime(timeUtcString)                             # "YYYY-MM-DD HH24:MI:SS.SSS"
        .startTime(startTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
            .duration(duration)                            # NUMBER
            .endTime(endTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
        # Obligatory variable block which can be repeated
        .variable(variableNameString)                      # "VARIABLE-NAME"
        .variableLike(variableNameString)                  # "VARIABLE-NAME-WITH-WILDCARDS"
        .build()

Examples:

Python:

from nxcals.api.extraction.data.builders import *

df1 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variable('LTB.BCT60:INTENSITY') \
    .build()

df2 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variableLike('LTB.BCT%:INTENSITY') \
    .build()

df3 = DataQuery.builder(spark).byVariables() \
    .system('CMW') \
    .startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .variableLike('LTB.BCT50%:INTENSITY') \
    .variable('LTB.BCT60:INTENSITY') \
    .build()

Java:

import cern.nxcals.api.extraction.data.builder.DataQuery;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> df1 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .variable("LTB.BCT60:INTENSITY")
        .build();

Dataset<Row> df2 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .variableLike("LTB.BCT%:INTENSITY")
        .build();

Dataset<Row> df3 = DataQuery.builder(spark).byVariables()
        .system("CMW")
        .startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .variableLike("LTB.BCT%:INTENSITY")
        .variable("LTB.BCT60:INTENSITY")
        .build();

Scala:

import cern.nxcals.api.extraction.data.builders._

val df1 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    variable("LTB.BCT60:INTENSITY").
    build()

val df2 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    variableLike("LTB.BCT%:INTENSITY").
    build()

val df3 = DataQuery.builder(spark).byVariables().
    system("CMW").
    startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    variableLike("LTB.BCT%:INTENSITY").
    variable("LTB.BCT60:INTENSITY").
    build()    
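
Note that df3 chains variableLike() and variable() in a single query, so the resulting dataset covers all matched variables. A hypothetical way to list which variables contributed records (this assumes the result exposes a nxcals_variable_name column; confirm with printSchema()):

df3.select('nxcals_variable_name').distinct().show()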

DevicePropertyDataQuery builder

Builder responsible for querying using device/property pairs.

DevicePropertyDataQuery.builder(spark)
    .system(systemString)                                  # "SYSTEM"
        # Obligatory time range block
        .atTime(timeUtcString)                             # "YYYY-MM-DD HH24:MI:SS.SSS"
        .startTime(startTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
            .duration(duration)                            # NUMBER
            .endTime(endTimeUtcString)                     # "YYYY-MM-DD HH24:MI:SS.SSS"
        # Optional data context block
        .fieldAliases(fieldAliasesMap)                     # {"FIELD-ALIAS": ["FIELD1", "FIELD2"]}
        # Obligatory entity block which can be repeated
        .entity()
            .device(deviceString)                          # "DEVICE-NAME"
            .deviceLike(deviceString)                      # "DEVICE-NAME-WITH-WILDCARDS"
                .property(propertyString)                  # "PROPERTY-NAME"
                .propertyLike(propertyString)              # "PROPERTY-NAME-WITH-WILDCARDS"
            .parameter(parameterString)                    # "DEVICE-NAME/PROPERTY-NAME"
            .parameterLike(parameterString)                # "DEVICE-NAME/PROPERTY-NAME-WITH-WILDCARDS"
        .entity()
        .build()

Examples:

Python:

from nxcals.api.extraction.data.builders import *

df1 = DevicePropertyDataQuery.builder(spark) \
    .system('CMW').startTime('2017-08-29 00:00:00.000').duration(10000000000) \
    .entity().parameter('RADMON.PS-10/ExpertMonitoringAcquisition') \
    .build()

df2 = DevicePropertyDataQuery.builder(spark) \
    .system('CMW').startTime('2018-04-29 00:00:00.000').endTime('2018-04-30 00:00:00.000') \
    .fieldAliases({'CURRENT 18V': ['current_18V', 'voltage_18V']}) \
    .entity().device('RADMON.PS-1').property('ExpertMonitoringAcquisition') \
    .entity().parameter('RADMON.PS-10/ExpertMonitoringAcquisition') \
    .build()

df3 = DevicePropertyDataQuery.builder(spark) \
    .system('CMW').startTime('2017-08-29 00:00:00.000').duration(10000000000) \
    .entity().parameterLike('RADMON.PS-%/ExpertMonitoringAcquisition') \
    .build()

Java:

import cern.nxcals.api.extraction.data.builder.DevicePropertyDataQuery;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Dataset<Row> df1 = DevicePropertyDataQuery.builder(spark)
        .system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l)
        .entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition")
        .build();

List<String> fieldAliasesList = new ArrayList<>();
fieldAliasesList.add("current_18V");
fieldAliasesList.add("voltage_18V");

Map<String, List<String>> fieldAliases = new HashMap<>();
fieldAliases.put("CURRENT 18V", fieldAliasesList);

Dataset<Row> df2 = DevicePropertyDataQuery.builder(spark)
        .system("CMW").startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000")
        .fieldAliases(fieldAliases)
        .entity().device("RADMON.PS-1").property("ExpertMonitoringAcquisition")
        .entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition")
        .build();
df2.printSchema();

Dataset<Row> df3 = DevicePropertyDataQuery.builder(spark)
        .system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l)
        .entity().parameterLike("RADMON.PS-%/ExpertMonitoringAcquisition")
        .build();

Scala:

import cern.nxcals.api.extraction.data.builders._
import scala.collection.JavaConversions._

val df1 = DevicePropertyDataQuery.builder(spark).
    system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l).
    entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition").
    build()

val df2 = DevicePropertyDataQuery.builder(spark).
    system("CMW").startTime("2018-04-29 00:00:00.000").endTime("2018-04-30 00:00:00.000").
    fieldAliases(mapAsJavaMap(Map("CURRENT 18V" -> seqAsJavaList(Seq("current_18V", "voltage_18V"))))).
    entity().device("RADMON.PS-1").property("ExpertMonitoringAcquisition").
    entity().parameter("RADMON.PS-10/ExpertMonitoringAcquisition").
    build()

val df3 = DevicePropertyDataQuery.builder(spark).
    system("CMW").startTime("2017-08-29 00:00:00.000").duration(10000000000l).
    entity().parameterLike("RADMON.PS-%/ExpertMonitoringAcquisition").
    build()
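
As the Java df2 example above already hints with its printSchema() call, printing the schema is the quickest way to check the effect of fieldAliases; presumably the differently named source fields (current_18V, voltage_18V) are unified under the single alias, but this should be confirmed against the actual output:

df2.printSchema()   # the alias from fieldAliases should appear in the schema
df2.show(5)         # preview the first records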