
Building CMW-specific queries

Warning

Before running the examples below, please make sure that the bundle configuration file conf/spark-defaults.conf has been updated with your Kerberos settings.

More info at: Data Access User Guide
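As an illustration only, a minimal sketch of what such Kerberos entries could look like, using the standard Spark-on-YARN options; the keytab path and principal are placeholders, and the exact properties expected by your bundle may differ, so consult the Data Access User Guide:

# placeholder values - replace with your own keytab location and principal
spark.yarn.keytab      /path/to/your.keytab
spark.yarn.principal   yourlogin@CERN.CH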

Basic queries constructed using the DevicePropertyDataQuery builder

Hint

You must obtain a Kerberos token before running ./bin/spark-shell

More info at: Data Access User Guide
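In practice this usually means authenticating with kinit before starting the shell; yourlogin below is a placeholder for your own account:

# obtain a Kerberos ticket, then start the shell from the bundle directory
kinit yourlogin@CERN.CH
./bin/spark-shell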

Import the libraries required for our example:

import java.time.format._
import java.time._
import cern.nxcals.api.extraction.data.builders._
import cern.nxcals.api.utils.TimeUtils
// sum, min, max, from_unixtime, expr and the 'column syntax come from the two
// imports below; spark-shell performs them automatically at startup
import org.apache.spark.sql.functions._
import spark.implicits._

Define the time window and retrieve the datasets:

println("Working on 1 day of data:")

val start = Instant.parse("2018-05-01T00:00:00.000Z")
val end = Instant.parse("2018-05-02T00:00:00.000Z")

val tgmData = DevicePropertyDataQuery.builder(spark).system("CMW").startTime(start).endTime(end).entity().parameter("CPS.TGM/FULL-TELEGRAM.STRC").build()
val data = DevicePropertyDataQuery.builder(spark).system("CMW").startTime(start).endTime(end).entity().parameter("FTN.QFO415S/Acquisition").build()
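Before aggregating anything, it can help to inspect the schemas of the retrieved datasets to see which fields (such as current, DEST or cyclestamp used below) are available:

data.printSchema
tgmData.printSchema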

Calculating some basic statistics for the current field:

data.describe("current").show

println("Showing current sum for all *EAST* users:")

data.where("selector like '%EAST%'").agg(sum('current)).show

println("Showing current sum for destination *TOF* using join:")

val tgmFiltered = tgmData.where("DEST like '%TOF%'")

tgmFiltered.count

tgmFiltered.join(data, "cyclestamp").agg(sum('current)).show

Showing access to nested types, here array elements:

tgmData.select("SPCON.elements").as[Array[String]].show

println("Counting cycles")

tgmData.groupBy("USER").count().show

println("Showing max and min dates for TGM data")
tgmData.agg(min('cyclestamp), max('cyclestamp)).selectExpr("`min(cyclestamp)` as min","`max(cyclestamp)` as max").withColumn("mindate",from_unixtime(expr("min/1000000000"))).withColumn("maxdate", from_unixtime(expr("max/1000000000"))).show
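The same nanosecond-to-date conversion can also be applied per record, for example to show a human-readable date next to each cyclestamp (assuming, as in the aggregate above, that cyclestamp holds nanoseconds since the epoch):

// assumes cyclestamp is expressed in nanoseconds since the epoch, as in the aggregate above
tgmData.withColumn("date", from_unixtime(expr("cyclestamp/1000000000"))).select("cyclestamp", "date", "USER").show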