Building CMW specific queries
Warning
Before running examples below pls make sure that the boundle configuration file conf/spark-defaults.conf has been updated with your kerberos settings.
More info at: Data Access User Guide
Basic queries constructed using DevicePropertyDataQuery builder
Hint
You must obtain Kerberos token before running ./bin/spark-shell
More info at: Data Access User Guide
Import libraries required for our example
import java.time.format._
import java.time._
import cern.nxcals.api.extraction.data.builders._
import TimeUtils
Define time window and retrive datasets:
println("Working on 1 day of data:")
val start = Instant.parse("2018-05-01T00:00:00.000Z")
val end = Instant.parse("2018-05-02T00:00:00.000Z")
val tgmData = DevicePropertyDataQuery.builder(spark).system("CMW").startTime(start).endTime(end).entity().parameter("CPS.TGM/FULL-TELEGRAM.STRC").build()
val data = DevicePropertyDataQuery.builder(spark).system("CMW").startTime(start).endTime(end).entity().parameter("FTN.QFO415S/Acquisition").build()
Calculating some basic information about current:
data.describe("current").show
println("Showing current sum for all *EAST* users:")
data.where("selector like '%EAST%'").agg(sum('current)).show
println("Showing current sum for destination *TOF* using join:")
val tgmFiltered = tgmData.where("DEST like '%TOF%'")
tgmFiltered.count
tgmFiltered.join(data, "cyclestamp").agg(sum('current)).show
Showing access to nested types, here array elements:
tgmData.select("SPCON.elements").as[Array[String]].show
println("Counting cycles")
tgmData.groupBy("USER").count().show
println("Showing max and min dates for TGM data")
tgmData.agg(min('cyclestamp), max('cyclestamp)).selectExpr("`min(cyclestamp)` as min","`max(cyclestamp)` as max").withColumn("mindate",from_unixtime(expr("min/1000000000"))).withColumn("maxdate", from_unixtime(expr("max/1000000000"))).show