Python for Java APIs
Introduction
To use the NXCALS public APIs from another language such as Python, one would normally need a library written in that language (the native approach).
NXCALS provides a package for data extraction from Python (see the NXCALS Python package). It is a useful and straightforward solution for data extraction with Python, but it exposes only the Extraction API.
For the other APIs (CERN Extraction API, Metadata API, Ingestion API and Backport API) one would have to write native API clients or port the existing Java implementation. That approach, however, could lead to code duplication and implementation discrepancies. At the same time, two Python packages (Py4J and JPype) allow Java objects to be accessed directly from Python code.
Accessing Java APIs using Py4J
Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine.
It requires a running JVM, which must be started before the Python code is executed (Py4J does not start the JVM itself), and the py4j Python module.
Both can be provided by setting up a JavaGateway process as described here and installing the py4j module with: pip install py4j
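Since Py4J does not start the JVM itself, it can be useful to check both prerequisites before running any client code. The following is a minimal sketch using only the standard library; the helper names py4j_installed and gateway_listening are hypothetical, and the port is assumed to be Py4J's default gateway port (25333) unless the gateway was configured otherwise.

```python
import importlib.util
import socket

PY4J_DEFAULT_PORT = 25333  # assumed: default JavaGateway port, adjust if configured differently


def py4j_installed() -> bool:
    """Return True if the py4j module can be imported in this environment."""
    return importlib.util.find_spec("py4j") is not None


def gateway_listening(host: str = "127.0.0.1",
                      port: int = PY4J_DEFAULT_PORT,
                      timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout or unreachable host: no gateway available.
        return False


if not py4j_installed():
    print("py4j is missing, install it with: pip install py4j")
if not gateway_listening():
    print("No gateway is listening on the default port, start the JVM side first")
```

The check only confirms that a TCP listener exists on the port; it does not verify that the listener is actually a Py4J gateway.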
A much more convenient way of working with Py4J is to use the NXCALS package, in which the py4j module is preinstalled and the JVM can be accessed directly from the Spark session (already available in the PySpark tool, or created with the provided session builders in a standalone Python application).
In both cases the JVM is accessed through a special variable: spark._jvm
Examples of using Py4J from PySpark
Getting variable name:
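_metadata = spark._jvm.cern.nxcals.api.extraction.metadata
variableService = _metadata.ServiceClientFactory.createVariableService()
# The Variables query builder is a Java class, so it has to be reached through the JVM view
var = variableService.findOne(_metadata.queries.Variables.suchThat().variableName().eq('HX:BMODE')).get()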
print(var.getVariableName())
Getting information about specific fill:
# Using an existing SparkSession provided by PySpark
fillservice = spark._jvm.cern.nxcals.api.custom.service.Services.newInstance().fillService()
fill = fillservice.findFill(3000)
print(fill)
Extracting variable data within time range and using a specific lookup strategy:
cern_nxcals_api = spark._jvm.cern.nxcals.api
variableService = cern_nxcals_api.extraction.metadata.ServiceClientFactory.createVariableService()
extractionService = cern_nxcals_api.custom.service.Services.newInstance().extractionService()
myVariable = variableService.findOne(cern_nxcals_api.extraction.metadata.queries.Variables.suchThat().variableName().eq("CPS.TGM:CYCLE"))
if myVariable.isEmpty():
    raise ValueError("Could not obtain variable from service")
startTime = cern_nxcals_api.utils.TimeUtils.getInstantFromString("2020-04-25 00:00:00.000000000")
endTime = cern_nxcals_api.utils.TimeUtils.getInstantFromString("2020-04-26 00:00:00.000000000")
properties = cern_nxcals_api.custom.service.extraction.ExtractionProperties.builder().timeWindow(startTime, endTime) \
    .lookupStrategy(cern_nxcals_api.custom.service.extraction.LookupStrategy.LAST_BEFORE_START_IF_EMPTY).build()
dataset = extractionService.getData(myVariable.get(), properties)
print(dataset.count())
dataset.show()
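The time window above is passed as strings with nine fractional digits. When the boundaries originate from Python datetime objects, a small formatter can produce that layout. This is a sketch only: to_nxcals_timestamp is a hypothetical helper, the format is inferred from the string literals used in the examples, and time zone handling is deliberately left out because the zone assumed by getInstantFromString is not stated here.

```python
from datetime import datetime


def to_nxcals_timestamp(moment: datetime) -> str:
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn' (nine fractional digits)."""
    # %f yields microseconds (six digits); pad with zeros up to nanosecond width.
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f") + "000"


start = to_nxcals_timestamp(datetime(2020, 4, 25))
end = to_nxcals_timestamp(datetime(2020, 4, 26))
print(start)  # 2020-04-25 00:00:00.000000000
print(end)    # 2020-04-26 00:00:00.000000000
```

The resulting strings could then be handed to TimeUtils.getInstantFromString in place of the hard-coded literals.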
Example of using Py4J from a standalone Python application (requires Spark session creation)
from nxcals import spark_session_builder
spark = spark_session_builder.get_or_create(app_name='spark-basic')
_cern_nxcals_api = spark._jvm.cern.nxcals.api
fillService = _cern_nxcals_api.custom.service.Services.newInstance().fillService()
fills = fillService.findFills(_cern_nxcals_api.utils.TimeUtils.getInstantFromString("2018-04-25 00:00:00.000000000"),
                              _cern_nxcals_api.utils.TimeUtils.getInstantFromString("2018-04-28 00:00:00.000000000"))
Accessing Java APIs using JPype
The JPype Python module interfaces with the JVM at the native level. It allows Python to make use of Java-specific libraries.
Installation steps
First, the JPype module has to be installed. This can be done with pip:
pip install JPype1
JPype needs the jars so that it can create a JVM and include them on the classpath. The imported libraries are then exposed to the Python process in a Pythonic way, and the library structure looks and feels as if it had been developed natively in Python.
Hint
One way of obtaining the NXCALS API jars and their dependencies is to install the NXCALS package. All the necessary jars can be found in the newly created venv directories:
venv/nxcals-bundle/jars
venv/nxcals-bundle/nxcals_jars
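A missing or empty jar directory typically shows up only later as a failed Java import, so it may be worth validating the two directories before handing them to JPype. The sketch below assumes the directory layout from the hint; build_classpath is a hypothetical helper, not part of NXCALS or JPype.

```python
from pathlib import Path


def build_classpath(venv_root: str) -> list:
    """Return wildcard classpath entries for the two NXCALS jar directories.

    Raises FileNotFoundError if a directory is missing or holds no jars.
    """
    bundle = Path(venv_root) / "nxcals-bundle"
    entries = []
    for name in ("jars", "nxcals_jars"):
        jar_dir = bundle / name
        if not jar_dir.is_dir():
            raise FileNotFoundError(f"Missing jar directory: {jar_dir}")
        if not any(jar_dir.glob("*.jar")):
            raise FileNotFoundError(f"No .jar files found in: {jar_dir}")
        # A trailing '*' asks the JVM to pick up every jar in the directory.
        entries.append(str(jar_dir / "*"))
    return entries
```

The returned list could be passed as the classpath argument of jpype.startJVM, for example build_classpath('/your_nxcals_package_location/venv').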
Start JVM and access services
Assuming that the required jars can be accessed from the previously installed NXCALS package (see the hint above), we can execute the following code in order to access data from metadata storage:
import jpype
import jpype.imports  # enables "from java.lang import ..." style imports of Java packages
# convertStrings=True is used for the proper conversion of java.lang.String to Python string literals
jpype.startJVM(classpath=['/your_nxcals_package_location/venv/nxcals-bundle/jars/*',
                          '/your_nxcals_package_location/venv/nxcals-bundle/nxcals_jars/*'], convertStrings=True)
# point metadata client to NXCALS PRO services
from java.lang import System
System.setProperty("service.url", "https://cs-ccr-nxcals5.cern.ch:19093,https://cs-ccr-nxcals5.cern.ch:19094,https://cs-ccr-nxcals6.cern.ch:19093,https://cs-ccr-nxcals6.cern.ch:19094,https://cs-ccr-nxcals7.cern.ch:19093,https://cs-ccr-nxcals7.cern.ch:19094,https://cs-ccr-nxcals8.cern.ch:19093,https://cs-ccr-nxcals8.cern.ch:19094")
from cern.nxcals.api.extraction.metadata import ServiceClientFactory
vs = ServiceClientFactory.createVariableService()
from cern.nxcals.api.extraction.metadata.queries import Variables
var = vs.findOne(Variables.suchThat().variableName().eq("HX:BMODE")).get()
var.getVariableName()
var.getId()
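The service.url value used above follows a regular pattern (every host combined with every port), so it can be assembled rather than typed out. This is only a convenience sketch: build_service_url is a hypothetical helper, and the hosts and ports are simply those that appear in the example.

```python
def build_service_url(hosts, ports, domain="cern.ch"):
    """Join every host/port combination into one comma-separated URL list."""
    return ",".join(
        f"https://{host}.{domain}:{port}"
        for host in hosts
        for port in ports
    )


# Hosts and ports taken from the example above.
hosts = ["cs-ccr-nxcals5", "cs-ccr-nxcals6", "cs-ccr-nxcals7", "cs-ccr-nxcals8"]
ports = [19093, 19094]

service_url = build_service_url(hosts, ports)
print(service_url)
```

The result could then be passed to System.setProperty("service.url", service_url) once the JVM has been started.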
Load java packages under a package alias
Please take note
In some rare cases the cern root package name may already be in use in the Python namespace (or clash with other packages), so the packages on the classpath are not loaded directly into the JPype process. Importing a Java class (e.g. ServiceClientFactory from metadata-api) may then fail with a module-not-found error:
>>> from cern.nxcals.api.extraction.metadata import ServiceClientFactory
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cern.nxcals.api.extraction.metadata'
The solution is to use the advanced JPype import options and register the Java package(s) under an alias that is certain not to clash with modules already available on the Python path. The above factory can then be loaded and used as follows:
# actual packages: cern.nxcals
# expose as alias: nxcals_meta
jpype.imports.registerDomain('nxcals_meta', alias='cern.nxcals')
# now we can normally import and use exposed units under package alias
from nxcals_meta.api.extraction.metadata import ServiceClientFactory
vs = ServiceClientFactory.createVariableService()
from nxcals_meta.api.extraction.metadata.queries import Variables
var = vs.findOne(Variables.suchThat().variableName().eq("HX:BMODE")).get()
var.getVariableName()
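Whether the alias is needed at all can be estimated by checking if a Python module with the same root name is already importable. The sketch below uses only the standard library; python_module_exists is a hypothetical helper, and it is best run in a fresh interpreter before jpype.imports is loaded, because JPype's import hook may otherwise influence the lookup.

```python
import importlib.util


def python_module_exists(name: str) -> bool:
    """Return True if a top-level Python module called `name` can be found."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Broken or half-initialised modules are treated as not available.
        return False


if python_module_exists("cern"):
    print("A Python package named 'cern' exists, registering an alias is advisable")
else:
    print("No Python package named 'cern' found, direct Java imports should work")
```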
RSQL queries
Important
Please note that when working with certain RSQL queries, conflicts with reserved Python keywords such as and, or, in may occur. JPype and Py4J address this issue in different ways. JPype exposes the corresponding methods with an underscore suffix, i.e. and_(), or_(), in_(). Py4J instead relies on a special syntax based on the getattr() function.
Below are examples illustrating the difference between Py4J and JPype in handling RSQL queries with reserved keywords. Keep in mind that, for simplicity, JPype is used for all other code snippets provided in the documentation.
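The root of the conflict can be reproduced in plain Python, without any JVM. The sketch below shows that a reserved word is rejected as an attribute name by the parser itself, and mirrors the underscore-suffix convention described above; jpype_method_name is a hypothetical helper written for illustration and is not part of JPype.

```python
import keyword


def jpype_method_name(java_name: str) -> str:
    """Append '_' to names that are reserved Python keywords, keep others as is."""
    return java_name + "_" if keyword.iskeyword(java_name) else java_name


for java_name in ("and", "or", "in", "eq"):
    print(java_name, "->", jpype_method_name(java_name))

# A reserved word used as an attribute name fails already at compile time,
# which is why query.and() can never be written literally in Python code.
try:
    compile("query.and()", "<example>", "eval")
except SyntaxError:
    print("query.and() is a SyntaxError in Python")

# Passing the name as a string to getattr() avoids the parser restriction.
compile("getattr(query, 'and')()", "<example>", "eval")
```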
Selecting a variable in CMW system:
# Py4J variant: the reserved keyword 'and' is reached through getattr()
_metadata = spark._jvm.cern.nxcals.api.extraction.metadata
variableService = _metadata.ServiceClientFactory.createVariableService()
Variables = _metadata.queries.Variables
variable = variableService \
.findOne(getattr(Variables.suchThat().systemName().eq("CMW"), 'and')().variableName().eq("SPS:NXCALS_FUNDAMENTAL")) \
.get()
variable.getDescription()
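# JPype variant: and_() is used; variableService and Variables are assumed to come from JPype imports
variable = variableService \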
.findOne(Variables.suchThat().systemName().eq("CMW").and_().variableName().eq("SPS:NXCALS_FUNDAMENTAL")) \
.get()
variable.getDescription()
Retrieving 2 variables (please note that Py4J unfortunately requires some nesting of conditions, whereas the JPype syntax is more straightforward):
# Py4J variant: nested getattr() calls for 'or' and 'and'
_metadata = spark._jvm.cern.nxcals.api.extraction.metadata
variableService = _metadata.ServiceClientFactory.createVariableService()
Variables = _metadata.queries.Variables
variables = variableService \
.findAll(getattr( \
getattr(Variables.suchThat().variableName().eq("SPS:NXCALS_FUNDAMENTAL"), 'or')().variableName().eq("CPS:NXCALS_FUNDAMENTAL"), \
'and')().systemName().eq("CMW"))
len(variables)
# JPype variant: findAll is used because two variables are expected
variables = variableService.findAll(Variables.suchThat() \
    .systemName().eq("CMW").and_(Variables.suchThat().variableName().eq("SPS:NXCALS_FUNDAMENTAL").or_().variableName().eq("CPS:NXCALS_FUNDAMENTAL")))
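The nested getattr() calls of the Py4J variant can be flattened with a small helper that applies method names in sequence. This is a sketch under stated assumptions: chain is a hypothetical helper, and FakeCondition is a pure-Python stand-in for the Java query builder so that the example runs without a JVM. With Py4J the same call would start from the Variables class instead.

```python
from functools import reduce


def chain(start, *steps):
    """Call methods on `start` left to right.

    Each step is either a method name or a tuple (name, arg1, arg2, ...).
    Names are resolved with getattr(), so reserved keywords are allowed.
    """
    def apply(obj, step):
        name, *args = (step,) if isinstance(step, str) else step
        return getattr(obj, name)(*args)
    return reduce(apply, steps, start)


class FakeCondition:
    """Stand-in for a Java query builder: records every call it receives."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def __getattr__(self, name):
        # Any unknown attribute behaves like a method returning a new builder.
        return lambda *args: FakeCondition(self.calls + ((name, args),))


query = chain(
    FakeCondition(),
    "suchThat",
    "systemName", ("eq", "CMW"),
    "and",
    "variableName", ("eq", "SPS:NXCALS_FUNDAMENTAL"),
)
print(query.calls)
```

The steps read in the same order as the equivalent Java expression, which may be easier to follow than nested getattr() calls when several reserved keywords appear in one query.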
Conclusion
Both options, JPype and Py4J, are quite powerful and allow bridging the Java and Python worlds with minimal effort.
JPype exposes JVM objects in a more user-friendly way (less boilerplate code) and offers a straightforward way to access static properties and classes.
Py4J may become interesting when there is a need to run the Java code on a remote host and access objects from a Python client running elsewhere.
It can be run easily from SWAN notebooks.