Accessing Java APIs from Python

Introduction

To use the NXCALS public APIs from another language such as Python, one would normally need a library written in that language (the native approach).

NXCALS provides a Python package for data extraction (see the NXCALS Python package). It is a useful and straightforward solution for extracting data with Python; however, it exposes only the Extraction API.

For other APIs, such as the CERN Extraction API, Metadata API, Ingestion API and Backport API, one would have to write native API clients or port the existing Java implementation. That approach, however, could lead to code duplication and implementation discrepancies. At the same time, Python offers two packages, Py4J and JPype, which allow accessing Java objects directly from Python code.

Accessing Java APIs using Py4J

Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine. It requires a JVM that is already running before the Python code executes (Py4J does not start the JVM itself), as well as the py4j Python module. This can be achieved by setting up a Py4J gateway on the Java side, as described in the Py4J documentation, and installing the py4j module with: pip install py4j

A much more convenient way of working with Py4J is to use the NXCALS package, in which the py4j module is preinstalled and the JVM can be accessed directly from the Spark session (already available in the PySpark tool, or created with the provided session builders in a standalone Python application). In both cases, the JVM is accessed through the special variable spark._jvm.

Examples of using Py4J from PySpark

Getting a variable by name:

Python

Variables = spark._jvm.cern.nxcals.api.extraction.metadata.queries.Variables
variableService = spark._jvm.cern.nxcals.api.extraction.metadata.ServiceClientFactory.createVariableService()
var = variableService.findOne(Variables.suchThat().variableName().eq('HX:BMODE')).get()
print(var.getVariableName())

Getting information about a specific fill:

Python

# Using an existing SparkSession provided by PySpark
fillservice = spark._jvm.cern.nxcals.api.custom.service.Services.newInstance().fillService()
fill = fillservice.findFill(3000)
print(fill)

Extracting variable data within a time range using a specific lookup strategy:

Python

cern_nxcals_api = spark._jvm.cern.nxcals.api

variableService = cern_nxcals_api.extraction.metadata.ServiceClientFactory.createVariableService()
extractionService = cern_nxcals_api.custom.service.Services.newInstance().extractionService()



myVariable = variableService.findOne(cern_nxcals_api.extraction.metadata.queries.Variables.suchThat().variableName().eq("CPS.TGM:CYCLE"))

if myVariable.isEmpty():
    raise ValueError("Could not obtain variable from service")

startTime = cern_nxcals_api.utils.TimeUtils.getInstantFromString("2020-04-25 00:00:00.000000000")
endTime = cern_nxcals_api.utils.TimeUtils.getInstantFromString("2020-04-26 00:00:00.000000000")

properties = cern_nxcals_api.custom.service.extraction.ExtractionProperties.builder().timeWindow(startTime, endTime) \
    .lookupStrategy(cern_nxcals_api.custom.service.extraction.LookupStrategy.LAST_BEFORE_START_IF_EMPTY).build()

dataset = extractionService.getData(myVariable.get(), properties)

print(dataset.count())
dataset.show()
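As we read the strategy name, LAST_BEFORE_START_IF_EMPTY returns the data points inside the time window and, only when the window is empty, falls back to the last point recorded before its start. The pure-Python sketch below illustrates that reading on a sorted list of (timestamp, value) pairs. It is not the NXCALS implementation: the function name `last_before_start_if_empty` and the inclusive window bounds are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Point = Tuple[int, str]  # (timestamp, value); the list must be sorted by timestamp


def last_before_start_if_empty(points: List[Point], start: int, end: int) -> List[Point]:
    """Hypothetical illustration of the LAST_BEFORE_START_IF_EMPTY idea.

    Assumes `points` is sorted by timestamp and that the window [start, end] is inclusive.
    """
    timestamps = [t for t, _ in points]
    lo = bisect_left(timestamps, start)   # first index with timestamp >= start
    hi = bisect_right(timestamps, end)    # first index with timestamp > end
    in_window = points[lo:hi]
    if in_window:
        return in_window
    # The window is empty: fall back to the last point strictly before start, if any.
    return [points[lo - 1]] if lo > 0 else []
```

For example, with `data = [(10, "a"), (20, "b"), (40, "c")]`, the window from 25 to 35 contains no points, so the sketch returns the last earlier point, `[(20, "b")]`; a window ending before the first point returns an empty list.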

Example of using Py4J from a standalone Python application (requires creating a Spark session)

Python

from nxcals import spark_session_builder

spark = spark_session_builder.get_or_create(app_name='spark-basic')
_cern_nxcals_api = spark._jvm.cern.nxcals.api

fillService = _cern_nxcals_api.custom.service.Services.newInstance().fillService()

fills = fillService.findFills(_cern_nxcals_api.utils.TimeUtils.getInstantFromString("2018-04-25 00:00:00.000000000"),
                              _cern_nxcals_api.utils.TimeUtils.getInstantFromString("2018-04-28 00:00:00.000000000"))
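TimeUtils.getInstantFromString is called above with strings of the form `YYYY-MM-DD HH:MM:SS` followed by nine fractional digits. When the time window comes from Python datetime objects, a small helper can produce such strings. The helper name `to_nxcals_time_string` is hypothetical, and it does not handle time zones: the caller is responsible for passing the time NXCALS expects.

```python
from datetime import datetime


def to_nxcals_time_string(dt: datetime) -> str:
    """Format a datetime like the strings passed to TimeUtils.getInstantFromString above.

    Hypothetical helper. Python datetimes only have microsecond precision, so the
    last three of the nine fractional digits are always zero. No time zone conversion
    is performed.
    """
    nanos = dt.microsecond * 1000  # microseconds -> nanoseconds
    return dt.strftime("%Y-%m-%d %H:%M:%S") + ".{:09d}".format(nanos)
```

The result can then be handed to the Java side, for example `_cern_nxcals_api.utils.TimeUtils.getInstantFromString(to_nxcals_time_string(start))`.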

Accessing Java APIs using JPype

The JPype Python module interfaces with the JVM at the native level: it starts the JVM inside the Python process through JNI, which lets Python code use Java libraries directly.

Installation steps

First, the JPype module has to be installed. This can be done with pip:

pip install JPype1

JPype needs the jar files so that it can create a JVM with them on the classpath. Classes from the loaded jars are then exposed to the Python process in a pythonic way, so the library structure looks and feels as if it had been developed natively in Python.

Hint

One way of obtaining the NXCALS API jars and their dependencies is to install the NXCALS package. All the necessary jars can be found in the following directories of the newly created venv:

    venv/nxcals-bundle/jars
    venv/nxcals-bundle/nxcals_jars
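Rather than hard-coding both directories, the classpath list for JPype can be derived from the location of the NXCALS package. A minimal sketch, assuming the venv layout listed above; `nxcals_classpath` is a hypothetical helper name, and the trailing `*` entries mirror the wildcard form used in the startJVM call of the next section.

```python
import os
from typing import List


def nxcals_classpath(package_location: str) -> List[str]:
    """Build JPype classpath entries for the NXCALS bundle jar directories.

    Hypothetical helper: assumes the layout venv/nxcals-bundle/jars and
    venv/nxcals-bundle/nxcals_jars under `package_location`. Each entry ends
    with '*' so that every jar in the directory is put on the classpath.
    """
    bundle = os.path.join(package_location, "venv", "nxcals-bundle")
    return [os.path.join(bundle, sub, "*") for sub in ("jars", "nxcals_jars")]
```

For example: `jpype.startJVM(classpath=nxcals_classpath('/your_nxcals_package_location'), convertStrings=True)`.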

Start JVM and access services

Assuming that the required jars are available from the previously installed NXCALS package (see the hint above), the following code accesses data from the metadata storage:

Python

import jpype
import jpype.imports  # enables importing Java packages with regular Python import statements

# convertStrings=True is used for the proper conversion of java.lang.String to Python strings
jpype.startJVM(classpath=['/your_nxcals_package_location/venv/nxcals-bundle/jars/*',
                          '/your_nxcals_package_location/venv/nxcals-bundle/nxcals_jars/*'], convertStrings=True)

# point metadata client to NXCALS PRO services
from java.lang import System
System.setProperty("service.url", "https://cs-ccr-nxcals5.cern.ch:19093,https://cs-ccr-nxcals5.cern.ch:19094,https://cs-ccr-nxcals6.cern.ch:19093,https://cs-ccr-nxcals6.cern.ch:19094,https://cs-ccr-nxcals7.cern.ch:19093,https://cs-ccr-nxcals7.cern.ch:19094,https://cs-ccr-nxcals8.cern.ch:19093,https://cs-ccr-nxcals8.cern.ch:19094")

from cern.nxcals.api.extraction.metadata import ServiceClientFactory
vs = ServiceClientFactory.createVariableService()

from cern.nxcals.api.extraction.metadata.queries import Variables
var = vs.findOne(Variables.suchThat().variableName().eq("HX:BMODE")).get()

print(var.getVariableName())
print(var.getId())
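The service.url value above is simply every host combined with every port, joined by commas. A hypothetical helper such as `build_service_url` makes that structure explicit and easier to maintain; the host names and ports in the usage note are copied from the example above and may change over time.

```python
from typing import Iterable


def build_service_url(hosts: Iterable[str], ports: Iterable[int]) -> str:
    """Join every host/port combination into a comma-separated 'service.url' value.

    Hypothetical helper. The order matches the value used above: all ports of
    the first host, then all ports of the next host, and so on.
    """
    port_list = list(ports)  # materialize once so it can be reused for every host
    return ",".join(f"https://{host}:{port}" for host in hosts for port in port_list)
```

For example: `System.setProperty("service.url", build_service_url([f"cs-ccr-nxcals{i}.cern.ch" for i in range(5, 9)], [19093, 19094]))` reproduces the value used above.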

Loading Java packages under a package alias

Please take note

In some rare cases, the cern root package name may already be taken in the Python namespace (for example, by a clashing Python package), and the Java packages on the classpath then cannot be imported directly in the JPype process. Trying to import a Java class (e.g. ServiceClientFactory from the metadata API) fails with a "module not found" exception:

>>> from cern.nxcals.api.extraction.metadata import ServiceClientFactory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cern.nxcals.api.extraction.metadata'

The solution is to use JPype's advanced import options and register the Java package(s) under an alias that is guaranteed not to clash with modules already available on the Python path. The factory above can then be loaded and used as follows:

Python

# actual packages: cern.nxcals
# expose as alias: nxcals_meta
jpype.imports.registerDomain('nxcals_meta', alias='cern.nxcals')

# now we can normally import and use exposed units under package alias
from nxcals_meta.api.extraction.metadata import ServiceClientFactory
vs = ServiceClientFactory.createVariableService()

from nxcals_meta.api.extraction.metadata.queries import Variables
var = vs.findOne(Variables.suchThat().variableName().eq("HX:BMODE")).get()

var.getVariableName()
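One plausible cause of the clash is that a Python module or package with the same top-level name (here `cern`) is already importable, so Python resolves the import against it instead of the Java classpath. A quick check with the standard library's importlib can tell whether the alias workaround is likely needed; `python_name_is_taken` is a hypothetical helper name, not part of JPype.

```python
import importlib.util


def python_name_is_taken(top_level: str) -> bool:
    """Return True if a Python module or package called `top_level` is importable.

    Hypothetical helper. If, for example, a Python package named 'cern' is on
    sys.path, it can shadow the Java 'cern' package exposed by JPype.
    """
    return importlib.util.find_spec(top_level) is not None
```

For example, call `jpype.imports.registerDomain('nxcals_meta', alias='cern.nxcals')` and import from `nxcals_meta` when `python_name_is_taken('cern')` returns True.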

RSQL queries

Important

Please note that when working with certain RSQL queries, conflicts may occur with reserved Python keywords such as and, or and in, which cannot be written as attribute names in Python code. JPype and Py4J address this issue differently. JPype exposes the corresponding methods with an underscore suffix, i.e. and_(), or_(), in_(), whereas Py4J relies on looking the method up by name with the built-in getattr() function.
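The root of the problem is that `and`, `or` and `in` are reserved words, so `obj.and()` is a syntax error even when `obj` really has a method with that name, while `getattr(obj, 'and')` still finds it. The toy builder below is a hypothetical stand-in for the NXCALS query classes, not their actual API: it exposes each operator both under the keyword name (reachable only through getattr, as with Py4J) and with an underscore suffix (as JPype does), and renders RSQL-style strings in which `;` means AND and `,` means OR.

```python
import keyword


class QueryBuilder:
    """Toy fluent builder rendering RSQL-style strings (hypothetical, illustration only)."""

    def __init__(self, tokens=()):
        self.tokens = tuple(tokens)

    def _add(self, token):
        # Return a new builder so that calls can be chained fluently
        return QueryBuilder(self.tokens + (token,))

    def systemName(self):
        return self._add("systemName")

    def variableName(self):
        return self._add("variableName")

    def eq(self, value):
        return self._add("==" + value)

    def and_(self):
        # JPype-style name: keyword plus underscore suffix (';' is AND in RSQL)
        return self._add(";")

    def or_(self):
        # JPype-style name (',' is OR in RSQL)
        return self._add(",")

    def render(self):
        return "".join(self.tokens)


# Also expose the operators under their keyword names, like a Java class would.
# Writing `def and(self)` is a SyntaxError, so setattr is needed to create them.
setattr(QueryBuilder, "and", QueryBuilder.and_)
setattr(QueryBuilder, "or", QueryBuilder.or_)

base = QueryBuilder().systemName().eq("CMW")
# Py4J style: look the keyword-named method up by name
py4j_query = getattr(base, "and")().variableName().eq("SPS:NXCALS_FUNDAMENTAL").render()
# JPype style: call the underscore-suffixed alias directly
jpype_query = base.and_().variableName().eq("SPS:NXCALS_FUNDAMENTAL").render()
```

Both calls build the same query, `systemName==CMW;variableName==SPS:NXCALS_FUNDAMENTAL`; only the way the keyword-named method is reached differs.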

Below are examples illustrating the difference between Py4J and JPype when it comes to RSQL queries containing reserved keywords. Keep in mind that, for simplicity, JPype is used in all other code snippets of this documentation.

Selecting a variable in the CMW system:

Py4J

_metadata = spark._jvm.cern.nxcals.api.extraction.metadata
variableService = _metadata.ServiceClientFactory.createVariableService()

Variables = _metadata.queries.Variables

variable = variableService \
    .findOne(getattr(Variables.suchThat().systemName().eq("CMW"), 'and')().variableName().eq("SPS:NXCALS_FUNDAMENTAL")) \
    .get()

variable.getDescription()

JPype

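from cern.nxcals.api.extraction.metadata import ServiceClientFactory
from cern.nxcals.api.extraction.metadata.queries import Variables
variableService = ServiceClientFactory.createVariableService()

variable = variableService \
    .findOne(Variables.suchThat().systemName().eq("CMW").and_().variableName().eq("SPS:NXCALS_FUNDAMENTAL")) \
    .get()
variable.getDescription()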

Retrieving two variables (note that Py4J requires nesting the conditions, whereas the JPype syntax is more straightforward):

Py4J

_metadata = spark._jvm.cern.nxcals.api.extraction.metadata
variableService = _metadata.ServiceClientFactory.createVariableService()

Variables = _metadata.queries.Variables

variables = variableService \
    .findAll(getattr( \
        getattr(Variables.suchThat().variableName().eq("SPS:NXCALS_FUNDAMENTAL"), 'or')().variableName().eq("CPS:NXCALS_FUNDAMENTAL"), \
        'and')().systemName().eq("CMW"))

len(variables)

JPype

variables = variableService.findAll(Variables.suchThat() \
    .systemName().eq("CMW").and_(Variables.suchThat().variableName().eq("SPS:NXCALS_FUNDAMENTAL").or_().variableName().eq("CPS:NXCALS_FUNDAMENTAL")))

len(variables)

Conclusion

Both options, JPype and Py4J, are powerful and allow bridging the Java and Python worlds with minimal effort.

JPype exposes JVM objects in a more user-friendly way (less boilerplate code) and offers a straightforward way to access static properties and classes.

Py4J becomes interesting when the Java code needs to run on a remote host and its objects must be accessed from a Python client running elsewhere.

Py4J can also be used easily from SWAN notebooks.