Python for Java APIs

Introduction

To be able to use NXCALS public APIs with another language such as Python, one approach (the so-called "native" approach) is to write a wrapper around the NXCALS Public API in the given language.

For Python, NXCALS provides such a package for data extraction (see the NXCALS Python package). It is a useful and straightforward solution for extracting data with Python, but it exposes only the Extraction API.

For the other APIs, such as the CERN Extraction API, Extraction API, Metadata API, Ingestion API and Backport API, it would be possible to write native API clients or to port the existing Java implementation to Python. With that approach, however, Java code would be replicated in Python, leaving more code to maintain and the risk that the two implementations diverge.

Luckily, there are two open-source packages, Py4J and JPype, that make a Java API accessible from Python.

Most existing controls projects, such as pyJAPC, use JPype. In NXCALS, the preference was given to Py4J, because PySpark, the official package for using Spark from Python, is based on Py4J. Only with Py4J is it possible to integrate seamlessly with the PySpark functionality, e.g. to extract a Spark DataFrame easily and efficiently and convert it to a Pandas DataFrame.

Below are some examples of using NXCALS through Py4J and through JPype.

Accessing Java APIs using Py4J

Py4J enables Python programs running in a Python interpreter to dynamically access Java objects running in a Java Virtual Machine (JVM). For instance, when a developer runs a PySpark program, the PySpark library automatically starts a JVM that runs the "real" Java Spark code, which does most of the work, such as interacting with a Spark cluster.

This is also the case when using the NXCALS package, in which the py4j module is preinstalled. The JVM can be accessed directly from the Spark session, which is already available in the PySpark tool or can be created with the provided session builders in a standalone Python application.

There are two approaches to using Py4J:

The first approach is low-level and exposes some internals of Py4J. It uses the so-called 'jvm' property to access Java classes from Python.
The second approach is higher-level. The resulting code looks more natural, because classes are imported and then used as in normal Python. This approach also makes it possible to invoke Java methods such as and(), or() and in(), whose names are valid in Java but are reserved keywords in Python (they are exposed with a trailing underscore, e.g. and_()). A third advantage is that this approach provides code completion in IDEs such as PyCharm.

Code examples for both approaches are provided in the following sections.

Examples of using the low-level approach of Py4J with an existing PySpark session

Getting variable name:

Python

# Create the Spark session with the session builder. On SWAN, click the star icon instead.
from nxcals.spark_session_builder import get_or_create
spark = get_or_create()
# Use Py4J created in Spark instance
Variables = spark._jvm.cern.nxcals.api.extraction.metadata.queries.Variables

variableService = spark._jvm.cern.nxcals.api.extraction.metadata.ServiceClientFactory.createVariableService()
var = variableService.findOne(Variables.suchThat().variableName().eq('HX:BMODE')).get()
print(var.getVariableName())

Getting information about specific fill:

Python

# Using an existing SparkSession provided by PySpark
fillservice = spark._jvm.cern.nxcals.api.custom.service.Services.newInstance().fillService()
fill = fillservice.findFill(3000)
print(fill)

Extracting variable data within time range and using a specific lookup strategy:

Python

cern_nxcals_api = spark._jvm.cern.nxcals.api

variableService = cern_nxcals_api.extraction.metadata.ServiceClientFactory.createVariableService()
extractionService = cern_nxcals_api.custom.service.Services.newInstance().extractionService()



myVariable = variableService.findOne(cern_nxcals_api.extraction.metadata.queries.Variables.suchThat().variableName().eq("CPS.TGM:CYCLE"))

if myVariable.isEmpty():
    raise ValueError("Could not obtain variable from service")

startTime = cern_nxcals_api.utils.TimeUtils.getInstantFromString("2020-04-25 00:00:00.000000000")
endTime = cern_nxcals_api.utils.TimeUtils.getInstantFromString("2020-04-26 00:00:00.000000000")

properties = cern_nxcals_api.custom.service.extraction.ExtractionProperties.builder().timeWindow(startTime, endTime) \
    .lookupStrategy(cern_nxcals_api.custom.service.extraction.LookupStrategy.LAST_BEFORE_START_IF_EMPTY).build()

dataset = extractionService.getData(myVariable.get(), properties)

print(dataset.count())
dataset.show()
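The time strings passed to getInstantFromString in the example above carry nine fractional digits. The following is a minimal sketch of producing such strings from Python datetime objects. The helper name to_nxcals_time_string is hypothetical and not part of NXCALS, and the sketch assumes that the format shown in the example is the one expected.

```python
from datetime import datetime


def to_nxcals_time_string(moment: datetime) -> str:
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn' (hypothetical helper)."""
    # %f yields six digits (microseconds); pad with three zeros to reach nine digits
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f") + "000"


start_string = to_nxcals_time_string(datetime(2020, 4, 25))
end_string = to_nxcals_time_string(datetime(2020, 4, 26))
print(start_string)
print(end_string)
```

The resulting strings could then be passed to TimeUtils.getInstantFromString in place of the literals used above.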

Examples of using the high-level approach of Py4J with the NXCALS session builder

The nxcals-type-stubs package provides type stubs that simplify using NXCALS Java classes with Py4J. It is installed as a dependency of the nxcals package. The following general principles apply (see also the first code example):

Programs use ordinary Python imports, such as from module import Class.
Imports of the Py4J classes start with py4jgw (which stands for Py4J Gateway), followed by the Java API packages, e.g. from py4jgw.cern.nxcals.api.extraction.metadata import ServiceClientFactory.
The NXCALS classes can then be used with normal syntax, e.g. ServiceClientFactory.createVariableService().

Configuration

Normally, when the SparkSession is created with:

from nxcals.spark_session_builder import get_or_create
spark = get_or_create()

everything will be configured. If this builder is not available (e.g. on SWAN), py4jgw can be initialized directly:

from py4jgw import initialize_py4jgw
initialize_py4jgw(jvm) # On SWAN: spark._jvm

Initialization must be done before calling methods or creating objects of classes imported from py4jgw. Otherwise, the following RuntimeError occurs:

Traceback (most recent call last):
    File "py4j_stubs_demo/demo/throw_module_not_found_exception.py", line 3, in <module>
        varCond = Variables.suchThat().variableName().like("%TGM%").and_().description().exists()
    File "py4j_stubs_demo/venv/lib/python3.9/site-packages/py4j_utils/py4j_importer.py", line 237, in __getattr__
        self._lazy_init()
    File "py4j_stubs_demo/venv/lib/python3.9/site-packages/py4j_utils/py4j_importer.py", line 205, in _lazy_init
        raise RuntimeError(
    RuntimeError: PySpark and/or py4j gateway is not yet initialized, please create an NXCALS spark session 
    with `spark_session_builder.get_or_create(...)` before using classes imported from p4jgw.*
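The traceback above goes through `__getattr__` and `_lazy_init`, which suggests that classes imported from py4jgw are resolved lazily: the import itself succeeds, and the gateway is only needed on first use. The following is a minimal sketch of that pattern in plain Python. It is an illustration only, not the actual py4j_utils code; LazyJavaClass, initialize and the string "fake-jvm" are hypothetical stand-ins.

```python
class LazyJavaClass:
    """Hypothetical stand-in for a class imported from py4jgw (not the real implementation)."""

    _gateway = None  # shared by all proxies, set once by initialize()

    def __init__(self, java_name):
        self._java_name = java_name

    @classmethod
    def initialize(cls, gateway):
        cls._gateway = gateway

    def __getattr__(self, attribute):
        # Only reached for attributes that are not found normally, i.e. on first real use
        if LazyJavaClass._gateway is None:
            raise RuntimeError("gateway is not yet initialized")
        return f"{self._java_name}.{attribute} via {LazyJavaClass._gateway}"


# Creating the proxy (the "import") works before initialization
Variables = LazyJavaClass("cern.nxcals.api.extraction.metadata.queries.Variables")

try:
    Variables.suchThat
    used_too_early = False
except RuntimeError:
    used_too_early = True

# After initialization the same attribute access succeeds
LazyJavaClass.initialize("fake-jvm")
resolved = Variables.suchThat
print(used_too_early, resolved)
```

Under this reading, the order of the import statements does not matter; only the first use of an imported class has to come after the session has been created or initialize_py4jgw has been called.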

Usage

Code illustrating the correct structure:

Python

# spark_session_builder is available after doing `pip install nxcals`
from nxcals import spark_session_builder
# Import the Python counterparts of the NXCALS APIs (automatically generated with code completion):
from py4jgw.cern.nxcals.api.extraction.metadata import ServiceClientFactory
from py4jgw.cern.nxcals.api.extraction.metadata.queries import Variables

# initialize a session with NXCALS
spark = spark_session_builder.get_or_create(app_name='nxcals-py4j-demo')

# now you can use the NXCALS APIs imported above
variableService = ServiceClientFactory.createVariableService()
var = variableService.findOne(Variables.suchThat().variableName().eq('HX:BMODE')).get()
print(var.getVariableName())

An example of using the VariableService:

Python

from nxcals import spark_session_builder
from py4jgw.cern.nxcals.api.extraction.metadata import ServiceClientFactory
from py4jgw.cern.nxcals.api.extraction.metadata.queries import Variables

spark = spark_session_builder.get_or_create(app_name='nxcals-stubs-demo')

variableService = ServiceClientFactory.createVariableService()
var = variableService.findOne(Variables.suchThat().variableName().eq('HX:BMODE')).get()
print(var.getVariableName())

An example of using the FillService:

Python

from nxcals import spark_session_builder
from py4jgw.cern.nxcals.api.custom.service import Services

spark = spark_session_builder.get_or_create(app_name='nxcals-stubs-demo')

fillservice = Services.newInstance().fillService()
fill = fillservice.findFill(3000)
print(fill)

An example of using the method and_() with a trailing underscore:

Python

from nxcals import spark_session_builder
from py4jgw.cern.nxcals.api.extraction.metadata.queries import Variables

spark = spark_session_builder.get_or_create(app_name='nxcals-stubs-demo')

# Note the use of and_() with a trailing underscore, as in JPype
var_cond = Variables.suchThat().variableName().like("%MTG%").and_().description().exists()

A more complete example:

Python

from nxcals import spark_session_builder
from py4jgw.cern.nxcals.api.custom.service import Services
from py4jgw.cern.nxcals.api.custom.service.extraction import ExtractionProperties, LookupStrategy
from py4jgw.cern.nxcals.api.extraction.metadata import ServiceClientFactory
from py4jgw.cern.nxcals.api.extraction.metadata.queries import Variables
from py4jgw.cern.nxcals.api.utils import TimeUtils

spark = spark_session_builder.get_or_create(app_name='nxcals-stubs-demo')

variableService = ServiceClientFactory.createVariableService()
extractionService = Services.newInstance().extractionService()

myVariable = variableService.findOne(Variables.suchThat().variableName().eq("CPS.TGM:CYCLE"))

if myVariable.isEmpty():
    raise ValueError("Could not obtain variable from service")

startTime = TimeUtils.getInstantFromString("2020-04-25 00:00:00.000000000")
endTime = TimeUtils.getInstantFromString("2020-04-26 00:00:00.000000000")

properties = ExtractionProperties.builder().timeWindow(startTime, endTime) \
    .lookupStrategy(LookupStrategy.LAST_BEFORE_START_IF_EMPTY).build()

dataset = extractionService.getData(myVariable.get(), properties)

print(dataset.count())
dataset.show()
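Several examples call findOne(...).get() directly, while the complete example first checks whether the returned Optional is empty. A small helper could make that check reusable. The sketch below assumes only that the returned object offers the isPresent() and get() methods of java.util.Optional; unwrap_optional and FakeOptional are hypothetical names, and FakeOptional merely stands in for the Java object so that the snippet runs without a JVM.

```python
def unwrap_optional(optional, description):
    """Return the value held by a java.util.Optional-like object, or raise ValueError."""
    if optional.isPresent():
        return optional.get()
    raise ValueError(f"Could not obtain {description} from service")


class FakeOptional:
    """Stand-in for java.util.Optional, used only to run this sketch without a JVM."""

    def __init__(self, value=None):
        self._value = value

    def isPresent(self):
        return self._value is not None

    def get(self):
        return self._value


value = unwrap_optional(FakeOptional("CPS.TGM:CYCLE"), "variable")

try:
    unwrap_optional(FakeOptional(), "variable")
    error_text = None
except ValueError as error:
    error_text = str(error)

print(value, error_text)
```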

Accessing Java APIs using JPype

The JPype Python module interfaces with the JVM at the native level, allowing Python to make use of Java-specific libraries.

Installation steps

First, the JPype module has to be installed, for example with pip:

pip install JPype1

JPype needs the jars so that it can create a JVM and include them in the classpath. The imported libraries are then automatically exposed to the Python process in a Pythonic way, and the library structure looks and feels as if it had been developed natively in Python.

Hint

One way of obtaining the NXCALS API jars and their dependencies is to install the NXCALS package. All the necessary jars can be found in the newly created venv directories:

    venv/nxcals-bundle/jars
    venv/nxcals-bundle/nxcals_jars
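As a minimal sketch, the two directories from the hint can be turned into classpath entries and checked for existence before the JVM is started. The function name nxcals_classpath is hypothetical, and the usage lines build a throwaway directory tree only so that the snippet runs anywhere; with a real installation the venv location would be passed instead.

```python
import tempfile
from pathlib import Path


def nxcals_classpath(venv_dir):
    """Return wildcard classpath entries for the two jar directories named in the hint."""
    bundle = Path(venv_dir) / "nxcals-bundle"
    entries = []
    for name in ("jars", "nxcals_jars"):
        jar_dir = bundle / name
        if not jar_dir.is_dir():
            raise FileNotFoundError(f"Expected jar directory not found: {jar_dir}")
        entries.append(str(jar_dir / "*"))
    return entries


# Demonstration with a temporary directory that mimics the venv layout
with tempfile.TemporaryDirectory() as demo_venv:
    for name in ("jars", "nxcals_jars"):
        (Path(demo_venv) / "nxcals-bundle" / name).mkdir(parents=True)
    demo_classpath = nxcals_classpath(demo_venv)

print(demo_classpath)
```

The returned list has the same shape as the classpath argument passed to jpype.startJVM in the next section.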

Start JVM and access services

Assuming that the required jars can be accessed from the previously installed NXCALS package (see the hint above), we can execute the following code in order to access data from metadata storage:

Python

import jpype

# Enable Java imports
import jpype.imports

# Import all standard Java types into the global scope
from jpype.types import *

# Launch the JVM; convertStrings=True converts java.lang.String results to Python strings
jpype.startJVM(classpath=['/your_nxcals_package_location/venv/nxcals-bundle/jars/*',
                          '/your_nxcals_package_location/venv/nxcals-bundle/nxcals_jars/*'], convertStrings=True)

# point metadata client to NXCALS PRO services
from java.lang import System
System.setProperty("service.url", "https://cs-ccr-nxcals5.cern.ch:19093,https://cs-ccr-nxcals5.cern.ch:19094,https://cs-ccr-nxcals6.cern.ch:19093,https://cs-ccr-nxcals6.cern.ch:19094,https://cs-ccr-nxcals7.cern.ch:19093,https://cs-ccr-nxcals7.cern.ch:19094,https://cs-ccr-nxcals8.cern.ch:19093,https://cs-ccr-nxcals8.cern.ch:19094")

from cern.nxcals.api.extraction.metadata import ServiceClientFactory
vs = ServiceClientFactory.createVariableService()

from cern.nxcals.api.extraction.metadata.queries import Variables
var = vs.findOne(Variables.suchThat().variableName().eq("HX:BMODE")).get()

var.getVariableName()
var.getId()
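The service.url value above is a comma-separated list of every host and port combination, which is easy to mistype. As a sketch, the same string can be assembled from the host numbers and ports that appear in the example. The variable names are illustrative, and the host and port values are simply those shown above.

```python
# Hosts cs-ccr-nxcals5 to cs-ccr-nxcals8 and the two ports used in the example above
hosts = [f"cs-ccr-nxcals{number}.cern.ch" for number in range(5, 9)]
ports = (19093, 19094)

# Host-major order, matching the order of the literal string in the example
service_url = ",".join(f"https://{host}:{port}" for host in hosts for port in ports)
print(service_url)

# With a running JVM, the value would then be set as in the example:
# System.setProperty("service.url", service_url)
```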

Load java packages under a package alias

Please take note

In rare cases, the root package name cern may already be in use in the Python namespace (or other packages may clash), so the packages on the classpath cannot be loaded directly by JPype. Importing a Java class (e.g. ServiceClientFactory from metadata-api) may then fail with a ModuleNotFoundError:

>>> from cern.nxcals.api.extraction.metadata import ServiceClientFactory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'cern.nxcals.api.extraction.metadata'

The solution is to use JPype's advanced import options and register the Java package(s) under an alias that is certain not to clash with modules already available on the Python path. The factory above can then be loaded and used as follows:

Python

# actual packages: cern.nxcals
# expose as alias: nxcals_meta
jpype.imports.registerDomain('nxcals_meta', alias='cern.nxcals')

# now we can normally import and use exposed units under package alias
from nxcals_meta.api.extraction.metadata import ServiceClientFactory
vs = ServiceClientFactory.createVariableService()

from nxcals_meta.api.extraction.metadata.queries import Variables
var = vs.findOne(Variables.suchThat().variableName().eq("HX:BMODE")).get()

var.getVariableName()
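To judge whether an alias is needed, it can help to check whether a Python module with the same top-level name as the Java package is already importable. The following sketch uses only the standard library; top_level_module_in_use is a hypothetical helper name, and the check covers only the kind of clash described above, where a Python package named cern shadows the Java one.

```python
import importlib.util


def top_level_module_in_use(name):
    """Return True if a Python module or package called `name` can be found on the path."""
    return importlib.util.find_spec(name) is not None


# 'json' ships with Python, so it is always found
json_found = top_level_module_in_use("json")

# True here would indicate that a Python package named 'cern' may shadow the Java package
cern_in_use = top_level_module_in_use("cern")
print(json_found, cern_in_use)
```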

RSQL queries

Important

Please note that some RSQL queries conflict with reserved Python keywords such as and, or and in. JPype and Py4J address this differently. JPype exposes corresponding methods with an underscore suffix, i.e. and_(), or_(), in_(). The low-level Py4J approach instead relies on the built-in getattr() function to look the method up by name.
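The conflict can be reproduced without a JVM. In the sketch below, Condition is a plain-Python stand-in for a Java query builder (it is not an NXCALS class) that carries a method literally named and, as well as an alias with a trailing underscore. The direct call fails to compile, while both workarounds give the same result.

```python
class Condition:
    """Stand-in for a Java query builder; not an NXCALS class."""

    def __init__(self, parts=()):
        self.parts = list(parts)

    def eq(self, value):
        return Condition(self.parts + [f"== {value!r}"])


# A method literally named 'and', as a Java object reached through Py4J would expose it
setattr(Condition, "and", lambda self: Condition(self.parts + ["and"]))
# The same method under an alias with a trailing underscore, in the style of JPype
Condition.and_ = lambda self: Condition(self.parts + ["and"])

# Writing condition.and() directly is rejected by the Python parser
try:
    compile("condition.and()", "<demo>", "eval")
    direct_call_is_valid = True
except SyntaxError:
    direct_call_is_valid = False

condition = Condition().eq("CMW")
via_getattr = getattr(condition, "and")().eq("SPS:NXCALS_FUNDAMENTAL")
via_suffix = condition.and_().eq("SPS:NXCALS_FUNDAMENTAL")
print(direct_call_is_valid, via_getattr.parts == via_suffix.parts)
```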

The examples below illustrate the difference between Py4J and JPype in handling RSQL queries with reserved keywords. Keep in mind that, for simplicity, JPype is used for all other code snippets in the documentation.

Selecting a variable in CMW system:

Py4J

_metadata = spark._jvm.cern.nxcals.api.extraction.metadata
variableService = _metadata.ServiceClientFactory.createVariableService()

Variables = _metadata.queries.Variables

variable = variableService \
    .findOne(getattr(Variables.suchThat().systemName().eq("CMW"), 'and')().variableName().eq("SPS:NXCALS_FUNDAMENTAL")) \
    .get()

variable.getDescription()

JPype

variable = variableService \
    .findOne(Variables.suchThat().systemName().eq("CMW").and_().variableName().eq("SPS:NXCALS_FUNDAMENTAL")) \
    .get()

variable.getDescription()

Retrieving two variables (note that Py4J requires nested conditions, whereas the JPype syntax is more straightforward):

Py4J

_metadata = spark._jvm.cern.nxcals.api.extraction.metadata
variableService = _metadata.ServiceClientFactory.createVariableService()

Variables = _metadata.queries.Variables

variables = variableService \
    .findAll(getattr( \
        getattr(Variables.suchThat().variableName().eq("SPS:NXCALS_FUNDAMENTAL"), 'or')().variableName().eq("CPS:NXCALS_FUNDAMENTAL"), \
        'and')().systemName().eq("CMW"))

len(variables)

JPype

# findAll (not findOne) is used, since two variables are expected
variables = variableService.findAll(Variables.suchThat() \
    .systemName().eq("CMW").and_(Variables.suchThat().variableName().eq("SPS:NXCALS_FUNDAMENTAL").or_().variableName().eq("CPS:NXCALS_FUNDAMENTAL")))

len(variables)

Conclusion

Both options, JPype and Py4J, are quite powerful and allow bridging the Java and Python worlds with minimal effort.

JPype exposes JVM objects in a more user-friendly way (less boilerplate code) and has a straightforward way to access static properties and classes.

Py4J may become interesting when there is a need to run the Java code on a remote host and access objects from a Python client running elsewhere.

Py4J can also be used easily from SWAN notebooks.