Using SWAN

Using SWAN is the easiest and recommended way for simple scripting. Everything is configured and up to date. For more details check Using SWAN page.

Instalation from acc-py repository (Python 3.7+)

To be able to install package, you need Python3, in version at least 3.7 (currently supported version is 3.7.9). Acc-Py is available on machines from the CERN accelerator sector (ACC).

Acc-Py

source /acc/local/share/python/acc-py/base/pro/setup.sh
acc-py venv ./venv
source ./venv/bin/activate
python -m pip install nxcals

Pure Python

python3 -m venv ./venv
source ./venv/bin/activate
python -m pip install -U --index-url https://acc-py-repo.cern.ch/repository/vr-py-releases/simple --trusted-host acc-py-repo.cern.ch nxcals

Important

In case of getting the following error message during installation of NXCALS package :

zipfile.BadZipFile: File is not a zip file

please use "no cache" option as follows:

python -m pip install nxcals --no-cache

Running

You need a valid kerberos ticket. To init kerberos:

kinit

Activate virtual environment:

source ./venv/bin/activate

And later start python and create spark object:

from nxcals.spark_session_builder import get_or_create
from nxcals.api.extraction.data.builders import DataQuery
spark = get_or_create("My_APP")

df = DataQuery.builder(spark).entities().system('CMW') \
    .keyValuesEq({'device': 'LHC.LUMISERVER', 'property': 'CrossingAngleIP1'}) \
    .timeWindow('2022-04-22 00:00:00.000', '2022-04-23 00:00:00.000') \
    .build()

You can find more examples in Extraction API chapter.

Using bundle

If you want to have everything packed, and you don't want to configure venv with acc-py, you can use our bundle. It contains preconfigured spark and needed python packages - everything what you need to start using NXCALS with Scala or Python. Instalation guide can be found here.

Instalation on LXPLUS

Instalation python package on LXPLUS is covered on dedicated page Using LXPLUS.

Jupyter

First install Jupyter into the venv with installed NXCALS.

python -m pip install jupyter

Important

Make sure to have jupyter in your PATH. Verify using "which jupyter".

Once done, export the pyspark python driver to be the jupyter and run the pyspark utility from {venv}/nxcals-bundle/bin/pyspark :

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

./venv/nxcals-bundle/bin/pyspark --master yarn --num-executors 10 # please update the path to venv if needed

This will open a browser with the Jupyter notebook. Create/open a notebook file and wait for the kernel and SparkSession to start. After that, the already created SparkSession and SparkContext will be available under the spark and sc variables respectively.

Now your notebook is ready for the interaction with NXCALS API.

Known issues

Executing script in the YARN mode from a non "ACC" machine

In the case when machine does not have access to ACC-PY distribution and a script is submitted to the cluster using YARN mode, the similar error may occur:

Error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (ithdp1058.cern.ch executor 3):
java.io.IOException: Cannot run program "./environment/bin/python": error=2, No such file or directory

Unfortunately, the only possible option in such a case is to execute it in local mode.

Use NXCALS package with other BE libraries (JPype naming clash)

After successfully running the above pip install nxcals commands, the process will install the following packages on your python setup:

(venv) [user@host]$ pip list | grep nxcals
nxcals                        x.y.z
nxcals-spark-session-builder  x.y.z
nxcals-extraction-api-python3 x.y.z
nxcals-extraction-api-python3-legacy x.y.z

Note the last package, nxcals-extraction-api-python3-legacy. This package contains the obsolete DataQuery builders under cern namespace and is loaded only for compatibility reasons. The legacy package will be eventually phased-out!

Legacy package is locking cern python namespace!

Unfortunately, it's the legacy package that locks the cern namespace and needs to be removed if your intention is to use NXCALS together with other cern libraries (especially the ones that expose java classes via Jpype)

Overcome issues with NXCALS and JPype

When using the newly nxcals package with another cern library that exposes Java classes directly as python modules (ex. PyLSA), one will quickly experience exceptions similar to the following example.

Try loading Java classes exposed as python modules via JPype on python context:

from cern.lsa.domain.settings import *

Will yield the following exception:

Traceback (most recent call last):
    File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'cern.lsa'

In order to avoid this exception, we need to unlock the cern namespace in the python environment. Only then JPype would be able to expose the requested Java classes as python proxy modules.

To unlock cern namespace in the python env, the nxcals legacy extraction package needs to be removed! One can achieve that by the following steps:

1) remove the legacy package from python environemnt

pip uninstall nxcals-extraction-api-python3-legacy

2) remove the cern package from the nxcals native python DataQuery imports

Before:

from cern.nxcals.api.extraction.data.builders import *

Now:

from nxcals.api.extraction.data.builders import *

Hint

Proceed with the above actions, only if you experience issues with module loading (ModuleNotFoundError). In most use-cases this issue is not visible and can be safely ignored

Info

In the scenarios when the application is deployed, it is not easy to perform "pip uninstall" step described above directly at the deployment location.

One posibility to overcome this issue is to manually edit 'deployment/app/requirments.txt' file created in your project folder while performing:

acc-py app lock ./

and remove the nxcals-extraction-api-python3-legacy==.... line. Then the application can be deployed without that particular package removed from the requirements specification, using:

acc-py app deploy --deploy-base /tmp/your_deployment_location ./