Using SWAN
Using SWAN is the easiest and recommended approach for simple scripting. Everything is already configured and up to date. For more details, see the Using SWAN page.
Installation from acc-py repository (Python 3.7+)
To install the package you need Python 3, version 3.7 or later (the currently supported version is 3.7.9). Acc-Py is available on machines of the CERN accelerator sector (ACC).
Using Acc-Py (on ACC machines):
source /acc/local/share/python/acc-py/base/pro/setup.sh
acc-py venv ./venv
source ./venv/bin/activate
python -m pip install nxcals
Alternatively, using a plain Python venv and installing from the acc-py package repository:
python3 -m venv ./venv
source ./venv/bin/activate
python -m pip install -U --index-url https://acc-py-repo.cern.ch/repository/vr-py-releases/simple --trusted-host acc-py-repo.cern.ch nxcals
Important
If you get the following error message while installing the NXCALS package:
zipfile.BadZipFile: File is not a zip file
reinstall it without using the pip cache:
python -m pip install nxcals --no-cache-dir
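Before moving on, it can help to confirm that the interpreter meets the version requirement above and that the package can be found. The following is a minimal sketch using only the standard library; the helper name check_environment is hypothetical, and the virtual-environment test assumes standard venv behaviour, where sys.prefix differs from sys.base_prefix.

```python
import importlib.util
import sys

# Minimum Python version required by the NXCALS package (see above)
MIN_PYTHON = (3, 7)


def check_environment():
    """Hypothetical helper: report whether this interpreter looks ready for NXCALS."""
    return {
        # Compare only (major, minor) against the documented minimum
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec returns None when no top-level 'nxcals' package can be found
        "nxcals_importable": importlib.util.find_spec("nxcals") is not None,
        # Inside a venv, sys.prefix points at the venv and base_prefix at the base install
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{}: {}".format(name, ok))
```

Run it with the python of the activated venv; after a successful installation all three values are expected to be True.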
Running
You need a valid Kerberos ticket. To obtain one, run:
kinit
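In scripts it can be useful to fail early when no ticket is present. A minimal sketch, assuming an MIT or Heimdal klist on the PATH, whose -s option exits with status 0 only when a valid ticket cache exists; the helper name is hypothetical.

```python
import shutil
import subprocess
import sys


def has_valid_kerberos_ticket():
    """Hypothetical helper: True if 'klist -s' reports a valid ticket cache."""
    if shutil.which("klist") is None:
        return False  # Kerberos client tools are not installed
    # 'klist -s' prints nothing and reports the result through its exit status
    result = subprocess.run(
        ["klist", "-s"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


if __name__ == "__main__":
    if not has_valid_kerberos_ticket():
        sys.exit("No valid Kerberos ticket found, run 'kinit' first.")
    print("Kerberos ticket OK")
```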
Activate the virtual environment:
source ./venv/bin/activate
Then start Python, create a Spark session and query data:
from nxcals.spark_session_builder import get_or_create
from nxcals.api.extraction.data.builders import DataQuery

# Create (or reuse) a SparkSession with the application name "My_APP"
spark = get_or_create("My_APP")

# Query one day of CrossingAngleIP1 data of the LHC.LUMISERVER device (CMW system)
df = DataQuery.builder(spark).entities().system('CMW') \
    .keyValuesEq({'device': 'LHC.LUMISERVER', 'property': 'CrossingAngleIP1'}) \
    .timeWindow('2022-04-22 00:00:00.000', '2022-04-23 00:00:00.000') \
    .build()
You can find more examples in the Extraction API chapter.
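The timeWindow call in the example above takes start and end timestamps as strings of the form 'YYYY-MM-DD HH:MM:SS.mmm'. When querying whole days programmatically, a small helper can produce these strings. This is a sketch that only reproduces the format visible in the example; the name day_window is hypothetical.

```python
from datetime import date, datetime, timedelta


def _to_query_string(dt):
    # A space separator with millisecond precision matches the
    # '2022-04-22 00:00:00.000' style used in the example query
    return dt.isoformat(sep=" ", timespec="milliseconds")


def day_window(day, days=1):
    """Hypothetical helper: (start, end) strings covering `days` whole days from `day`."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=days)
    return _to_query_string(start), _to_query_string(end)


print(day_window(date(2022, 4, 22)))
# ('2022-04-22 00:00:00.000', '2022-04-23 00:00:00.000')
```

It could then be used as .timeWindow(*day_window(date(2022, 4, 22))) in a builder chain like the one above.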
Using the bundle
If you want to have everything packaged and do not want to configure a venv with acc-py, you can use our bundle. It contains a preconfigured Spark and the required Python packages: everything you need to start using NXCALS with Scala or Python. The installation guide can be found here.
Installation on LXPLUS
Installing the Python package on LXPLUS is covered on the dedicated Using LXPLUS page.
Jupyter
First, install Jupyter into the venv in which NXCALS is installed:
python -m pip install jupyter
Important
Make sure jupyter is on your PATH. Verify this with "which jupyter".
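The same check can be done from Python with shutil.which, which performs the same PATH lookup as the shell. The sketch below also tests whether the jupyter it finds belongs to the currently active venv; the helper name is hypothetical, and it assumes a POSIX venv layout with executables in <venv>/bin.

```python
import os
import shutil
import sys


def jupyter_location():
    """Hypothetical helper: (path to jupyter or None, whether it lives in the active venv)."""
    path = shutil.which("jupyter")  # same lookup as 'which jupyter'
    if path is None:
        return None, False
    venv_bin = os.path.join(sys.prefix, "bin")  # POSIX venv layout assumed
    in_venv = os.path.realpath(os.path.dirname(path)) == os.path.realpath(venv_bin)
    return path, in_venv


if __name__ == "__main__":
    print(jupyter_location())
```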
Once done, set jupyter as the PySpark driver Python and run the pyspark utility from {venv}/nxcals-bundle/bin/pyspark:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
./venv/nxcals-bundle/bin/pyspark --master yarn --num-executors 10 # please update the path to venv if needed
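For reference, the two export lines and the pyspark invocation can also be assembled from Python, for instance in a launcher script. This sketch only builds the command and environment shown above; the helper name, its defaults and the ./venv path are assumptions taken from the example and should be adapted to your setup.

```python
import os
import subprocess


def pyspark_jupyter_launch(venv="./venv", num_executors=10):
    """Hypothetical helper: (command, environment) for PySpark with a Jupyter driver."""
    env = dict(os.environ)
    # Equivalent of the two 'export' lines above
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    pyspark = os.path.join(venv, "nxcals-bundle", "bin", "pyspark")
    cmd = [pyspark, "--master", "yarn", "--num-executors", str(num_executors)]
    return cmd, env


if __name__ == "__main__":
    command, environment = pyspark_jupyter_launch()
    subprocess.run(command, env=environment, check=True)
```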
This opens a browser with the Jupyter notebook. Create or open a notebook file and wait for the kernel and the SparkSession to start. After that, the already created SparkSession and SparkContext are available as the spark and sc variables, respectively.
Your notebook is now ready to interact with the NXCALS API.
Known issues
Executing a script in YARN mode from a non-ACC machine
When a machine does not have access to the Acc-Py distribution and a script is submitted to the cluster in YARN mode, an error similar to the following may occur:
Error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (ithdp1058.cern.ch executor 3):
java.io.IOException: Cannot run program "./environment/bin/python": error=2, No such file or directory
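The error shows that the executor could not find ./environment/bin/python. Before submitting in YARN mode, a script could check whether the Acc-Py distribution is present on the current machine. The sketch below uses the existence of the Acc-Py setup script from the installation section as a proxy; that heuristic and the helper name are assumptions, not an official check.

```python
import os

# Location of the Acc-Py setup script used in the installation instructions
ACC_PY_SETUP = "/acc/local/share/python/acc-py/base/pro/setup.sh"


def acc_py_available(setup_script=ACC_PY_SETUP):
    """Hypothetical heuristic: True if the Acc-Py setup script exists on this machine."""
    return os.path.isfile(setup_script)


if __name__ == "__main__":
    if not acc_py_available():
        print("Acc-Py not found: YARN mode may fail with "
              "'Cannot run program \"./environment/bin/python\"'.")
```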
Using the NXCALS package with other BE libraries (JPype naming clash)
After successfully running the pip install nxcals commands above, the following packages are installed in your Python environment:
(venv) [user@host]$ pip list | grep nxcals
nxcals x.y.z
nxcals-spark-session-builder x.y.z
nxcals-extraction-api-python3 x.y.z
nxcals-extraction-api-python3-legacy x.y.z
Note the nxcals-extraction-api-python3-legacy package. It contains the obsolete DataQuery builders under the cern namespace and is loaded only for compatibility reasons. The legacy package will eventually be phased out!
The legacy package locks the cern Python namespace!
Unfortunately, the legacy package locks the cern namespace, so it needs to be removed if you intend to use NXCALS together with other CERN libraries (especially those that expose Java classes via JPype).
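To check from Python whether the legacy package is present (the equivalent of the pip list | grep nxcals call above), the installed distributions can be inspected with importlib.metadata. This is a sketch: importlib.metadata is part of the standard library from Python 3.8, so on Python 3.7 it assumes the importlib-metadata backport is installed, and the helper names are hypothetical.

```python
import re

try:
    from importlib.metadata import distributions  # Python 3.8+
except ImportError:  # Python 3.7: requires the 'importlib-metadata' backport
    from importlib_metadata import distributions

LEGACY_PACKAGE = "nxcals-extraction-api-python3-legacy"


def normalize(name):
    """PEP 503 name normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_nxcals_packages():
    """Return {distribution name: version} for every installed 'nxcals*' distribution."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and normalize(name).startswith("nxcals"):
            found[name] = dist.version
    return dict(sorted(found.items()))


def legacy_installed(packages):
    """True if the legacy package that locks the 'cern' namespace is among `packages`."""
    return any(normalize(name) == LEGACY_PACKAGE for name in packages)


if __name__ == "__main__":
    packages = installed_nxcals_packages()
    print(packages)
    print("legacy package installed:", legacy_installed(packages))
```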
Overcoming issues with NXCALS and JPype
When using the new nxcals package together with another CERN library that exposes Java classes directly as Python modules (e.g. PyLSA), you will quickly run into exceptions similar to the following.
For example, try loading Java classes exposed as Python modules via JPype:
from cern.lsa.domain.settings import *
This yields the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'cern.lsa'
To avoid this exception, the cern namespace must be unlocked in the Python environment. Only then can JPype expose the requested Java classes as Python proxy modules.
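To see what currently provides the cern name, one can ask the import system where cern comes from. A minimal sketch, assuming that the lock takes the form of a regular cern package (one with an __init__.py file) on sys.path; for an implicit namespace package, or when no cern package exists, Python 3.7+ reports no origin file. Run it in a fresh interpreter before starting the JVM; the helper name is hypothetical.

```python
import importlib.util


def cern_namespace_lock():
    """Hypothetical helper: path of a regular 'cern' package, or None.

    Assumption: the lock is a regular package (with __init__.py) named 'cern'.
    """
    spec = importlib.util.find_spec("cern")
    if spec is None:
        return None  # no 'cern' package is importable at all
    # Since Python 3.7, spec.origin is None for implicit namespace packages
    # and points at cern/__init__.py for a regular package
    return spec.origin


if __name__ == "__main__":
    owner = cern_namespace_lock()
    print("cern namespace locked by:", owner if owner else "nothing")
```

A result pointing into the installation directory of the legacy package suggests that the namespace is locked.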
To unlock the cern namespace in the Python environment, the NXCALS legacy extraction package must be removed. This can be done with the following steps:
1) remove the legacy package from the Python environment
pip uninstall nxcals-extraction-api-python3-legacy
2) remove the cern package prefix from the NXCALS native Python DataQuery imports
Before:
from cern.nxcals.api.extraction.data.builders import *
After:
from nxcals.api.extraction.data.builders import *
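When many scripts still use the legacy imports, step 2 can be applied mechanically. The sketch below rewrites only "from cern.nxcals. ..." import statements at the start of a line; plain "import cern.nxcals..." statements and other references are left untouched for manual review. The helper name is hypothetical.

```python
import re
import sys
from pathlib import Path

# Matches "from cern.nxcals." at the start of a line, keeping its indentation
LEGACY_FROM_IMPORT = re.compile(r"^([ \t]*)from[ \t]+cern\.nxcals\.", re.MULTILINE)


def migrate_imports(source):
    """Rewrite 'from cern.nxcals.X import Y' into 'from nxcals.X import Y'."""
    return LEGACY_FROM_IMPORT.sub(r"\1from nxcals.", source)


if __name__ == "__main__":
    # Usage: python migrate_imports.py script1.py script2.py ...
    for filename in sys.argv[1:]:
        path = Path(filename)
        original = path.read_text()
        updated = migrate_imports(original)
        if updated != original:
            path.write_text(updated)
            print("updated", path)
```

Other cern imports, such as those of PyLSA, are not affected by the pattern.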
Hint
Proceed with the above actions only if you experience module-loading issues (ModuleNotFoundError). In most use cases the issue does not appear and can be safely ignored.
Info
When the application is deployed, it is not easy to perform the "pip uninstall" step described above directly at the deployment location.
One way to overcome this is to manually edit the 'deployment/app/requirements.txt' file, which is created in your project folder when running:
acc-py app lock ./
and remove the following line from it:
nxcals-extraction-api-python3-legacy==....
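This manual edit can also be scripted, for example in a CI job between the lock and deploy steps. A minimal sketch, assuming the locked requirements file has one requirement per line (no backslash continuations such as --hash entries); the helper name is hypothetical and the file path is passed on the command line.

```python
import re
import sys
from pathlib import Path

LEGACY_PACKAGE = "nxcals-extraction-api-python3-legacy"


def _normalize(name):
    """PEP 503 name normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def drop_requirement(text, package=LEGACY_PACKAGE):
    """Return `text` without the lines that pin `package` (one requirement per line)."""
    kept = []
    for line in text.splitlines(keepends=True):
        # The requirement name is the leading run of name characters
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
        if match and _normalize(match.group(1)) == _normalize(package):
            continue  # skip the line for `package`
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    # Usage: python drop_legacy.py <path to the locked requirements file>
    req_file = Path(sys.argv[1])
    req_file.write_text(drop_requirement(req_file.read_text()))
```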
The application can then be deployed, with that package removed from the requirements specification, using:
acc-py app deploy --deploy-base /tmp/your_deployment_location ./