Authentication methods
Access to the NXCALS system requires either Kerberos or RBAC authentication.
Please note the following rules:
- For the Extraction API, in order to use RBAC, please set the environment variable NXCALS_RBAC_AUTH=true. This enables the Spark service that uses RBAC to authenticate against our services in order to generate the Hadoop delegation tokens that allow access.
- For the Meta-data API, an existing RBAC token will take precedence over Kerberos.
- If no RBAC token is present, Kerberos login will be attempted.
- To disable any of those mechanisms, you can use one of the following system properties: rbac.auth.disable=true or kerberos.auth.disable=true (see the section Disable RBAC or Kerberos authentication mechanisms below for more details).
- Please note that Spark currently does not support multi-user authentication per call from the same session. In other words, setting a different user per Spark call using an existing session has no effect.
- For the Extraction API with RBAC authentication and local mode (spark.masterType: local), event logging should be disabled* (for example, by adding spark.eventLog.enabled: false to the Spark properties).
* This restriction is a result of Spark's limited support for the delegation tokens on which RBAC authentication depends. In local mode, enabled event logging requires access to HDFS before those delegation tokens can be obtained, and consequently causes a failure during SparkSession creation.
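Putting the rules above together for the Extraction API in local mode, the required settings can be sketched as plain Java. This is a simplified illustration only: the map stands in for whatever Spark configuration mechanism you actually use, and `localModeRbacSettings` is a hypothetical helper name, not part of the NXCALS API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalModeRbacConfig {

    // Collects the settings discussed above for RBAC extraction in local mode:
    // NXCALS_RBAC_AUTH enables RBAC-based delegation token generation, and
    // event logging must be off because it would need HDFS access before the
    // delegation tokens are available.
    static Map<String, String> localModeRbacSettings() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("NXCALS_RBAC_AUTH", "true");        // set before SparkSession creation
        props.put("spark.master", "local[*]");        // local masterType
        props.put("spark.eventLog.enabled", "false"); // required in local mode with RBAC
        return props;
    }

    public static void main(String[] args) {
        localModeRbacSettings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```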
RBAC Authentication
Important
RBAC currently works for Java APIs only. Support for Python is work in progress.
In order to use RBAC, one has to obtain a valid token. This can be done, for instance, using the login-by-location feature or an explicit login. For all RBAC login methods, please refer to the RBAC documentation on the wikis. The snippet below shows how to use explicit RBAC login:
String user = System.getProperty("user.name");
String password = System.getProperty("user.password");

// Enable NXCALS RBAC authentication in Hadoop delegation token generation, in Spark (for extraction)
System.setProperty("NXCALS_RBAC_AUTH", "true");

try {
    AuthenticationClient authenticationClient = AuthenticationClient.create();
    RbaToken token = authenticationClient.loginExplicit(user, password);
    ClientTierTokenHolder.setRbaToken(token);
} catch (AuthenticationException e) {
    throw new IllegalArgumentException("Cannot login", e);
}
The RBAC token has to be set before the Spark session is created.
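The ordering requirement can be illustrated with a toy consumer that, like the Spark session, captures its authentication state at construction time. `TokenOrdering` and `Session` here are hypothetical stand-ins for illustration only, not NXCALS or Spark classes.

```java
public class TokenOrdering {

    // Stands in for the token holder (e.g. ClientTierTokenHolder's RBAC token).
    static String token;

    // Like a SparkSession, this captures the authentication state when it is
    // created; a token set only afterwards is never seen by the session.
    static class Session {
        final String capturedToken;

        Session() {
            this.capturedToken = token;
        }
    }

    public static void main(String[] args) {
        token = "rbac-token";            // set the token first...
        Session session = new Session(); // ...then create the session
        System.out.println(session.capturedToken);
    }
}
```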
Kerberos Authentication
NXCALS provides seamless Kerberos integration. The system will either use an existing Kerberos token (from the environment) or will try to authenticate the client using the provided Kerberos principal & keytab (an encrypted file containing the user's password). Instructions on how to install Kerberos or generate a keytab can be found in the section Kerberos pointers.
Authenticating with existing Kerberos token (environment shared state)
In essence, all you need to do prior to running your code is to create your Kerberos token:
kinit
We will pick up the Kerberos token and authenticate you automatically. The advantage of this method is its simplicity. In addition, you can set up a cron job to renew the token indefinitely.
Hint
A cron job entry that will re-initialize your krb ticket:
0 8,12,19 * * * kinit -f -r 5d -kt /path/to/your.keytab [username]
Be aware that a krb ticket has a maximum validity of 24 hours, thus the provided cron schedule expression should not run less frequently than that.
Authenticating with .keytab file
If you would like to authenticate using your .keytab file, you have to include the following statement in your code:
static {
    System.setProperty("kerberos.principal", "nxcalsuser"); // replace with your username
    System.setProperty("kerberos.keytab", "/opt/nxcalsuser/.keytab"); // replace with the .keytab file location
}
Once this is done, you will have programmatically obtained a Kerberos token. The advantage of this solution is that you rely on your code to obtain the token, which is safer and more reliable. Instructions on how to obtain a keytab file can be found in the section Kerberos keytab file generation.
Troubleshooting Authentication Issues
Kerberos and Hadoop are relatively hard to debug, and the messages coming from the JVM can make diagnosis difficult. Extra debugging information can be enabled for the client.
Please add the following settings to your Spark config:
spark.driver.extraJavaOptions -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true -Djava.security.debug=gssloginconfig,configfile,configparser,logincontext
spark.yarn.appMasterEnv.HADOOP_JAAS_DEBUG true
spark.yarn.am.extraJavaOptions -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true -Djava.security.debug=gssloginconfig,configfile,configparser,logincontext
Additionally, export the following variable in your environment:
export HADOOP_JAAS_DEBUG=true
Disable RBAC or Kerberos authentication mechanisms
In case you need to control the default order of the authentication mechanisms, or to disable one of them, you can do so by setting one of the following system properties:
- rbac.auth.disable=true (to disable RBAC authentication)
- kerberos.auth.disable=true (to disable Kerberos authentication)
Please note that you cannot disable both RBAC and Kerberos at the same time. Therefore make sure that at most one of the above properties is set to true; otherwise an exception will be thrown, as there would be no authentication mechanism left enabled.
It is also possible to use Constants.RBAC_AUTH_DISABLE and Constants.KERBEROS_AUTH_DISABLE, as in the example below:
static {
    // only one of those two properties may be set to true
    System.setProperty(Constants.RBAC_AUTH_DISABLE, "true"); // disable RBAC authentication
    // System.setProperty(Constants.KERBEROS_AUTH_DISABLE, "true"); // disable Kerberos authentication
}
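The selection and "at most one disabled" rules described above can be sketched as plain Java logic. This is a simplified illustration of the documented behavior; `selectAuth` and `AuthMethod` are hypothetical names, not part of the NXCALS API.

```java
public class AuthSelection {

    enum AuthMethod { RBAC, KERBEROS }

    // Models the documented precedence: an existing RBAC token wins,
    // otherwise Kerberos login is attempted; disabling both mechanisms
    // is an error because no authentication would remain enabled.
    static AuthMethod selectAuth(boolean rbacTokenPresent,
                                 boolean rbacDisabled,
                                 boolean kerberosDisabled) {
        if (rbacDisabled && kerberosDisabled) {
            throw new IllegalStateException(
                "Cannot disable both RBAC and Kerberos authentication");
        }
        if (!rbacDisabled && rbacTokenPresent) {
            return AuthMethod.RBAC;
        }
        return AuthMethod.KERBEROS;
    }

    public static void main(String[] args) {
        System.out.println(selectAuth(true, false, false));  // RBAC token takes precedence
        System.out.println(selectAuth(false, false, false)); // falls back to Kerberos
    }
}
```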
CERN Grid CA certificate Installation (optional)
Important
This action is considered optional, since the NXCALS client tries to obtain the CA certificates from the embedded trustStore file that is available in its resources.
If you still need to use your own or an existing trustStore that contains the required CERN certificates, the following steps show how you can achieve that.
Warning
Machines with a JDK provided by BE-CSS do not require creation of the truststore; it is already preinstalled. The steps below concerning certificate installation can simply be skipped.
1. Download certificate
You can download the 'CERN grid CA certificate' directly to your system from CERN's certificates page (or from the direct link).
Hint
Prefer to save the certificate with a name like CERN_Grid_Certification_Authority.crt, as this name will be used as a reference throughout the rest of this guide.
2. Generate SSL trustStore with JDK's keytool
For this step you need to have Java present on your system. In order to import the CERN grid CA certificate, run the following command via the Java keytool:
/path/to/jdk/bin/keytool -import -alias cerngridcertificationauthority -file /path/to/CERN_Grid_Certification_Authority.crt -keystore nxcals_cacerts -storepass nxcals -noprompt
3. Reference certificate on NXCALS client applications
Once we have the CERN grid CA certificate imported into the keystore, we need to reference it on NXCALS client application startup. In order to do that, we have to set the SSL trustStore and trustStore password as JVM system properties.
For example, add the following:
System.setProperty("javax.net.ssl.trustStore", "nxcals_cacerts");
System.setProperty("javax.net.ssl.trustStorePassword", "nxcals");
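For illustration, the snippet below creates a JKS trustStore with the JDK's KeyStore API and points the JVM at it, mirroring what the keytool command above does from the command line. Note this is a sketch only: it creates an empty store, whereas in practice the store must already contain the imported CERN Grid CA certificate; the file name and password follow the keytool example above.

```java
import java.io.FileOutputStream;
import java.security.KeyStore;

public class TrustStoreSetup {
    public static void main(String[] args) throws Exception {
        char[] password = "nxcals".toCharArray();

        // Create a JKS trustStore on disk; keytool -import would add the
        // CERN Grid CA certificate to a store like this one (here it is
        // left empty purely for illustration).
        KeyStore trustStore = KeyStore.getInstance("JKS");
        trustStore.load(null, password); // initialize an empty store
        try (FileOutputStream out = new FileOutputStream("nxcals_cacerts")) {
            trustStore.store(out, password);
        }

        // Reference the store exactly as the NXCALS client application would.
        System.setProperty("javax.net.ssl.trustStore", "nxcals_cacerts");
        System.setProperty("javax.net.ssl.trustStorePassword", "nxcals");
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```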
Kerberos pointers
Kerberos Installation (optional, for non-CERN machines)
- Install Kerberos client
This will allow access to any Kerberos protected services once a user has successfully logged into the system.
Important
Most likely the Kerberos software installation is not required on standard CERN machines (running Linux, Windows or Mac OSX), and the steps below can simply be omitted.
If, however, the installation is necessary, please follow the instructions below (given for Ubuntu Linux):
Install the krb5-user package, which provides the basic kinit, klist, kdestroy, and kpasswd clients:
sudo apt-get install krb5-user
- Configure the local Kerberos client. Configure the Kerberos realm by executing the following command, and write CERN.CH when requested to complete the realm:
sudo dpkg-reconfigure krb5-config
Kerberos keytab file generation
The preferred and easiest method of obtaining a keytab file is to generate it on an lxplus machine by executing:
cern-get-keytab --user --keytab <keytab.file>
Once the .keytab file is created correctly, we can check whether obtaining a Kerberos ticket works:
kdestroy && kinit -f -r 5d -kt /<path_to_keytab_file>/<user>.keytab <user>
klist
# Correct output
Ticket cache: FILE:/tmp/krb5cc_14420
Default principal: <user>@CERN.CH
Valid starting Expires Service principal
07/01/17 14:00:01 07/02/17 15:00:01 krbtgt/CERN.CH@CERN.CH
renew until 07/06/17 14:00:01
Important
Please note that this file should be well protected, as it contains the user's password.
Kerberos cache
The Kerberos cache may be configured in many different ways; a detailed description can be found here. From our experience, the recommended cache type to use with NXCALS is FILE.
To check which cache type you use, run:
klist
The desired output is:
$ klist
Ticket cache: FILE:/tmp/krb5cc_125508_UIeBTLzR7L
...
If you have another type, like:
$ klist
Ticket cache: KEYRING:persistent:125508:krb_ccache_pfl5DMo
you may face a problem where Spark cannot access it, with an error similar to the one below:
...
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:179)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:392)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:414)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:843)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:839)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:839)
... 40 more
In that case you can configure your Kerberos cache to use a file in the following way:
export KRB5CCNAME=$(mktemp)
chmod 600 $KRB5CCNAME
kinit -f # You need to login again
Kerberos realm
If you face an error like:
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3746)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3736)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3520)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.spark.util.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:317)
at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:273)
at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:271)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at org.apache.spark.util.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:271)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$4(SparkSubmit.scala:364)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:364)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:901)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: Cannot locate default realm
at javax.security.auth.kerberos.KerberosPrincipal.<init>(KerberosPrincipal.java:159)
at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
... 28 more
this error can occur when the Kerberos configuration file uses the includedir directive, as it does on LXPLUS8 and 9, and the Java version in use cannot parse it. To fix the issue, please use a newer Java (11+).