Spark session creation
This document presents several methods of creating a Spark session in Java:
Based on default Spark properties
When obtaining an instance of a ServiceBuilder without specifying any additional arguments, a local (default) Spark session is actually created:
ServiceBuilder defaultServiceBuilder = ServiceBuilder.getInstance();
return defaultServiceBuilder;
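The snippet above reads like a method body; for completeness, a minimal self-contained sketch of the enclosing code, with the import path assumed from the NXCALS backport API layout:
import cern.nxcals.api.backport.client.service.ServiceBuilder; // assumed package

public static ServiceBuilder createLocalServiceBuilder() {
    // No arguments given: a local (default) Spark session is created internally
    ServiceBuilder defaultServiceBuilder = ServiceBuilder.getInstance();
    return defaultServiceBuilder;
}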
The same can be achieved in a more explicit way by providing default Spark properties (with a required application name):
SparkProperties sparkProperties = SparkProperties.defaults("MY_APP");
ServiceBuilder defaultServiceBuilder = ServiceBuilder.getInstance(sparkProperties);
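Once a ServiceBuilder has been obtained, it acts as a factory for the data access services. A minimal sketch, assuming the createMetaService and createTimeseriesService factory methods of the NXCALS backport API:
MetaDataService metaDataService = defaultServiceBuilder.createMetaService();
TimeseriesDataService timeseriesDataService = defaultServiceBuilder.createTimeseriesService();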
With supplied Spark properties
Spark properties can be freely customised, as in the example below, where the application will be executed in YARN mode, running on the cluster:
SparkProperties sparkProperties = new SparkProperties();
sparkProperties.setAppName("MY_APP");
sparkProperties.setMasterType("yarn");
Map<String, String> properties = new HashMap<>();
properties.put("spark.executor.memory","8G");
properties.put("spark.executor.cores","10");
// yarn
properties.put("spark.yarn.appMasterEnv.JAVA_HOME", "/var/nxcals/jdk1.11");
properties.put("spark.executorEnv.JAVA_HOME", "/var/nxcals/jdk1.11");
properties.put("spark.yarn.jars", "hdfs:////project/nxcals/lib/spark-3.2.1/*.jar,hdfs:////project/nxcals/nxcals_lib/nxcals_pro/*.jar");
properties.put("spark.yarn.am.extraLibraryPath", "/usr/lib/hadoop/lib/native");
properties.put("spark.executor.extraLibraryPath", "/usr/lib/hadoop/lib/native");
properties.put("spark.yarn.historyServer.address", "ithdp1001.cern.ch:18080");
properties.put("spark.yarn.access.hadoopFileSystems", "nxcals");
properties.put("spark.sql.caseSensitive", "true");
sparkProperties.setProperties(properties);
ServiceBuilder yarnServiceBuilder = ServiceBuilder.getInstance(sparkProperties);
Note
More information about the YARN-related properties used above can be found in the Apache Spark documentation pages.
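Since several of these values differ between environments, it can help to collect them in a small helper method. A sketch with a hypothetical buildYarnProperties helper (the method name and parameters are illustrative only, not part of the NXCALS API):
private static Map<String, String> buildYarnProperties(String javaHome, String sparkJars, String historyServer) {
    Map<String, String> properties = new HashMap<>();
    // The same JAVA_HOME is used by the application master and the executors
    properties.put("spark.yarn.appMasterEnv.JAVA_HOME", javaHome);
    properties.put("spark.executorEnv.JAVA_HOME", javaHome);
    properties.put("spark.yarn.jars", sparkJars);
    properties.put("spark.yarn.historyServer.address", historyServer);
    return properties;
}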
Using predefined resources
For user convenience, NXCALS provides predefined resource sizings that can be used for the creation of a Spark session on YARN:
SparkSessionFlavor initialProperties = SparkSessionFlavor.MEDIUM;
// Please replace "pro" with "testbed" when accessing NXCALS TESTBED
ServiceBuilder yarnServiceBuilderFromPredefinedProperties
= ServiceBuilder.getInstance(SparkProperties.remoteSessionProperties("MY_APP", initialProperties, "pro"));
Note
There are three different configurations available:
| Configuration type | Spark executor cores | Spark executor instances | Spark executor memory |
|---|---|---|---|
| SMALL | 2 | 4 | 2g |
| MEDIUM | 4 | 8 | 4g |
| LARGE | 4 | 16 | 4g |
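As mentioned in the code comment above, the last argument of remoteSessionProperties selects the target environment. For example, a LARGE session against NXCALS TESTBED can be requested as follows:
SparkProperties testbedProperties = SparkProperties.remoteSessionProperties("MY_APP", SparkSessionFlavor.LARGE, "testbed");
ServiceBuilder testbedServiceBuilder = ServiceBuilder.getInstance(testbedProperties);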
By reusing a Spark session
Yet another way of initializing a ServiceBuilder is by reusing an existing Spark session:
SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("MY_APP");
org.apache.spark.SparkContext sc = new org.apache.spark.SparkContext(sparkConf);
SparkSession session = new SparkSession(sc);
ServiceBuilder serviceBuilderFromSession = ServiceBuilder.getInstance(session);
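Equivalently, the session can be obtained through the standard SparkSession builder, which avoids constructing the SparkContext by hand:
SparkSession session = SparkSession.builder()
        .master("local[4]")
        .appName("MY_APP")
        .getOrCreate();
ServiceBuilder serviceBuilderFromSession = ServiceBuilder.getInstance(session);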
Via NXCALS SparkUtils
For convenience, one can use the NXCALS SparkUtils methods, as in the example below:
SparkConf sparkConf = SparkUtils.createSparkConf(SparkProperties.defaults("MY_APP"));
ServiceBuilder serviceBuilderFromSparkUtility =
ServiceBuilder.getInstance(SparkUtils.createSparkSession(sparkConf));
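Putting the pieces together, a complete runnable sketch could look as follows; the import locations of SparkProperties, SparkUtils and ServiceBuilder are assumptions based on the NXCALS API layout:
import cern.nxcals.api.backport.client.service.ServiceBuilder; // assumed package
import cern.nxcals.api.config.SparkProperties; // assumed package
import cern.nxcals.api.utils.SparkUtils; // assumed package
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class NxcalsSessionExample {
    public static void main(String[] args) {
        // Build a Spark configuration from the default NXCALS properties
        SparkConf sparkConf = SparkUtils.createSparkConf(SparkProperties.defaults("MY_APP"));
        // Create the session and hand it over to the ServiceBuilder
        SparkSession session = SparkUtils.createSparkSession(sparkConf);
        ServiceBuilder serviceBuilder = ServiceBuilder.getInstance(session);
    }
}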