apache_spark_ng 2.0.0

A cookbook to install and configure Apache Spark

Install with Berkshelf by adding the following to your Berksfile:

cookbook 'apache_spark_ng', '= 2.0.0'

Or with a Policyfile:

cookbook 'apache_spark_ng', '= 2.0.0', :supermarket

Or with knife:

knife supermarket install apache_spark_ng
knife supermarket download apache_spark_ng
apache_spark
This cookbook installs and configures Apache Spark.
- GitHub: https://github.com/clearstorydata-cookbooks/apache_spark
- Chef Supermarket: https://supermarket.chef.io/cookbooks/apache_spark
- Travis CI: https://travis-ci.org/clearstorydata-cookbooks/apache_spark
- Documentation: http://clearstorydata-cookbooks.github.io/apache_spark/chef/apache_spark.html
Overview
This cookbook installs and configures Apache Spark. Currently, only the standalone deployment mode
is supported. Future work:
- YARN and Mesos deployment modes
- Installing from Cloudera and HDP Spark packages
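As a sketch of how the cookbook might be wired into a node, here is a hypothetical Chef role for a standalone master. The recipe names (spark-install, spark-standalone-master) are taken from the Foodcritic output further down this page; the role name and description are illustrative, not part of the cookbook:

```ruby
# roles/spark_master.rb -- hypothetical role; recipe names come from
# this cookbook's recipes/ directory as listed in the Foodcritic output.
name 'spark_master'
description 'Spark standalone-mode master node'
run_list(
  'recipe[apache_spark::spark-install]',
  'recipe[apache_spark::spark-standalone-master]'
)
```

A worker node would use recipe[apache_spark::spark-standalone-worker] in place of the master recipe.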
Compatibility
The following platforms are currently tested:
- Ubuntu 12.04
- CentOS 6.5
The following platforms are not tested but will probably work (tests coming soon):
- Fedora 21
- Ubuntu 14.04
Configuration
- node['apache_spark']['install_mode']: :tarball to install from a downloaded tarball, or :package to install from an OS-specific package.
- node['apache_spark']['download_url']: the URL to download the Apache Spark binary distribution tarball from in the tarball installation mode.
- node['apache_spark']['checksum']: SHA-256 checksum of the Apache Spark binary distribution tarball.
- node['apache_spark']['pkg_name']: package name to install in the package installation mode.
- node['apache_spark']['pkg_version']: package version to install in the package installation mode.
- node['apache_spark']['install_dir']: target directory to install Spark into in the tarball installation mode. In the package mode, this must be set to the directory that the package installs Spark into.
- node['apache_spark']['install_base_dir']: in the tarball installation mode, this is where the tarball is actually extracted; a symlink pointing to the subdirectory containing the specific Spark version is created at node['apache_spark']['install_dir'].
- node['apache_spark']['user']: UNIX user to create for running Spark.
- node['apache_spark']['group']: UNIX group to create for running Spark.
- node['apache_spark']['standalone']['master_host']: the host that Spark standalone-mode workers connect to.
- node['apache_spark']['standalone']['master_bind_ip']: the IP address the master binds to. Set this so that workers are able to connect to the master.
- node['apache_spark']['standalone']['master_port']: the port the Spark standalone master listens on.
- node['apache_spark']['standalone']['master_webui_port']: Spark standalone master web UI port.
- node['apache_spark']['standalone']['worker_bind_ip']: the IP address workers bind to. By default they bind to all network interfaces.
- node['apache_spark']['standalone']['worker_webui_port']: the port the Spark worker web UI listens on.
- node['apache_spark']['standalone']['job_dir_days_retained']: app-... subdirectories of node['apache_spark']['standalone']['worker_work_dir'] older than this number of days are deleted periodically on worker nodes to prevent unbounded accumulation. These directories contain Spark executor stdout/stderr logs. Enough directories are still retained to honor node['apache_spark']['standalone']['job_dir_num_retained'].
- node['apache_spark']['standalone']['job_dir_num_retained']: the minimum number of Spark executor log directories (app-...) to retain, regardless of creation time.
- node['apache_spark']['standalone']['worker_dir_cleanup_log']: log file path for the Spark executor log directory cleanup script.
- node['apache_spark']['standalone']['worker_cores']: the number of "cores" (threads) to allocate on each worker node.
- node['apache_spark']['standalone']['worker_work_dir']: the directory that stores Spark executor logs and Spark job jars.
- node['apache_spark']['standalone']['worker_memory_mb']: the amount of memory in MB to allocate to each worker, i.e. the maximum total memory used by all applications' executors running on a worker node.
- node['apache_spark']['standalone']['default_executor_mem_mb']: the default amount of memory allocated to a Spark application's executor on each node.
- node['apache_spark']['standalone']['log_dir']: the log directory for Spark masters and workers.
- node['apache_spark']['standalone']['daemon_root_logger']: the value the spark.root.logger property is set to.
- node['apache_spark']['standalone']['max_num_open_files']: the maximum number of open files to set with ulimit before launching a worker.
- node['apache_spark']['standalone']['java_debug_enabled']: whether to enable Java debugging options for Spark processes. Note: this option currently does not work as intended.
- node['apache_spark']['standalone']['default_debug_port']: default Java debug port to use. A free port is chosen if this port is unavailable.
- node['apache_spark']['standalone']['master_debug_port']: default Java debug port for Spark masters. A free port is chosen if this port is unavailable.
- node['apache_spark']['standalone']['worker_debug_port']: default Java debug port for Spark workers. A free port is chosen if this port is unavailable.
- node['apache_spark']['standalone']['executor_debug_port']: default Java debug port for Spark standalone executors. A free port is chosen if this port is unavailable.
- node['apache_spark']['standalone']['common_extra_classpath_items']: common classpath items to add to Spark application drivers and executors (but not to Spark master and worker processes).
- node['apache_spark']['standalone']['worker_dir']: set to a non-nil value to make the Spark worker use an alternate directory for Spark scratch space.
- node['apache_spark']['standalone']['worker_opts']: set to a non-nil value to pass additional settings to the Spark worker, e.g. -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400. Ideal for worker-only options that you do not want in the default configuration file.
- node['apache_spark']['conf']['...']: Spark configuration options that go into the default Spark configuration file. See https://spark.apache.org/docs/latest/configuration.html for details.
- node['apache_spark']['standalone']['local_dirs']: a list of local directories to use on workers. Map output files are stored here, so these directories should have enough free space.
- node['apache_spark']['standalone']['ha_recovery_mode']: set this to run the Spark master in HA mode. More details: http://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper
- node['apache_spark']['standalone']['ha_zookeeper_url']: the ZooKeeper cluster URL (e.g. 192.168.1.100:2181,192.168.1.101:2181).
- node['apache_spark']['standalone']['ha_zookeeper_dir']: the directory in ZooKeeper to store recovery state.
- node['apache_spark']['standalone']['master_url']: for HA via ZooKeeper, the Spark URL needs to point to more than one master host. If this is not defined, it is derived from master_host and master_port.
- node['apache_spark']['standalone']['worker_jmx_enabled']: true|false, to enable or disable JMX for worker nodes.
- node['apache_spark']['standalone']['worker_jmx_port']: JMX port number for workers.
- node['apache_spark']['standalone']['worker_jmx_authenticate']: whether JMX requires authentication for workers.
- node['apache_spark']['standalone']['worker_jmx_ssl']: whether JMX requires SSL for workers.
- node['apache_spark']['standalone']['master_jmx_enabled']: true|false, to enable or disable JMX for master nodes.
- node['apache_spark']['standalone']['master_jmx_port']: JMX port number for masters.
- node['apache_spark']['standalone']['master_jmx_authenticate']: whether JMX requires authentication for masters.
- node['apache_spark']['standalone']['master_jmx_ssl']: whether JMX requires SSL for masters.
Testing
ChefSpec
bundle install
bundle exec rspec
Test Kitchen
bundle install
bundle exec kitchen test
Contributing
If you would like to contribute to this cookbook's development, please follow the steps below:
- Fork this repository on GitHub
- Make your changes
- Run tests
- Submit a pull request
License
Apache License 2.0
Dependent cookbooks
- apt >= 0.0.0
- java >= 0.0.0
- logrotate >= 0.0.0
- monit_wrapper >= 0.0.0
- tar >= 0.0.0
Contingent cookbooks
There are no cookbooks that are contingent upon this one.
Collaborator Number Metric
2.0.0 failed this metric
Failure: Cookbook has 0 collaborators. A cookbook must have at least 2 collaborators to pass this metric.
Contributing File Metric
2.0.0 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a CONTRIBUTING.md file
Foodcritic Metric
2.0.0 failed this metric
FC007: Ensure recipe dependencies are reflected in cookbook metadata: apache_spark_ng/recipes/spark-install.rb:15
FC007: Ensure recipe dependencies are reflected in cookbook metadata: apache_spark_ng/recipes/spark-install.rb:16
FC007: Ensure recipe dependencies are reflected in cookbook metadata: apache_spark_ng/recipes/spark-standalone-master.rb:15
FC007: Ensure recipe dependencies are reflected in cookbook metadata: apache_spark_ng/recipes/spark-standalone-worker.rb:15
FC069: Ensure standardized license defined in metadata: apache_spark_ng/metadata.rb:1
Run with Foodcritic Version 12.0.1 with tags metadata,correctness ~FC031 ~FC045 and failure tags any
License Metric
2.0.0 failed this metric
apache_spark_ng does not have a valid open source license.
Acceptable licenses include Apache-2.0, apachev2, Apache 2.0, MIT, mit, GPL-2.0, gplv2, GNU Public License 2.0, GPL-3.0, gplv3, GNU Public License 3.0.
No Binaries Metric
2.0.0 passed this metric
Testing File Metric
2.0.0 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a TESTING.md file
Version Tag Metric
2.0.0 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must include a tag that matches this cookbook version number