
spark-submit

Once a user application is bundled, it can be launched with the bin/spark-submit script. This script sets up the classpath with Spark and its dependencies, and supports the different cluster managers and deploy modes:

./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
  Some of the commonly used options are:
--class: the entry point of your application (e.g. org.apache.spark.examples.SparkPi)
--master: the master URL of the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: whether to deploy the driver on the cluster (cluster) or locally as an external client (client)
--conf: an arbitrary Spark configuration property in key=value format; if the value contains spaces, wrap the whole pair in quotes ("key=value"), as in the example after this list
application-jar: the path to a bundled jar containing your application and all of its dependencies; the URL must be globally visible inside your cluster, e.g. an hdfs:// path, or a file:// path that exists on every node
application-arguments: arguments passed to the main method of your main class, if any
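
For instance, when a configuration value contains spaces, the whole key=value pair must be quoted. A minimal sketch using the stock SparkPi example; the JVM options passed here are purely illustrative:

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[4] \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
/path/to/examples.jar \
100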

  

Local mode:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100  
  Spark standalone mode (client deploy mode):
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
 

  

Spark standalone mode with supervision (cluster deploy mode):
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000 
  YARN mode (yarn-cluster for cluster deploy mode; use yarn-client for client mode):
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
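
spark-submit can also launch Python applications: pass a .py file in place of the application jar. A minimal sketch, assuming the pi.py script shipped in Spark's examples directory:

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000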
  Master URLs: the master URL passed to Spark can take one of the following formats:
local: run Spark with a single local worker thread
local[K]: run Spark with K local worker threads (ideally, set K to the number of cores on your machine)
local[*]: run Spark with as many worker threads as your machine has logical cores
spark://HOST:PORT: connect to the master of a Spark standalone cluster; the port must match the one in your configuration, 7077 by default
mesos://HOST:PORT: connect to a Mesos cluster; the port defaults to 5050
yarn-client: connect to a YARN cluster in client mode; the cluster location is found via HADOOP_CONF_DIR or YARN_CONF_DIR
yarn-cluster: connect to a YARN cluster in cluster mode; the cluster location is found via HADOOP_CONF_DIR or YARN_CONF_DIR
