How Does Spark Processing Start?
- What happens when a jar is submitted through a Spark client?
- How does a Spark job get triggered?
Let’s dive deep!
-
A client program (spark-submit) submits the application to the ResourceManager, including the specifications needed to launch the application-specific ApplicationMaster (for Spark applications, referred to here as SparkMaster).
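As a rough illustration of what the client hands over, the sketch below assembles a spark-submit command line for YARN. The flags are standard spark-submit options; the jar name `my-app.jar`, the class `com.example.MyApp`, and the helper `build_spark_submit` are hypothetical placeholders, not taken from any real project.

```python
from typing import List


def build_spark_submit(app_jar: str, main_class: str, num_executors: int = 2,
                       executor_memory: str = "2g", executor_cores: int = 2,
                       deploy_mode: str = "cluster") -> List[str]:
    """Assemble a spark-submit argument list for a YARN cluster (illustrative helper)."""
    return [
        "spark-submit",
        "--master", "yarn",              # submit to the YARN ResourceManager
        "--deploy-mode", deploy_mode,    # "cluster": the Driver runs on the cluster
        "--class", main_class,           # entry point inside the jar
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app_jar,                         # the application jar comes after the options
    ]


# Hypothetical jar and class names, for illustration only.
cmd = build_spark_submit("my-app.jar", "com.example.MyApp")
print(" ".join(cmd))
```

Building the command as a list rather than one string keeps each argument separate, which is the form `subprocess.run` expects if you launch it from Python.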
-
The ResourceManager is responsible for allocating the container in which the ApplicationMaster (SparkMaster) will run. It then starts the ApplicationMaster (SparkMaster) in that container.
-
In cluster mode, the SparkMaster is created together with the Driver on the same node when the user submits the Spark application with spark-submit (the Driver runs inside the ApplicationMaster’s container). The Driver tells the ApplicationMaster which executors the application requires, and the ApplicationMaster negotiates with the ResourceManager for the resources to host them.
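To make the executor requirements concrete, here is a minimal sketch of how large a container each executor might need. It assumes Spark's documented defaults for executor memory overhead (10% of executor memory, with a 384 MiB floor) and assumes the scheduler rounds requests up to a multiple of a 1024 MiB minimum allocation; both vary with configuration and scheduler, and the function name is hypothetical.

```python
import math

MIN_OVERHEAD_MB = 384    # documented default floor for executor memory overhead
OVERHEAD_FACTOR = 0.10   # documented default overhead fraction


def executor_container_mb(executor_memory_mb: int, yarn_min_alloc_mb: int = 1024) -> int:
    """Estimate the container size requested for one executor, in MiB."""
    overhead = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    requested = executor_memory_mb + overhead
    # Assumption: the scheduler rounds up to a multiple of its minimum allocation.
    return math.ceil(requested / yarn_min_alloc_mb) * yarn_min_alloc_mb


print(executor_container_mb(2048))  # 2048 + 384 = 2432, rounded up to 3072
print(executor_container_mb(8192))  # 8192 + 819 = 9011, rounded up to 9216
```

The practical point is that an executor asking for 2 GiB of heap can occupy a noticeably larger container, which affects how many executors fit on a node.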
-
The ApplicationMaster registers itself with the ResourceManager. Registration allows the client program (spark-submit) to query the ResourceManager for the details it needs to communicate directly with its ApplicationMaster.
-
The ApplicationMaster requests suitable containers from the ResourceManager for the application to run in. Once the containers have been granted, the ApplicationMaster launches them by passing their launch configuration to the NodeManager(s).
-
The user application code runs inside the containers, and information about its execution (phase, status) is reported back to the ApplicationMaster.
-
While the user application is running, the client communicates with the ApplicationMaster to obtain the application’s status.
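One practical way to watch status from outside is to ask the ResourceManager rather than the ApplicationMaster, for example with `yarn application -status <applicationId>` or the ResourceManager REST endpoint `/ws/v1/cluster/apps/{appid}`. The sketch below only parses a response of that shape; the sample payload and application id are made up for illustration, and a real response carries many more fields.

```python
import json

# Application states after which no further progress is expected.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def summarize_app(payload: str) -> dict:
    """Extract the fields a polling client usually cares about."""
    app = json.loads(payload)["app"]
    return {
        "id": app["id"],
        "state": app["state"],
        "final_status": app["finalStatus"],
        "done": app["state"] in TERMINAL_STATES,
    }


# Hypothetical sample response, trimmed to a few fields.
sample = json.dumps({
    "app": {
        "id": "application_0000000000000_0001",
        "state": "RUNNING",
        "finalStatus": "UNDEFINED",
        "progress": 40.0,
    }
})

status = summarize_app(sample)
print(status["state"], "done" if status["done"] else "still running")
```

A polling loop would call this repeatedly, with a delay between calls, until `done` becomes true.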
-
When the application has finished all of its work, the ApplicationMaster unregisters from the ResourceManager and shuts down, releasing its container for other uses.
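The lifecycle described so far can be traced with a toy model. This is not how YARN is implemented: `ToyResourceManager` and `run_application` are hypothetical names, and the model only counts free containers to show allocation, registration, and release in order.

```python
from typing import List


class ToyResourceManager:
    """Hypothetical stand-in for the ResourceManager: it only counts free containers."""

    def __init__(self, total_containers: int):
        self.free = total_containers
        self.registered = set()

    def allocate(self, count: int) -> int:
        """Grant as many containers as are free, up to the requested count."""
        granted = min(count, self.free)
        self.free -= granted
        return granted

    def release(self, count: int) -> None:
        self.free += count


def run_application(rm: ToyResourceManager, app_id: str, executors_wanted: int) -> List[str]:
    """Walk one application through the lifecycle and return the event trace."""
    if rm.allocate(1) != 1:
        return ["no container available for ApplicationMaster"]
    events = ["ApplicationMaster started"]
    rm.registered.add(app_id)
    events.append("ApplicationMaster registered")
    granted = rm.allocate(executors_wanted)
    events.append(f"{granted} executor container(s) launched")
    events.append("application code finished")
    rm.registered.discard(app_id)
    rm.release(granted + 1)  # executor containers plus the ApplicationMaster's own
    events.append("ApplicationMaster unregistered, containers released")
    return events


rm = ToyResourceManager(total_containers=4)
for event in run_application(rm, "app-1", executors_wanted=5):
    print(event)
print("free containers:", rm.free)
```

With four containers in the toy cluster, one goes to the ApplicationMaster and only three of the five requested executors are granted; all four are free again once the application unregisters.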
-
Once the Driver is up, it runs the main program of the user application code. The first thing it does is create a SparkContext, so let us look at that in detail.