Dec 15, 2016

I wrote this article for Linux users, but I am sure Mac OS users can benefit from it too. Why use PySpark in a Jupyter Notebook? When using Spark, most data engineers recommend developing either in Scala (the “native” Spark language) or in Python through the complete PySpark API. Python for Spark is obviously slower than Scala.
Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112
For older versions of Spark and ipython, please see the previous version of this text.
Install Java Development Kit
Download and install it from oracle.com
Add the following code to your shell profile, e.g. .bash_profile
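For example, on Mac OS (a sketch; /usr/libexec/java_home is Apple's helper that prints the JDK path — on Linux, point JAVA_HOME at your JDK install directory instead):

```shell
# Resolve and export the JDK location (Mac OS helper; on Linux,
# set JAVA_HOME to your JDK install directory instead)
export JAVA_HOME=$(/usr/libexec/java_home)
```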
Install Apache Spark
You can use the Mac OS package manager Homebrew (http://brew.sh/)
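For example (assuming Homebrew is already installed):

```shell
# Install Apache Spark via Homebrew, then check where it was installed
brew install apache-spark
brew info apache-spark
```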
Set up env variables
Add the following code to your shell profile, e.g. .bash_profile
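A sketch of such variables, assuming a Homebrew install of Spark 2.1.0 (the exact Cellar path and py4j version are assumptions — verify yours, since they differ between Spark versions):

```shell
# Path to the Homebrew-installed Spark (adjust the version to yours)
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.1.0/libexec
# Make PySpark and its bundled py4j importable from any Python process
# (the py4j version in the zip name varies between Spark releases)
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
```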
You can check the SPARK_HOME path using the brew command `brew info apache-spark`. Also check the py4j version and subpath; it may differ from version to version.

IPython profile
Profiles are not supported in jupyter, and you will now see a deprecation warning if you try to use them. It seems that it is no longer possible to run custom startup files as it was with ipython profiles. Thus, the easiest way is to run the pyspark init script manually at the beginning of your notebook, or to follow the alternative way below.

Run ipython
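For example, assuming Jupyter is installed:

```shell
# Start the notebook server (opens in your browser)
jupyter notebook
```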
Initialize pyspark by running its shell init script in the first notebook cell.
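A minimal sketch of that manual initialization, assuming SPARK_HOME is exported as above (shell.py is Spark's interactive-shell bootstrap script, which defines sc; execfile matches the Python 2.7 used here):

```python
import os

# Locate Spark's interactive-shell bootstrap script via SPARK_HOME
spark_home = os.environ['SPARK_HOME']
init_script = os.path.join(spark_home, 'python/pyspark/shell.py')

# Running it defines the SparkContext as `sc` in the current namespace
# (Python 2.7; on Python 3 use: exec(open(init_script).read()))
execfile(init_script)
```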
After that, the sc variable should be available.

Alternatively
You can also force the pyspark shell command to run the IPython web notebook instead of the command-line interactive interpreter. To do so, you have to add the following env variables:
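These are Spark's standard driver-Python variables; for example, in your .bash_profile:

```shell
# Make the `pyspark` command launch a Jupyter notebook
# instead of the plain interactive interpreter
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
```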
and then simply run `pyspark`,
which will open a web notebook with
sc
available automatically.