Jupyter Notebook is a Must-Learn Technology for Modern Python (Python Core in Action 2)
10 Essential Tips for Mastering Jupyter Notebook in Python
Python is now the top-ranked programming language.
If a Python engineer still doesn't know how to use Jupyter Notebook, they are probably falling behind the times.
As the saying goes, sharpening the axe does not delay the woodcutting. Efficient tools make our programming work far easier. In this lesson, I will teach you Jupyter Notebook to lay the essential foundation for your future Python learning.
What is the Jupyter Notebook?
After saying so much, what exactly is Jupyter Notebook?
According to Jupyter's founder Fernando Pérez, his initial dream was to create an integrated computing platform for three scientific computing languages: Ju (Julia), Py (Python), and R. So he named it Ju-Py-te-R.
Today, Jupyter has become a multi-functional scientific computing platform that supports almost all languages. It can combine software code, computation output, explanatory documents, and multimedia resources together.
Look at the image below, and you will understand what the Jupyter Notebook is.
You directly enter code into a box, run it, and it immediately outputs the result below.
How's that? Isn't it cool?
You may wonder, can such a seemingly "all show but no substance" thing really become a disruptor in the Python community?
To be honest, a few years ago I wouldn't have believed it either. So how influential is Jupyter Notebook?
The Influence of Jupyter Notebook
When we evaluate the influence of technology or use our technology to influence the world, we must consider its impact on education.
Take Microsoft Word, for example.
From a pure technology perspective, Word's single-machine design concept was 20 years behind the times.
However, online document systems such as Google Docs have not displaced Word the way many people expected.
The immediate reason is user habit. People who are used to editing documents in Word will simply send a file back and forth dozens of times, and they find that workable enough.
But more profoundly, the reason we have formed such user habits is rooted in our education system.
From elementary school through high school and university, the education system spends more than ten years training users in the habit of using Word.
At work, experienced employees will also guide new employees to continue using Word, forming a positive feedback loop that sustains the technology's influence.
Going back to our topic today, let's look at Jupyter Notebook. Since 2017, many top computer science courses in North America have started using Jupyter Notebook as the primary tool.
For example, Fei-Fei Li's CS231N course, "Convolutional Neural Networks for Visual Recognition", still used command-line Python for assignments in 2016 but switched all assignments to Jupyter Notebook in 2017.
Similarly, UC Berkeley's "Foundations of Data Science" course has had all assignments completed in Jupyter Notebook since 2017.
Jupyter Notebook's influence in industry is even greater.
At Facebook, although large-scale backend development still relies on full-featured IDEs, almost all medium and small programs, such as internal offline analysis software and machine learning module training, are done using Jupyter Notebook.
From what I understand, teams at other major Silicon Valley tech companies, such as Google's AI research department Google Brain, use Jupyter Notebook exclusively as well, although they work in Google's own customized version, Google Colab.
After seeing this, you should recognize Jupyter Notebook's status in the field.
However, when it comes to choosing technologies, some people say we should use a popular technology; others believe that since a major company like Alibaba is using it, it must be the future, so we should use it too. It must be said that these are one-sided perspectives.
I often encourage my technical peers to think independently about technology choices, rather than blindly following others.
At the very least, you should think about why Meta chose this technology. What problems does it solve?
Why didn't Meta choose other technologies? What are its limitations?
Speaking solely about the choice itself, Meta likely chose this technology because it has hundreds of product lines and thousands of engineers.
But the same technology could become a burden for a team of just ten people.
Here, I don't want to mislead you about any technology. What I want to teach you is the dialectical way of analyzing technologies.
Next, let's look at what specific problems Jupyter has solved that others haven't.
The Benefits of Jupyter Notebook
Integration of All Resources
In real software development, context switching consumes a lot of time.
What does this mean? An example will make it easier to understand. For instance, you may need to switch windows to look at some documentation, then switch windows again to use another tool to create charts, and so on.
These are all factors that can impact productivity.
As I mentioned earlier, Jupyter solves this problem by putting all resources related to software writing in one place.
When you open a Jupyter Notebook, you can see the relevant documentation, charts, videos, and the corresponding code.
This way, you don't need to switch windows to find information. By looking at one file, you can get all the information about the project.
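To make the "one file" idea concrete, here is a minimal sketch of what a notebook file looks like on disk. A notebook is a JSON document in which explanatory text and code sit side by side as cells. The helper names (markdown_cell, code_cell) and the sample content are made up for illustration, and the structure is simplified to the fields of notebook format version 4 that I am sure about.

```python
import json

def markdown_cell(text):
    """Build a documentation cell (hypothetical helper for this sketch)."""
    return {
        "cell_type": "markdown",
        "metadata": {},
        "source": text.splitlines(keepends=True),
    }

def code_cell(code):
    """Build a code cell that has not been run yet (hypothetical helper)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],  # results are stored here after the cell runs
        "source": code.splitlines(keepends=True),
    }

# Documentation and code live in the same file, in reading order.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("# Sales analysis\nNotes and explanations sit next to the code."),
        code_cell("total = sum([120, 95, 143])\ntotal"),
    ],
}

notebook_text = json.dumps(notebook, indent=1)
print(notebook_text)
```

Saving notebook_text to a file with the .ipynb extension should give you something Jupyter can open, with the note rendered above the code cell.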
Interactive Programming Experience
In the fields of machine learning and mathematical statistics, Python programming is highly experimental. A common scenario is that a small piece of code needs to be rewritten 100 times, for example, to try 100 different methods, while keeping the rest of the code unchanged.
This is quite different from traditional Python development.
In the traditional Python development process, every experiment requires running all the code again, which can be very time-consuming for developers.
Especially in code bases with tens of millions of lines, like at Meta, even with the company's highly optimized underlying architecture, it can take a few minutes to run everything again.
However, Jupyter Notebook introduces the concept of Cells, allowing you to run only the code in a small Cell for each experiment.
It is WYSIWYG (what you see is what you get) - you can immediately see the results below the code.
With such strong interactivity, Python researchers can focus on the problem itself, without being burdened by complicated toolchains. There is no need to switch back and forth to the command line - all research work can be done in Jupyter.
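The Cell workflow described above can be imitated in plain Python. In this rough sketch, a dictionary named kernel stands in for the notebook's memory, the "slow" setup cell runs once, and only the small experiment cell is re-run with different contents. The names kernel, setup_cell, and experiment_cells are invented for this illustration, and this is not how Jupyter itself is implemented.

```python
# A shared namespace plays the role of the notebook kernel's memory.
kernel = {}

# Cell 1: the expensive part. It counts how often it has been run.
setup_cell = """
load_count = globals().get("load_count", 0) + 1
data = [x * x for x in range(1000)]  # stand-in for a slow loading step
"""

# Cell 2: the small part that gets rewritten for each experiment.
experiment_cells = [
    "result = sum(data) / len(data)",        # attempt 1: mean
    "result = sorted(data)[len(data) // 2]",  # attempt 2: upper median
    "result = max(data) - min(data)",        # attempt 3: range
]

exec(setup_cell, kernel)  # run the expensive cell once

results = []
for source in experiment_cells:  # re-run only the small cell
    exec(source, kernel)
    results.append(kernel["result"])

print("setup runs:", kernel["load_count"])
print("results:", results)
```

The setup cell runs a single time even though three experiments were tried, which is the saving that Cells give you.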
Zero-cost Reproducibility
Similarly, in the fields of machine learning and mathematical statistics, Python is used in a very fast-paced way.
A common scenario is that you see a method with good results in someone's paper, but when you try to reproduce it, you find that you first need to install a pile of dependencies with pip.
These preparatory tasks may consume 80% of your time, but they do not contribute to actual productivity.
Initially, Jupyter Notebook was quite cumbersome as well, requiring you to install the IPython engine and its various dependencies on your local machine first.
However, the current technology trend is towards complete cloud-based solutions, such as the official Jupyter Binder platform (documentation: https://mybinder.readthedocs.io/en/latest/index.html) and the Google Colab environment provided by Google (introduction: https://colab.research.google.com/notebooks/welcome.ipynb).
They have made Jupyter Notebook as easy to use as online documents like Notion and Google Docs - you can run it just by opening a link in your browser.
So now, when you open a Jupyter Notebook from GitHub using Binder, you don't need to install any software. You can just open the code in your browser and run it in the cloud.
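Binder links follow a regular pattern, so you can assemble one yourself. The sketch below mirrors the shape of the matplotlib example link recommended later in this lesson (repository owner, repository name, branch or tag, and a filepath parameter). The function name binder_url is my own, and Binder may accept other link forms that this sketch does not cover.

```python
from urllib.parse import quote, urlencode

def binder_url(owner, repo, ref, notebook_path):
    """Build a mybinder.org link for a notebook in a GitHub repository.

    Hypothetical helper: it copies the URL shape of the example link
    in this lesson and is not an official Binder API.
    """
    base = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(owner, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    return base + "/?" + urlencode({"filepath": notebook_path})

url = binder_url(
    owner="binder-examples",
    repo="matplotlib-versions",
    ref="mpl-v2.0",
    notebook_path="matplotlib_versions_demo.ipynb",
)
print(url)
```

Opening the printed link in a browser should start the repository in the cloud and show the chosen notebook, with nothing installed locally.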
First Experience with Jupyter Notebook
The best way to learn a technology is to use it.
However, within today's scope, I cannot teach you all the tricks of Jupyter Notebook. I want to give you a direct feel for the working experience of using Jupyter Notebook first.
For example, with this GitHub file, in Binder, you only need to enter the name or URL of the corresponding GitHub Repository, and you can open the entire Repository in the cloud and select the notebook you need. You will then see the interface shown in the following image.
Each runnable unit in Jupyter consists of an In Cell, which holds the code you enter, and an Out Cell, which shows the result of running it.
As shown in the image, you can use the Run button to run a single Cell.
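If the In and Out pairing is unclear, the following rough emulation may help. It runs each cell in a shared namespace and, when the cell ends with an expression, reports that expression's value as the Out entry. The run_cell function and the sample cells are assumptions made for this sketch, not Jupyter's actual code.

```python
import ast

def run_cell(source, namespace):
    """Execute one cell; return the value of its final expression, or None."""
    tree = ast.parse(source)
    last_expr = None
    # If the cell ends with a bare expression, split it off so that
    # its value can be returned and displayed as the Out entry.
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last_expr = ast.Expression(body=tree.body.pop().value)
    exec(compile(tree, "<cell>", "exec"), namespace)
    if last_expr is None:
        return None
    return eval(compile(last_expr, "<cell>", "eval"), namespace)

kernel = {}  # variables persist from one cell to the next
cells = ["prices = [3, 5, 8]", "sum(prices)", "max(prices) - min(prices)"]

transcript = []
for n, source in enumerate(cells, start=1):
    out = run_cell(source, kernel)
    transcript.append(f"In [{n}]: {source}")
    if out is not None:  # an assignment produces no Out entry
        transcript.append(f"Out[{n}]: {out!r}")

print("\n".join(transcript))
```

The first cell only assigns a variable, so it has an In entry but no Out entry. The other two end with an expression, so each one gets both.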
Of course, you can also modify the notebook you opened, or create a new notebook and write your own program. Go ahead and try it out by opening the link!
Additionally, I recommend the following Jupyter Notebooks as your first stop for practice:
The first one is from the official Jupyter project: https://mybinder.org/v2/gh/binder-examples/matplotlib-versions/mpl-v2.0/?filepath=matplotlib_versions_demo.ipynb
The second one is the Google Research Colab environment, particularly suitable for machine learning practice applications: https://colab.research.google.com/notebooks/basic_features_overview.ipynb
If you want to install Jupyter Notebook on your local or remote machine, you can refer to the following two documents:
Installation: https://jupyter.org/install.html
Running: https://jupyter.readthedocs.io/en/latest/running.html#running
Conclusion
In this lesson, I introduced Jupyter Notebook to you and explained why it is becoming a must-learn technology in the Python community.
This is mainly due to its three major features: integration of all resources, interactive programming experience, and zero-cost reproducibility.
But as I said before, learning technology requires hands-on practice.
After this lesson, I hope you can try Jupyter Notebook yourself. In some of our future lessons, I will also share the code in the form of Jupyter Notebooks.