Okay, let’s have a bit of a look at the Jupyter Notebook.
Why Jupyter Notebooks?
As mentioned previously, one of the big requirements of any data science project is that it be reproducible. That is, anyone else should be able to take your data and produce the same result you got. For that to happen, your logic, your process, your calculations and assumptions all need to be thoroughly documented.
In the old days that likely meant keeping a detailed notebook and saving all your code. The notebook might be pen and paper or a digital version produced in some editor/word processor. The code, well in the really old days, it may have been punch cards. In more recent times it would have been files (raw code or executables) produced by the project.
The idea of the digital scientific notebook was to put that all in one place (digitally speaking). Your notes (procedure, logic, assumptions, why, where, etc) and your code are all recorded in the notebook. The code, if the notebook is opened in a suitable context/environment, can be executed. More importantly anyone who can get the notebook and your data should be able to reproduce your results. They can even change some of the code or use different data to see what happens.
That’s pretty much why scientific, e.g. Jupyter, notebooks exist.
Basics
A notebook consists of a sequence of cells. These cells can contain code or text but not both (I am ignoring comments in a code cell). When adding a new cell it will default to a code cell.
A code cell allows us to write and execute code. Depending on your set-up and configuration, the code cells could be written in various languages. That is determined by the active kernel. We will be sticking to Python, which is the default kernel (IPython). You will get syntax highlighting and tab completion in your code cells. When the code in a cell is executed (Shift+Enter, via icon or menu), the code is sent to the kernel for processing and any output will be shown in a new output cell below the code cell. If there is no output or the output is None, e.g. a function definition, there will be no output cell generated.
A text cell allows us to document our process in a more human readable way. Text cells use Markdown for their underlying format/language. Markdown provides a convenient way to format the text: headings, lists, tables, etc., and even formulae using LaTeX-style markup. When we execute a markdown cell, the Markdown code is converted to the final rich text version. No new cell is generated.
And, finally, there is a raw cell. Raw cells are never evaluated by the notebook. Essentially, they are just passed through unchanged to any output generated from the notebook. I have never used a raw cell for the simple work I have done so far.
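To make the three cell types a little more concrete, here is a rough sketch of how I understand a notebook file to be laid out on disk: an .ipynb file is JSON, with a list of cells that each carry a cell_type and a source. The cell contents below are made up for illustration, and I am building the structure by hand with the standard json module rather than with any Jupyter tooling.

```python
import json

# A minimal notebook in the nbformat 4 layout, one cell of each type.
# The cell contents are invented purely for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Notes\nEuler's identity: $e^{i\\pi} + 1 = 0$",
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # the cell has not been run yet
            "outputs": [],            # filled in when the cell is executed
            "source": "total = 1 + 2\ntotal",
        },
        {
            "cell_type": "raw",
            "metadata": {},
            "source": "passed through unchanged",
        },
    ],
}

# Round-trip through JSON text, which is what would be saved to disk.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Note that only the code cell has an outputs list. That is where the displayed output ends up once the cell has been run.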
The workflow is pretty much like any coding exercise, except we will be writing/executing/rewriting our code in a cell until we get the desired result. And, we will be surrounding those code cells with text cells describing, in detail, what we are doing, why we are doing it and how we are doing it.
One thing to note. We will in the development process be executing cells sequentially. And, a cell further down the sequence can use/depend on the output of a previous cell. But that means the previous cell will need to be executed before the current cell can be executed. When we save our work and close the server and notebook, none of the underlying cell outputs in the kernel space are saved. The displayed outputs are, with a caveat or two. So, they will need to be re-executed the next time we start our notebook.
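Here is a small sketch of that dependency between cells. It is only a simulation: I am using a plain dictionary to stand in for the kernel's memory and exec to stand in for running a cell, which is an assumption made for illustration and not how Jupyter actually talks to its kernel. The names cell_1, cell_2 and run_cell are mine.

```python
# Each string stands in for the source of one code cell.
cell_1 = "values = [1, 2, 3, 4]"
cell_2 = "total = sum(values)"


def run_cell(source, namespace):
    """Execute one cell's source in the shared 'kernel' namespace."""
    exec(source, namespace)


# Running the cells in order works: cell_2 sees what cell_1 defined.
kernel = {}
run_cell(cell_1, kernel)
run_cell(cell_2, kernel)
first_total = kernel["total"]
print(first_total)  # 10

# A "restart" gives us an empty namespace, so running cell_2 on its
# own fails because values no longer exists.
kernel = {}
try:
    run_cell(cell_2, kernel)
    error_name = None
except NameError as err:
    error_name = type(err).__name__
    print("cell_2 failed after restart:", err)
```

So after reopening a notebook, the earlier cells have to be run again before the later ones will work, even if the old output is still displayed under them.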
Fundamental Keyboard Shortcuts
A few keyboard shortcuts you really can’t live without (there will be more later).
- Shift+Enter: execute the cell. On completion will jump to the cell below. Or, if currently in the last cell in the notebook, it will add a new cell (code cell by default) and move to it.
- Esc: command mode. This allows us to navigate around the notebook using keyboard shortcuts rather than entering the key strokes in a cell. In VS Code a selected cell in command mode will have a solid bar (blue in my case) to the left.
- Enter: edit mode. This allows us to edit the markdown in an already executed text cell. In VS Code a cell in edit mode will have a diagonally striped bar (green in my case) to the left.
Note, unselected cells will not have a bar showing to the left.
Jupyter Server vs VS Code
There are some differences between the experience of working on a notebook in a browser accessing a Jupyter server and VS Code. Maybe for you major differences. Let’s start by looking at the interface in a browser.
At the command line I activated my ds-3.9 conda environment and ran jupyter notebook.
(base) PS R:\learn\ds_intro> conda activate ds-3.9
(ds-3.9) PS R:\learn\ds_intro> jupyter notebook
[I 14:28:28.389 NotebookApp] Serving notebooks from local directory: R:\learn\ds_intro
[I 14:28:28.390 NotebookApp] Jupyter Notebook 6.3.0 is running at:
[I 14:28:28.390 NotebookApp] http://localhost:8888/?token=dea9c3491b7d2e0124206c1bb826cf4cfb5a83a1af76d10d
[I 14:28:28.390 NotebookApp] or http://127.0.0.1:8888/?token=dea9c3491b7d2e0124206c1bb826cf4cfb5a83a1af76d10d
[I 14:28:28.390 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:28:28.829 NotebookApp]
To access the notebook, open this file in a browser:
file:///C:/Users/bark/AppData/Roaming/jupyter/runtime/nbserver-26808-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=dea9c3491b7d2e0124206c1bb826cf4cfb5a83a1af76d10d
or http://127.0.0.1:8888/?token=dea9c3491b7d2e0124206c1bb826cf4cfb5a83a1af76d10d
[I 14:29:04.606 NotebookApp] 302 GET /tree (127.0.0.1) 2.000000ms
[I 14:29:37.078 NotebookApp] 302 GET /?token=dea9c3491b7d2e0124206c1bb826cf4cfb5a83a1af76d10d (127.0.0.1) 2.000000ms
[W 14:29:45.258 NotebookApp] Notebook Section3_Lecture16.ipynb is not trusted
[I 14:29:46.293 NotebookApp] Kernel started: 761d809d-f82a-44aa-a923-1035ddc19546, name: python3
[I 14:30:16.164 NotebookApp] Creating new notebook in
[I 14:30:19.416 NotebookApp] Kernel started: 523b011f-bc5e-4a42-93e9-fcb1d27eba2e, name: python3
[I 14:31:45.864 NotebookApp] Saving file at /Section3_Lecture16.ipynb
[I 14:32:19.408 NotebookApp] Saving file at /Untitled.ipynb
[I 14:34:19.396 NotebookApp] Saving file at /Untitled.ipynb
[I 14:35:00.646 NotebookApp] Starting buffering for 761d809d-f82a-44aa-a923-1035ddc19546:0c026e1b124749fd8d84b985712d2810
[I 14:42:19.598 NotebookApp] Saving file at /Untitled.ipynb
And in my browser, after creating a new, empty notebook, I get/see the following.
And after cranking up my VSCode workspace and creating a new, empty notebook (via Command Palette), I get/see the following.
So, in VS Code we get a more limited functionality. Only two cell types (at least obviously, maybe something hidden). No Menu bar. And fewer apparent choices on the Toolbar. But, for now I am going to see how things work out using VS Code to work on my notebooks. I can always switch.
Working With Notebook in VS Code
I really don’t feel like creating a video, so you will just have to try the following yourself to see if what you get is the same as I get or, at least, what I describe below.
If I hover my mouse over the only cell, or click in it, a garbage can icon will appear on the right side of the cell. As this is the only cell in the notebook, clicking the garbage can will delete any contents but not the cell itself. If there were other cells, doing so would delete the cell and all its content.
I can convert the cell to a markdown cell in one of two ways:
- Click the M downarrow icon (M↓) in the cell’s header
- Click in the cell, press Esc to get to command mode and press M (or m)
Once it is converted, I can convert it back to a code cell:
- Click the { } icon (curly braces for code?) in the cell’s header
- If the cell is empty, double-click in the cell, press Esc to get to command mode and press Y (or y, and no idea where “y” came from).
Esc then A will insert a new, blank code cell above the current cell. Esc then B will add a new cell below the current cell.
Do note that to activate a code cell takes a single click, but a markdown cell seems to need a double-click? And, that the cell header is only present for markdown cells when the cell is active. Also, if you have an empty markdown cell in your notebook, it sort of disappears if you leave the cell. At least until you hover over it.
Executing a cell is also a little different for the two types of cell.
Leaving a markdown cell will automatically convert it to the rich text version. Or you can press Shift+Enter. The markdown cell will be converted and a new code cell inserted below.
The Shift+Enter key combination works in code cells as well. If there is any output it will appear below the code cell just executed, and a new code cell will be inserted below that, ahead of any existing cells further down the notebook. If the code was changed and output for it already exists, the output cell will be updated accordingly. You can also just click the green arrow in the code cell’s header.
There are also icons on the Toolbar that you can use to run cells. They just tend to execute more cells than the above methods.
Getting Help
Within the notebook, you will get autocompletion, and if you mouse over a variable/function you will get some information regarding what it is or what parameters it takes. For a function, the first time you type the brackets you will also see some help info. But not when the cursor is moved back in at a later time.
But you can also type a ? after a function name, no brackets, and execute the cell to get a more detailed help display. This will even work for your own functions if you have included a docstring when defining the function. Or any of your modules for that matter.
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.
PEP 257 – Docstring Conventions
So, here’s an idea of how things might look if I try that with len?.
alist = [1, 2, 3, 4]
len?
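The ? syntax only works in IPython/Jupyter, but the docstring it draws on is plain Python. As a small sketch of that, here is a made-up function (mean is my own example, not something from a library) with a docstring, read back through the __doc__ attribute and through inspect.getdoc from the standard library. In a notebook cell, mean? should show the same text.

```python
import inspect


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


alist = [1, 2, 3, 4]
print(mean(alist))           # 2.5

# The docstring is stored on the function object itself.
print(mean.__doc__)

# inspect.getdoc returns the same text with indentation cleaned up,
# and it also works for built-ins such as len.
print(inspect.getdoc(mean))
print(inspect.getdoc(len))
```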