First commit

master
J. Fernando Sánchez 6 years ago
commit 3f287c88b3

2
.gitignore vendored

@ -0,0 +1,2 @@
.*
ssl-custom

@ -0,0 +1,116 @@
# Your data science lab, in a box
This repository contains two example deployments of multi-user isolated environments using JupyterHub.
They are aimed at small research or data science teams.
The first one authenticates users using GitHub OAuth.
The second one also contains a self-hosted GitLab instance, which can be used for authentication and everything else (e.g. CI/CD and a Docker registry).
It also contains an Nginx service as a reverse proxy.
Although these deployments have been tested on a single machine, they can be scaled to multiple nodes using swarm (see https://github.com/jupyterhub/dockerspawner/pull/216).
Note that this is not meant as a guide or complete tutorial.
If you want to learn more about Jupyter(hub)'s architecture and configuration options, check out:
* https://github.com/jupyterhub/jupyterhub-deploy-docker
* https://z2jh.jupyter.org
# What's Jupyter?
Most people associate the Jupyter project (formerly known as ipython server) with notebooks.
But it is way more than that: it is a FANTASTIC project and community!
It includes many actively developed open source projects that go way beyond the original idea of notebooks and kernels.
Moreover, most of these projects are cloud-oriented.
Just to name a few:
* Jupyterhub: http://jupyterhub.readthedocs.io/en/latest/
* Jupyterlab: https://jupyterlab.readthedocs.io/en/stable/
* nbgrader: https://nbgrader.readthedocs.io/en/stable/
* Binder: https://mybinder.org/
In this repository we set up jupyterhub, which extends jupyter by providing multi-user support, authentication and different isolation/deployment options.
# Requirements
* Docker
* Docker-compose
* Docker-machine (recommended)
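
Before going further, it may help to confirm that the tools listed above are on your `PATH`. The following is a minimal sketch, not part of this repository; `check_tools` is a hypothetical helper name.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Prints one line per missing tool and returns non-zero if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_tools docker docker-compose docker-machine
```

The function only checks that each command exists, not that its version is compatible with the compose files in this repository.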
# Setup
* Create a machine
* Add SSH key
* Configure a DNS wildcard for your domain (if you don't own a domain, check out http://nip.io/ or http://xip.io)
* For convenience, change the SSH port to something other than 22 (e.g. 2222):
```
vi /etc/ssh/sshd_config
systemctl restart sshd
```
* Install docker. The easiest way is to use docker-machine:
```
docker-machine create --driver generic --generic-ip-address=lab.todevnull.com --generic-ssh-key ~/.ssh/id_rsa --generic-ssh-port 2222 labinabox
```
* Set up your environment to start using the remote docker:
```
eval $(docker-machine env labinabox)
docker info
```
* The docker spawner does not fetch the single-user image automatically, so you will have to pull it manually:
```
docker pull jupyter/scipy-notebook:latest
```
* Create a folder for user homes (workspaces) and give the docker image write permissions:
```
docker-machine ssh labinabox 'mkdir /mnt/home'
docker-machine ssh labinabox 'chown -R 1000:100 /mnt/home'
```
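
If you go the nip.io route mentioned in the DNS step, the base domain simply embeds the server's IP address, and any subdomain of it resolves to that address. The sketch below prints the hostnames used by these deployments for a given base domain. `service_hosts` is a hypothetical helper, and `203.0.113.10` is a documentation-only placeholder address.

```shell
#!/bin/sh
# Print the service hostnames for a given base domain.
# The subdomain names mirror the ones used elsewhere in this README.
service_hosts() {
    base=$1
    for sub in hub lab registry chat; do
        echo "$sub.$base"
    done
}

# With nip.io, the base domain is "<server IP>.nip.io".
service_hosts "203.0.113.10.nip.io"
```

With a base domain like this, you would set `DOMAIN=203.0.113.10.nip.io` wherever the instructions ask for your domain.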
# SSL
This demo assumes you have a valid certificate (`/etc/ssl/ssl-custom/cert.pem`) and a key (`/etc/ssl/ssl-custom/key.pem`) for your domain.
## Certbot
You're encouraged to use a valid certificate authority such as Let's Encrypt.
Using certbot is pretty straightforward.
It even comes bundled in a docker image that includes a standalone server:
```
LE_VERSION=v0.14.0
DOMAIN=todevnull.com
docker run -ti --rm -p 80:80 -p 443:443 --name certbot \
-v '/data/letsencrypt/etc/letsencrypt/:/etc/letsencrypt' \
-v '/data/letsencrypt/var/lib/letsencrypt:/var/lib/letsencrypt' \
-v '/var/www/letsencrypt/:/webroot' \
certbot/certbot:$LE_VERSION certonly --standalone \
--expand --keep \
-d hub.$DOMAIN -d lab.$DOMAIN -d registry.$DOMAIN -d github.$DOMAIN -d chat.$DOMAIN
```
Now, simply copy the generated certificates to the paths the demos expect:
```
docker-machine ssh labinabox "cp -L /data/letsencrypt/etc/letsencrypt/live/hub.$DOMAIN/privkey.pem /etc/ssl/ssl-custom/key.pem"
docker-machine ssh labinabox "cp -L /data/letsencrypt/etc/letsencrypt/live/hub.$DOMAIN/fullchain.pem /etc/ssl/ssl-custom/cert.pem"
```
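
After copying, a quick sanity check is to confirm that the certificate and the key actually belong together. This is a minimal sketch, assuming `openssl` is available wherever you run it; `cert_matches_key` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the certificate's public key matches the private key.
# Returns 2 if either file cannot be read by openssl.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Example (paths are the ones the demos expect on the server):
# cert_matches_key /etc/ssl/ssl-custom/cert.pem /etc/ssl/ssl-custom/key.pem && echo "pair OK"
```

Comparing public keys rather than RSA moduli keeps the check usable for non-RSA keys as well.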
## Self-signed
For a simple test, you can also generate your own self-signed certificates using openssl:
```
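export DOMAIN=<YOUR DOMAIN NAME>
# Create the output folder; -nodes leaves the key unencrypted so services can load it without a passphrase
mkdir -p ssl-custom
openssl req -x509 -newkey rsa:4096 -nodes -keyout ssl-custom/key.pem -out ssl-custom/cert.pem -days 365 -subj "/C=ES/ST=Madrid/L=Madrid/O=Lab in a Box/OU=Org/CN=*.${DOMAIN}"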
docker-machine scp -r ssl-custom labinabox:/etc/ssl/
```
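
To see what such a certificate looks like before uploading anything, you can generate a throwaway one and inspect it. The sketch below is an illustration only: `inspect_selfsigned` is a hypothetical helper, `example.test` is a placeholder domain, and a 2048-bit key valid for one day is used just to keep the demo fast.

```shell
#!/bin/sh
# Generate a throwaway self-signed wildcard certificate in a temporary
# directory, print its subject and expiry date, then clean up.
inspect_selfsigned() {
    domain=$1
    dir=$(mktemp -d) || return 1
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$dir/key.pem" -out "$dir/cert.pem" \
        -days 1 -subj "/CN=*.$domain" 2>/dev/null || { rm -rf "$dir"; return 1; }
    openssl x509 -in "$dir/cert.pem" -noout -subject -enddate
    rm -rf "$dir"
}

inspect_selfsigned example.test
```

The same `openssl x509 -noout -subject -enddate` call can be pointed at `ssl-custom/cert.pem` to check the real certificate.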
# Notes
* Instead of creating a custom image, nginx should rely on the vanilla nginx docker image with configuration as a bind mount, but that requires syncing configuration files with the server.
* **Do not even consider deploying an environment like the one in this demo without a backup strategy**: http://www.taobackup.com/
* Folder permissions should be more restrictive. You can chown the files to the default uid and gid of the jupyter image.

@ -0,0 +1,19 @@
This deployment contains a single jupyterhub service, with the docker spawner and GitHub OAuth authentication.
Every user will get an isolated docker container after authenticating with GitHub.
The image for that container is configurable through the `DOCKER_IMAGE` environment variable.
# Instructions
Before running docker compose, you need to create a GitHub application: https://developer.github.com/apps/building-github-apps/creating-a-github-app/
Add the client ID and client secret to your `.env` file, or to your environment.
# Example .env file
```
DOMAIN=todevnull.com
GITHUB_CLIENT_ID=<CLIENT ID>
GITHUB_CLIENT_SECRET=<CLIENT_SECRET>
```
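
A missing or empty value in `.env` tends to surface only later as a confusing OAuth error, so it can be worth checking the file first. The following is a minimal sketch; `missing_env_keys` is a hypothetical helper and is not part of this repository.

```shell
#!/bin/sh
# Print the names of required variables that are not defined in an env file.
# Usage: missing_env_keys FILE KEY...
missing_env_keys() {
    file=$1
    shift
    for key in "$@"; do
        # A key counts as defined if a line starts with KEY= followed by a value.
        grep -q "^${key}=." "$file" || echo "$key"
    done
}

# Example: missing_env_keys .env DOMAIN GITHUB_CLIENT_ID GITHUB_CLIENT_SECRET
```

An empty output means every listed key has a non-empty value in the file.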

@ -0,0 +1,34 @@
version: '3.6'
services:
  jupyter:
    networks:
      - labinabox
    ports:
      - '80:8000'
      - '443:8000'
    image: gsiupm/jupyterhub-oauth:0.8.1
    command: jupyterhub -f /srv/jupyterhub/jupyterhub_config.py
    hostname: jupyterhub
    volumes:
      - "/mnt/home:/mnt/home"
      - "/var/run/docker.sock:/var/run/docker.sock"
      - '/etc/ssl/ssl-custom/cert.pem:/srv/oauthenticator/ssl/ssl.cert'
      - '/etc/ssl/ssl-custom/key.pem:/srv/oauthenticator/ssl/ssl.key'
    environment:
      OAUTH_CALLBACK_URL: "https://github.${DOMAIN-todevnull.com}/hub/oauth_callback"
      HOST_HOMEDIR: "/mnt/home/{username}"
      OAUTH_CLASS: "oauthenticator.github.GitHubOAuthenticator"
      GITHUB_CLIENT_ID: "${GITHUB_CLIENT_ID}"
      GITHUB_CLIENT_SECRET: "${GITHUB_CLIENT_SECRET}"
      JPY_COOKIE_SECRET: "${JPY_COOKIE_SECRET}"
      JPY_API_TOKEN: "${CONFIGPROXY_AUTH_TOKEN}"
      OAUTH_TLS_VERIFY: 0
      COMMON_DIR: "/mnt/home/common"
      DATASETS_DIR: "/mnt/home/datasets"
      ADMINS: "${HUB_ADMINS-balkian,root}"
      DOCKER_MEM_LIMIT: '250M'
      DOCKER_NETWORK: labinabox
networks:
  labinabox:
    name: labinabox

@ -0,0 +1,51 @@
This deployment contains:
* Gitlab for code and authentication. The omnibus image is an all-in-one package that contains:
  * Git (`git.domain.com`) and CI/CD
  * Docker registry (`registry.domain.com`)
  * Mattermost (Slack clone, `chat.domain.com`)
  * Postgres db
* Jupyterhub for multi-user computing (`hub.domain.com`)
  * Authentication with Gitlab
  * Every user has an isolated environment (thanks to Docker-spawner)
  * Based on https://github.com/balkian/jupyterhub-oauth
* Nginx as a reverse proxy
# Instructions
Set the `DOMAIN` variable in a `.env` file and run this compose.
After GitLab is loaded, create an OAuth application in Gitlab: https://docs.gitlab.com/ce/integration/oauth_provider.html
The redirect URL is: `https://hub.<your domain>/hub/oauth_callback`.
Select the `api` scope and mark the application as `trusted`.
Write the client ID and client secret to a `.env` file in this folder.
Then, update the Jupyter service.
# Example .env file
```
DOMAIN=<YOUR DOMAIN>
GITLAB_CLIENT_ID=<YOUR ID>
GITLAB_CLIENT_SECRET=<YOUR SECRET>
```
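
The compose file also reads `JPY_COOKIE_SECRET` and `CONFIGPROXY_AUTH_TOKEN` from the environment. This README does not say how the image behaves when they are empty, so the sketch below simply assumes you want to set them yourself to 32 random bytes in hex, the format JupyterHub documents for its cookie secret. `random_hex` is a hypothetical helper name.

```shell
#!/bin/sh
# Print N random bytes as a lowercase hex string (2*N characters).
random_hex() {
    head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Emit two lines that could be appended to the .env file.
echo "JPY_COOKIE_SECRET=$(random_hex 32)"
echo "CONFIGPROXY_AUTH_TOKEN=$(random_hex 32)"
```

Keeping these values stable across restarts means existing login cookies stay valid. `openssl rand -hex 32` would be an equivalent way to produce each value.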
# Advanced configuration
GitLab's documentation is terrific.
For a list of configuration options, see https://docs.gitlab.com/omnibus/docker/#install-gitlab-using-docker-compose .
# Note
When run as part of the omnibus, mattermost should register an application automatically on Gitlab.
I've had some issues with authentication, so I've explicitly added the OAuth parameters in the compose file.
This way, you can manually register mattermost on your instance.
The process is similar to jupyter, and the callback URLs are:
```
https://chat.$DOMAIN/signup/gitlab/complete
https://chat.$DOMAIN/login/gitlab/complete
```
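
When registering the applications in GitLab, it can be handy to print every callback URL for your domain in one go. This is a small convenience sketch; `oauth_callbacks` is a hypothetical helper, and the URL patterns are the ones given in this README.

```shell
#!/bin/sh
# Print the OAuth callback URLs to register in GitLab for a given domain:
# first the JupyterHub one, then the two Mattermost ones.
oauth_callbacks() {
    domain=$1
    echo "https://hub.$domain/hub/oauth_callback"
    echo "https://chat.$domain/signup/gitlab/complete"
    echo "https://chat.$domain/login/gitlab/complete"
}

# Example: oauth_callbacks "$DOMAIN"
```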

@ -0,0 +1,118 @@
version: '3.6'
services:
  web:
    build: nginx
    networks:
      - labinabox
    image: 'custom-nginx'
    depends_on:
      - jupyter
      - gitlab
    ports:
      - '80:80'
      - '443:443'
    environment:
      REGISTRY_INTERNAL: http://gitlab:4567
      LAB_INTERNAL: http://gitlab:80
      HUB_INTERNAL: http://jupyter:8000
      DOMAIN: ${DOMAIN-todevnull.com}
    volumes:
      - '/etc/ssl/ssl-custom/cert.pem:/ssl/certs/nginx.crt'
      - '/etc/ssl/ssl-custom/key.pem:/ssl/private/nginx.key'
      - '/data/html:/usr/share/nginx/html'
  gitlab:
    networks:
      - labinabox
    image: 'gitlab/gitlab-ce:10.7.3-ce.0'
    restart: always
    hostname: 'lab.${DOMAIN-todevnull.com}'
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'https://lab.${DOMAIN-todevnull.com}'
        mattermost_external_url 'https://chat.${DOMAIN-todevnull.com}/'
        registry_external_url 'https://registry.${DOMAIN-todevnull.com}'
        nginx['listen_port'] = 80
        nginx['listen_https'] = false
        # LFS
        gitlab_rails['lfs_enabled'] = true
        gitlab_rails['lfs_storage_path'] = "/mnt/lfs"
        # Registry
        gitlab_rails['registry_enabled'] = true
        gitlab_rails['registry_port'] = "443"
        registry_nginx['redirect_http_to_https'] = false
        registry_nginx['listen_port'] = 4567
        registry_nginx['nginx_enable'] = true
        registry_nginx['listen_https'] = false
        registry_nginx['https'] = false
        gitlab_rails['registry_api_url'] = "http://127.0.0.1:5000"
        registry['registry_http_addr'] = "127.0.0.1:5000"
        registry_nginx['proxy_set_headers'] = {
          "Host" => "registry.${DOMAIN}",
          "X-Forwarded-Proto" => "https",
          "X-Forwarded-Ssl" => "on"
        }
        mattermost['service_use_ssl'] = false
        mattermost_nginx['listen_port'] = 80
        mattermost_nginx['listen_https'] = false
        mattermost['gitlab_enable'] = true
        mattermost['gitlab_id'] = "${MATTERMOST_ID}"
        mattermost['gitlab_secret'] = "${MATTERMOST_SECRET}"
        mattermost['gitlab_scope'] = ""
        mattermost['gitlab_auth_endpoint'] = "https://lab.${DOMAIN-todevnull.com}/oauth/authorize"
        mattermost['gitlab_token_endpoint'] = "https://lab.${DOMAIN-todevnull.com}/oauth/token"
        mattermost['gitlab_user_api_endpoint'] = "https://lab.${DOMAIN-todevnull.com}/api/v4/user"
        # Chat
        mattermost_nginx['redirect_http_to_https'] = false
    ports:
      - '8080:80'
      - '8443:443'
      - '22:22'
      - '4567:4567'
    volumes:
      - 'gitlab-config:/etc/gitlab'
      - 'gitlab-logs:/var/log/gitlab'
      - 'gitlab-data:/var/opt/gitlab'
      - 'gitlab-lfs:/mnt/lfs'
  jupyter:
    networks:
      - labinabox
    image: gsiupm/jupyterhub-oauth:0.8.1
    command: jupyterhub --no-ssl -f /srv/jupyterhub/jupyterhub_config.py
    hostname: jupyterhub
    volumes:
      - "/mnt/home:/mnt/home"
      - "/var/run/docker.sock:/var/run/docker.sock"
    environment:
      OAUTH_CALLBACK_URL: "https://hub.${DOMAIN-todevnull.com}/hub/oauth_callback"
      HOST_HOMEDIR: "/mnt/home/{username}"
      OAUTH_CLASS: "oauthenticator.gitlab.GitLabOAuthenticator"
      GITLAB_HOST: "https://lab.${DOMAIN-todevnull.com}/"
      GITLAB_CLIENT_ID: "${GITLAB_CLIENT_ID}"
      GITLAB_CLIENT_SECRET: "${GITLAB_CLIENT_SECRET}"
      JPY_COOKIE_SECRET: "${JPY_COOKIE_SECRET}"
      JPY_API_TOKEN: "${CONFIGPROXY_AUTH_TOKEN}"
      OAUTH_TLS_VERIFY: 0
      COMMON_DIR: "/mnt/home/common"
      DATASETS_DIR: "/mnt/home/datasets"
      ADMINS: "${HUB_ADMINS-balkian,root}"
      DOCKER_NETWORK: labinabox
      DOCKER_MEM_LIMIT: '150M'
volumes:
  gitlab-config:
    name: gitlab-config
  gitlab-logs:
    name: gitlab-logs
  gitlab-data:
    name: gitlab-data
  gitlab-lfs:
    name: gitlab-lfs
networks:
  labinabox:
    name: labinabox

@ -0,0 +1,9 @@
FROM nginx:1.13
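# Defaults mirror the values set in the compose file; proxy_pass needs a full URL including the scheme
ENV DOMAIN="todevnull.com" HUB_INTERNAL="http://jupyter:8000" LAB_INTERNAL="http://gitlab:80" REGISTRY_INTERNAL="http://gitlab:4567"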
COPY init.sh /
COPY snippets/ /etc/nginx/snippets/
COPY conf.d /etc/nginx/conf.d/
CMD /init.sh

@ -0,0 +1,54 @@
server {
    listen 80;
    access_log /var/log/nginx/hub.access.log;
    # add Strict-Transport-Security to prevent man in the middle attacks
    server_name hub.${DOMAIN};
    root /var/www/html/ ;
    include /etc/nginx/snippets/letsencrypt.conf;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name hub.${DOMAIN} ;
    access_log /var/log/nginx/hub.access.log;
    error_log /var/log/nginx/hub.error.log;
    # add Strict-Transport-Security to prevent man in the middle attacks
    #add_header Strict-Transport-Security "max-age=31536000";
    client_max_body_size 100M;
    ssl_certificate /ssl/certs/nginx.crt;
    ssl_certificate_key /ssl/private/nginx.key;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass ${HUB_INTERNAL};
    }
    location ~* /user/([a-zA-Z0-9]+)/(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/? {
        proxy_set_header Host $host;
        # websocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade "websocket";
        proxy_set_header Connection "Upgrade";
        proxy_pass ${HUB_INTERNAL};
    }
}

@ -0,0 +1,79 @@
server {
    listen 80;
    server_name lab.${DOMAIN} chat.${DOMAIN};
    root /var/www/html/ ;
    include /etc/nginx/snippets/letsencrypt.conf;
    location / {
        return 301 https://$host$request_uri;
    }
}
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name lab.${DOMAIN} chat.${DOMAIN} ;
    access_log /var/log/nginx/gitlab.access.log;
    error_log /var/log/nginx/gitlab.error.log;
    # add Strict-Transport-Security to prevent man in the middle attacks
    #add_header Strict-Transport-Security "max-age=31536000";
    ssl_certificate /ssl/certs/nginx.crt;
    ssl_certificate_key /ssl/private/nginx.key;
    root /var/www/html/ ;
    location / {
        client_max_body_size 0;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass ${LAB_INTERNAL} ;
    }
}
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name registry.${DOMAIN} ;
    access_log /var/log/nginx/gitlab-registry.access.log;
    error_log /var/log/nginx/gitlab-registry.error.log;
    # add Strict-Transport-Security to prevent man in the middle attacks
    #add_header Strict-Transport-Security "max-age=31536000";
    # disable any limits to avoid HTTP 413 for large image uploads
    client_max_body_size 0;
    # required to avoid HTTP 411: see Issue #1486 (https://github.com/docker/docker/issues/1486)
    chunked_transfer_encoding on;
    ssl_protocols TLSv1.1 TLSv1.2;
    ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    include /etc/nginx/snippets/letsencrypt.conf;
    root /var/www/html/ ;
    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass ${REGISTRY_INTERNAL};
        proxy_read_timeout 900;
    }
}

@ -0,0 +1,9 @@
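#!/bin/sh
# Render every *.template in conf.d with envsubst, then run nginx in the foreground.
cd /etc/nginx/conf.d/ || exit 1
# Only these variables are substituted, so nginx's own $host, $scheme, etc. are left untouched.
VARIABLES='${HUB_INTERNAL}${DOMAIN}${LAB_INTERNAL}${REGISTRY_INTERNAL}'
for i in *.template; do
    target=$(basename "$i" ".template")
    # Abort on the first template that fails to render
    envsubst "$VARIABLES" < "$i" > "$target" || exit 1
done
exec nginx -g 'daemon off;'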

@ -0,0 +1,44 @@
#############################################################################
# Configuration file for Let's Encrypt ACME Challenge location
# This file is already included in listen_xxx.conf files.
# Do NOT include it separately!
#############################################################################
#
# This config enables to access /.well-known/acme-challenge/xxxxxxxxxxx
# on all our sites (HTTP), including all subdomains.
# This is required by ACME Challenge (webroot authentication).
# You can check that this location is working by placing ping.txt here:
# /var/www/letsencrypt/.well-known/acme-challenge/ping.txt
# And pointing your browser to:
# http://xxx.domain.tld/.well-known/acme-challenge/ping.txt
#
# Sources:
# https://community.letsencrypt.org/t/howto-easy-cert-generation-and-renewal-with-nginx/3491
#
#############################################################################
# Rule for legitimate ACME Challenge requests (like /.well-known/acme-challenge/xxxxxxxxx)
# We use ^~ here, so that we don't check other regexes (for speed-up). We actually MUST cancel
# other regex checks, because our other config files have a regex rule that denies access to files with dotted names.
location ^~ /.well-known/acme-challenge/ {
    # Set correct content type. According to this:
    # https://community.letsencrypt.org/t/using-the-webroot-domain-verification-method/1445/29
    # Current specification requires "text/plain" or no content header at all.
    # It seems that "text/plain" is a safe option.
    default_type "text/plain";
    # This directory must be the same as in /etc/letsencrypt/cli.ini
    # as "webroot-path" parameter. Also don't forget to set "authenticator" parameter
    # there to "webroot".
    # Do NOT use alias, use root! Target directory is located here:
    # /var/www/letsencrypt/.well-known/acme-challenge/
    root /var/www/letsencrypt;
}
# Hide /acme-challenge subdirectory and return 404 on all requests.
# It is somewhat more secure than letting Nginx return 403.
# Ending slash is important!
location = /.well-known/acme-challenge/ {
    return 404;
}