commit 3f287c88b3da7a2865b6f3a28997b137f87b2581
Author: J. Fernando Sánchez
Date:   Sun May 20 20:14:00 2018 +0200

    First commit

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..0cfad12
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+.*
+ssl-custom
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..9c08b97
--- /dev/null
+++ b/README.md
@@ -0,0 +1,116 @@
+# Your data science lab, in a box
+
+This repository contains two example deployments of multi-user isolated environments using Jupyterhub.
+It is aimed at small research or data science teams.
+
+The first one authenticates users using GitHub OAuth.
+
+The second one also contains a self-hosted GitLab instance, which can be used for authentication and everything else (e.g. CI/CD and docker registry).
+It also contains an Nginx service as a reverse proxy.
+
+Although these deployments have been tested on a single machine, they can be scaled to multiple nodes using swarm (see https://github.com/jupyterhub/dockerspawner/pull/216).
+
+Note that this is not meant as a guide or complete tutorial.
+If you want to learn more about Jupyter(hub)'s architecture and configuration options, check out:
+
+* https://github.com/jupyterhub/jupyterhub-deploy-docker
+* https://z2jh.jupyter.org
+
+# What's Jupyter?
+
+
+Most people associate the Jupyter project (formerly known as IPython Notebook) with notebooks.
+But it is way more than that: it is a FANTASTIC project and community!
+It includes many actively developed open source projects that go way beyond the original idea of notebooks and kernels.
+Moreover, most of these projects are cloud-oriented.
+Just to name a few:
+
+* Jupyterhub: http://jupyterhub.readthedocs.io/en/latest/
+* Jupyterlab: https://jupyterlab.readthedocs.io/en/stable/
+* nbgrader: https://nbgrader.readthedocs.io/en/stable/
+* Binder: https://mybinder.org/
+
+In this repository we set up jupyterhub, which extends jupyter by providing multi-user support, authentication and different isolation/deployment options.
+
+# Requirements
+
+* Docker
+* Docker-compose
+* Docker-machine (recommended)
+
+# Setup
+
+* Create a machine
+* Add SSH key
+* Configure a DNS wildcard for your domain (if you don't own a domain, check out http://nip.io/ or http://xip.io)
+* For convenience, change the SSH port to something other than 22 (e.g. 2222):
+```
+vi /etc/ssh/sshd_config
+systemctl restart sshd
+```
+* Install docker. The easiest way is to use docker-machine:
+
+```
+docker-machine create --driver generic --generic-ip-address=lab.todevnull.com --generic-ssh-key ~/.ssh/id_rsa --generic-ssh-port 2222 labinabox
+```
+* Set up your environment to start using the remote docker:
+```
+eval $(docker-machine env labinabox)
+docker info
+```
+* The docker spawner does not fetch the single-user image automatically, so you will have to pull it manually:
+```
+docker pull jupyter/scipy-notebook:latest
+```
+* Create a folder for user homes (workspaces) and give the docker image write permissions:
+```
+docker-machine ssh labinabox 'mkdir /mnt/home'
+docker-machine ssh labinabox 'chown -R 1000:100 /mnt/home'
+```
+
+# SSL
+
+This demo assumes you have a valid certificate (`/etc/ssl/ssl-custom/cert.pem`) and a key (`/etc/ssl/ssl-custom/key.pem`) for your domain.
+
+
+## Certbot
+You're encouraged to use a valid certificate authority such as Let's Encrypt.
+Using certbot is pretty straightforward.
+It even comes bundled in a docker image, with a standalone server:
+
+```
+LE_VERSION=v0.14.0
+DOMAIN=todevnull.com
+docker run -ti --rm -p 80:80 -p 443:443 --name certbot \
+    -v '/data/letsencrypt/etc/letsencrypt/:/etc/letsencrypt' \
+    -v '/data/letsencrypt/var/lib/letsencrypt:/var/lib/letsencrypt' \
+    -v '/var/www/letsencrypt/:/webroot' \
+    certbot/certbot:$LE_VERSION certonly --standalone \
+    --expand --keep \
+    -d hub.$DOMAIN -d lab.$DOMAIN -d registry.$DOMAIN -d github.$DOMAIN -d chat.$DOMAIN
+
+```
+
+Now, simply copy the generated certificates to the paths the demos expect:
+
+```
+docker-machine ssh labinabox "cp -L /data/letsencrypt/etc/letsencrypt/live/hub.$DOMAIN/privkey.pem /etc/ssl/ssl-custom/key.pem"
+docker-machine ssh labinabox "cp -L /data/letsencrypt/etc/letsencrypt/live/hub.$DOMAIN/fullchain.pem /etc/ssl/ssl-custom/cert.pem"
+```
+
+## Self-signed
+
+For a simple test, you can also generate your own self-signed certificates using openssl:
+
+```
+export DOMAIN=
+openssl req -x509 -newkey rsa:4096 -keyout ssl-custom/key.pem -out ssl-custom/cert.pem -days 365 -subj "/C=ES/ST=Madrid/L=Madrid/O=Lab in a Box/OU=Org/CN=*.${DOMAIN}"
+
+docker-machine scp -r ssl-custom labinabox:/etc/ssl/
+```
+
+# Notes
+
+* Instead of creating a custom image, nginx should rely on the vanilla nginx docker image with configuration as a bind mount, but that requires syncing configuration files with the server.
+* **Do not even consider deploying an environment like the one in this demo without a backup strategy**: http://www.taobackup.com/
+* Folder permissions should be more restrictive. You can chown the files to the default uid and gid of the jupyter image.
diff --git a/github/README.md b/github/README.md
new file mode 100644
index 0000000..b51eb8d
--- /dev/null
+++ b/github/README.md
@@ -0,0 +1,19 @@
+This deployment contains a single jupyterhub service, with the docker spawner and GitHub OAuth authentication.
+
+Every user will get an isolated docker container after authenticating with GitHub.
+The image for that container is configurable through the `DOCKER_IMAGE` environment variable.
+
+
+# Instructions
+
+Before running docker compose, you need to create a GitHub application: https://developer.github.com/apps/building-github-apps/creating-a-github-app/
+Add the client ID and client secret to your `.env` file, or to your environment.
+
+
+# Example .env file
+
+```
+DOMAIN=todevnull.com
+GITHUB_CLIENT_ID=
+GITHUB_CLIENT_SECRET=
+```
diff --git a/github/docker-compose.yml b/github/docker-compose.yml
new file mode 100644
index 0000000..4e3617c
--- /dev/null
+++ b/github/docker-compose.yml
@@ -0,0 +1,34 @@
+version: '3.6'
+services:
+  jupyter:
+    networks:
+      - labinabox
+    ports:
+      - '80:8000'
+      - '443:8000'
+    image: gsiupm/jupyterhub-oauth:0.8.1
+    command: jupyterhub -f /srv/jupyterhub/jupyterhub_config.py
+    hostname: jupyterhub
+    volumes:
+      - "/mnt/home:/mnt/home"
+      - "/var/run/docker.sock:/var/run/docker.sock"
+      - '/etc/ssl/ssl-custom/cert.pem:/srv/oauthenticator/ssl/ssl.cert'
+      - '/etc/ssl/ssl-custom/key.pem:/srv/oauthenticator/ssl/ssl.key'
+    environment:
+      OAUTH_CALLBACK_URL: "https://github.${DOMAIN-todevnull.com}/hub/oauth_callback"
+      HOST_HOMEDIR: "/mnt/home/{username}"
+      OAUTH_CLASS: "oauthenticator.github.GitHubOAuthenticator"
+      GITHUB_CLIENT_ID: "${GITHUB_CLIENT_ID}"
+      GITHUB_CLIENT_SECRET: "${GITHUB_CLIENT_SECRET}"
+      JPY_COOKIE_SECRET: "${JPY_COOKIE_SECRET}"
+      JPY_API_TOKEN: "${CONFIGPROXY_AUTH_TOKEN}"
+      OAUTH_TLS_VERIFY: 0
+      COMMON_DIR: "/mnt/home/common"
+      DATASETS_DIR: "/mnt/home/datasets"
+      ADMINS: "${HUB_ADMINS-balkian,root}"
+      DOCKER_MEM_LIMIT: '250M'
+      DOCKER_NETWORK: labinabox
+
+networks:
+  labinabox:
+    name: labinabox
diff --git a/gitlab-selfhosted/README.md b/gitlab-selfhosted/README.md
new file mode 100644
index 0000000..5cef6fe
--- /dev/null
+++ b/gitlab-selfhosted/README.md
@@ -0,0 +1,51 @@
+This deployment contains:
+
+* Gitlab for code and authentication. The omnibus image is an all-in-one package that contains:
+    * Git (`lab.domain.com`) and CI/CD
+    * Docker registry (`registry.domain.com`)
+    * Mattermost (slack clone, `chat.domain.com`)
+    * Postgres db
+* Jupyterhub for multi-user computing (`hub.domain.com`)
+    * Authentication with Gitlab
+    * Every user has an isolated environment (thanks to Docker-spawner)
+    * Based on https://github.com/balkian/jupyterhub-oauth
+* Nginx as a reverse proxy
+
+# Instructions
+
+Set the `DOMAIN` variable in a `.env` file and run this compose.
+After GitLab is loaded, create an OAuth application in Gitlab: https://docs.gitlab.com/ce/integration/oauth_provider.html
+The redirect URL is: `https://hub./hub/oauth_callback`.
+Select the `api` scope and mark the application as `trusted`.
+
+Write the client ID and client secret to a `.env` file in this folder.
+
+Then, update the Jupyter service.
+
+# Example .env file
+
+```
+DOMAIN=
+GITLAB_CLIENT_ID=
+GITLAB_CLIENT_SECRET=
+```
+
+# Advanced configuration
+
+GitLab's documentation is terrific.
+For a list of configuration options, see https://docs.gitlab.com/omnibus/docker/#install-gitlab-using-docker-compose .
+
+
+# Note
+
+When run as part of the omnibus, mattermost should register an application automatically on Gitlab.
+I've had some issues with authentication, so I've explicitly added the OAuth parameters in the compose file.
+This way, you can manually register mattermost on your instance.
+The process is similar to jupyter, and the callback URLs are:
+
+
+```
+https://chat.$DOMAIN/signup/gitlab/complete
+https://chat.$DOMAIN/login/gitlab/complete
+```
+
diff --git a/gitlab-selfhosted/docker-compose.yml b/gitlab-selfhosted/docker-compose.yml
new file mode 100644
index 0000000..8d27b33
--- /dev/null
+++ b/gitlab-selfhosted/docker-compose.yml
@@ -0,0 +1,118 @@
+version: '3.6'
+services:
+  web:
+    build: nginx
+    networks:
+      - labinabox
+    image: 'custom-nginx'
+    depends_on:
+      - jupyter
+      - gitlab
+    ports:
+      - '80:80'
+      - '443:443'
+    environment:
+      REGISTRY_INTERNAL: http://gitlab:4567
+      LAB_INTERNAL: http://gitlab:80
+      HUB_INTERNAL: http://jupyter:8000
+      DOMAIN: ${DOMAIN-todevnull.com}
+    volumes:
+      - '/etc/ssl/ssl-custom/cert.pem:/ssl/certs/nginx.crt'
+      - '/etc/ssl/ssl-custom/key.pem:/ssl/private/nginx.key'
+      - '/data/html:/usr/share/nginx/html'
+  gitlab:
+    networks:
+      - labinabox
+    image: 'gitlab/gitlab-ce:10.7.3-ce.0'
+    restart: always
+    hostname: 'lab.${DOMAIN-todevnull.com}'
+    environment:
+      GITLAB_OMNIBUS_CONFIG: |
+        external_url 'https://lab.${DOMAIN-todevnull.com}'
+        mattermost_external_url 'https://chat.${DOMAIN-todevnull.com}/'
+        registry_external_url 'https://registry.${DOMAIN-todevnull.com}'
+
+        nginx['listen_port'] = 80
+        nginx['listen_https'] = false
+        # LFS
+        gitlab_rails['lfs_enabled'] = true
+        gitlab_rails['lfs_storage_path'] = "/mnt/lfs"
+        # Registry
+        gitlab_rails['registry_enabled'] = true
+        gitlab_rails['registry_port'] = "443"
+
+        registry_nginx['redirect_http_to_https'] = false
+        registry_nginx['listen_port'] = 4567
+        registry_nginx['nginx_enable'] = true
+        registry_nginx['listen_https'] = false
+        registry_nginx['https'] = false
+        gitlab_rails['registry_api_url'] = "http://127.0.0.1:5000"
+        registry['registry_http_addr'] = "127.0.0.1:5000"
+        registry_nginx['proxy_set_headers'] = {
+          "Host" => "registry.${DOMAIN}",
+          "X-Forwarded-Proto" => "https",
+          "X-Forwarded-Ssl" => "on"
+        }
+
+        mattermost['service_use_ssl'] = false
+        mattermost_nginx['listen_port'] = 80
+        mattermost_nginx['listen_https'] = false
+        mattermost['gitlab_enable'] = true
+        mattermost['gitlab_id'] = "${MATTERMOST_ID}"
+        mattermost['gitlab_secret'] = "${MATTERMOST_SECRET}"
+        mattermost['gitlab_scope'] = ""
+        mattermost['gitlab_auth_endpoint'] = "https://lab.${DOMAIN-todevnull.com}/oauth/authorize"
+        mattermost['gitlab_token_endpoint'] = "https://lab.${DOMAIN-todevnull.com}/oauth/token"
+        mattermost['gitlab_user_api_endpoint'] = "https://lab.${DOMAIN-todevnull.com}/api/v4/user"
+
+
+        # Chat
+        mattermost_nginx['redirect_http_to_https'] = false
+    ports:
+      - '8080:80'
+      - '8443:443'
+      - '22:22'
+      - '4567:4567'
+    volumes:
+      - 'gitlab-config:/etc/gitlab'
+      - 'gitlab-logs:/var/log/gitlab'
+      - 'gitlab-data:/var/opt/gitlab'
+      - 'gitlab-lfs:/mnt/lfs'
+  jupyter:
+    networks:
+      - labinabox
+    image: gsiupm/jupyterhub-oauth:0.8.1
+    command: jupyterhub --no-ssl -f /srv/jupyterhub/jupyterhub_config.py
+    hostname: jupyterhub
+    volumes:
+      - "/mnt/home:/mnt/home"
+      - "/var/run/docker.sock:/var/run/docker.sock"
+    environment:
+      OAUTH_CALLBACK_URL: "https://hub.${DOMAIN-todevnull.com}/hub/oauth_callback"
+      HOST_HOMEDIR: "/mnt/home/{username}"
+      OAUTH_CLASS: "oauthenticator.gitlab.GitLabOAuthenticator"
+      GITLAB_HOST: "https://lab.${DOMAIN-todevnull.com}/"
+      GITLAB_CLIENT_ID: "${GITLAB_CLIENT_ID}"
+      GITLAB_CLIENT_SECRET: "${GITLAB_CLIENT_SECRET}"
+      JPY_COOKIE_SECRET: "${JPY_COOKIE_SECRET}"
+      JPY_API_TOKEN: "${CONFIGPROXY_AUTH_TOKEN}"
+      OAUTH_TLS_VERIFY: 0
+      COMMON_DIR: "/mnt/home/common"
+      DATASETS_DIR: "/mnt/home/datasets"
+      ADMINS: "${HUB_ADMINS-balkian,root}"
+      DOCKER_NETWORK: labinabox
+      DOCKER_MEM_LIMIT: '150M'
+
+volumes:
+  gitlab-config:
+    name: gitlab-config
+  gitlab-logs:
+    name: gitlab-logs
+  gitlab-data:
+    name: gitlab-data
+  gitlab-lfs:
+    name: gitlab-lfs
+
+networks:
+  labinabox:
+    name: labinabox
diff --git a/gitlab-selfhosted/nginx/Dockerfile b/gitlab-selfhosted/nginx/Dockerfile
new file mode 100644
index 0000000..886798d
--- /dev/null
+++ b/gitlab-selfhosted/nginx/Dockerfile
@@ -0,0 +1,9 @@
+FROM nginx:1.13
+
+ENV DOMAIN="todevnull.com" HUB_INTERNAL="http://jupyter:8000" LAB_INTERNAL="http://gitlab:80" REGISTRY_INTERNAL="http://gitlab:4567"
+
+COPY init.sh /
+COPY snippets/ /etc/nginx/snippets/
+COPY conf.d /etc/nginx/conf.d/
+
+CMD /init.sh
\ No newline at end of file
diff --git a/gitlab-selfhosted/nginx/conf.d/hub.conf.template b/gitlab-selfhosted/nginx/conf.d/hub.conf.template
new file mode 100644
index 0000000..6ec7bd5
--- /dev/null
+++ b/gitlab-selfhosted/nginx/conf.d/hub.conf.template
@@ -0,0 +1,54 @@
+server {
+    listen 80;
+
+    access_log /var/log/nginx/hub.access.log;
+# add Strict-Transport-Security to prevent man in the middle attacks
+
+    server_name hub.${DOMAIN};
+
+    root /var/www/html/ ;
+
+    include /etc/nginx/snippets/letsencrypt.conf;
+
+    location / {
+        return 301 https://$host$request_uri;
+    }
+
+
+}
+
+server {
+    listen 443 ssl;
+    listen [::]:443 ssl;
+
+    server_name hub.${DOMAIN} ;
+
+    access_log /var/log/nginx/hub.access.log;
+    error_log /var/log/nginx/hub.error.log;
+    # add Strict-Transport-Security to prevent man in the middle attacks
+    #add_header Strict-Transport-Security "max-age=31536000";
+    client_max_body_size 100M;
+
+    ssl_certificate /ssl/certs/nginx.crt;
+    ssl_certificate_key /ssl/private/nginx.key;
+
+    location / {
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        proxy_pass ${HUB_INTERNAL};
+    }
+
+    location ~* /user/([a-zA-Z0-9]+)/(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/? {
+
+        proxy_set_header Host $host;
+        # websocket support
+        proxy_http_version 1.1;
+        proxy_set_header Upgrade "websocket";
+        proxy_set_header Connection "Upgrade";
+
+        proxy_pass ${HUB_INTERNAL};
+    }
+}
\ No newline at end of file
diff --git a/gitlab-selfhosted/nginx/conf.d/lab.conf.template b/gitlab-selfhosted/nginx/conf.d/lab.conf.template
new file mode 100644
index 0000000..b42b59d
--- /dev/null
+++ b/gitlab-selfhosted/nginx/conf.d/lab.conf.template
@@ -0,0 +1,79 @@
+server {
+    listen 80;
+
+    server_name lab.${DOMAIN} chat.${DOMAIN};
+
+    root /var/www/html/ ;
+
+    include /etc/nginx/snippets/letsencrypt.conf;
+
+    location / {
+        return 301 https://$host$request_uri;
+    }
+
+}
+
+server {
+    listen 443 ssl;
+    listen [::]:443 ssl;
+
+    server_name lab.${DOMAIN} chat.${DOMAIN} ;
+
+    access_log /var/log/nginx/gitlab.access.log;
+    error_log /var/log/nginx/gitlab.error.log;
+    # add Strict-Transport-Security to prevent man in the middle attacks
+    #add_header Strict-Transport-Security "max-age=31536000";
+    ssl_certificate /ssl/certs/nginx.crt;
+    ssl_certificate_key /ssl/private/nginx.key;
+
+    root /var/www/html/ ;
+
+
+    location / {
+        client_max_body_size 0;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        proxy_pass ${LAB_INTERNAL} ;
+    }
+
+}
+
+server {
+    listen 443 ssl;
+    listen [::]:443 ssl;
+
+    server_name registry.${DOMAIN} ;
+
+    access_log /var/log/nginx/gitlab-registry.access.log;
+    error_log /var/log/nginx/gitlab-registry.error.log;
+    # add Strict-Transport-Security to prevent man in the middle attacks
+    #add_header Strict-Transport-Security "max-age=31536000";
+    # disable any limits to avoid HTTP 413 for large image uploads
+    client_max_body_size 0;
+
+    # required to avoid HTTP 411: see Issue #1486 (https://github.com/docker/docker/issues/1486)
+    chunked_transfer_encoding on;
+    ssl_protocols TLSv1.1 TLSv1.2;
+    ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
+    ssl_prefer_server_ciphers on;
+    ssl_session_cache shared:SSL:10m;
+
+    include /etc/nginx/snippets/letsencrypt.conf;
+
+    root /var/www/html/ ;
+
+    location / {
+
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        proxy_pass ${REGISTRY_INTERNAL};
+        proxy_read_timeout 900;
+    }
+
+}
diff --git a/gitlab-selfhosted/nginx/init.sh b/gitlab-selfhosted/nginx/init.sh
new file mode 100755
index 0000000..ce7d1b8
--- /dev/null
+++ b/gitlab-selfhosted/nginx/init.sh
@@ -0,0 +1,9 @@
+#!/bin/sh
+cd /etc/nginx/conf.d/
+
+VARIABLES='${HUB_INTERNAL}${DOMAIN}${LAB_INTERNAL}${REGISTRY_INTERNAL}'
+for i in *.template; do
+    target=$(basename "$i" ".template")
+    envsubst "$VARIABLES" < "$i" > "$target";
+done || exit 1
+nginx -g 'daemon off;'
diff --git a/gitlab-selfhosted/nginx/snippets/letsencrypt.conf b/gitlab-selfhosted/nginx/snippets/letsencrypt.conf
new file mode 100644
index 0000000..869705e
--- /dev/null
+++ b/gitlab-selfhosted/nginx/snippets/letsencrypt.conf
@@ -0,0 +1,44 @@
+#############################################################################
+# Configuration file for Let's Encrypt ACME Challenge location
+# This file is already included in listen_xxx.conf files.
+# Do NOT include it separately!
+#############################################################################
+#
+# This config enables access to /.well-known/acme-challenge/xxxxxxxxxxx
+# on all our sites (HTTP), including all subdomains.
+# This is required by ACME Challenge (webroot authentication).
+# You can check that this location is working by placing ping.txt here:
+# /var/www/letsencrypt/.well-known/acme-challenge/ping.txt
+# And pointing your browser to:
+# http://xxx.domain.tld/.well-known/acme-challenge/ping.txt
+#
+# Sources:
+# https://community.letsencrypt.org/t/howto-easy-cert-generation-and-renewal-with-nginx/3491
+#
+#############################################################################
+
+# Rule for legitimate ACME Challenge requests (like /.well-known/acme-challenge/xxxxxxxxx)
+# We use ^~ here, so that we don't check other regexes (for speed-up). We actually MUST cancel
+# other regex checks, because our other config files have a regex rule that denies access to files with dotted names.
+location ^~ /.well-known/acme-challenge/ {
+
+    # Set correct content type. According to this:
+    # https://community.letsencrypt.org/t/using-the-webroot-domain-verification-method/1445/29
+    # Current specification requires "text/plain" or no content header at all.
+    # It seems that "text/plain" is a safe option.
+    default_type "text/plain";
+
+    # This directory must be the same as in /etc/letsencrypt/cli.ini
+    # as "webroot-path" parameter. Also don't forget to set "authenticator" parameter
+    # there to "webroot".
+    # Do NOT use alias, use root! Target directory is located here:
+    # /var/www/letsencrypt/.well-known/acme-challenge/
+    root /var/www/letsencrypt;
+}
+
+# Hide /acme-challenge subdirectory and return 404 on all requests.
+# It is somewhat more secure than letting Nginx return 403.
+# Ending slash is important!
+location = /.well-known/acme-challenge/ {
+    return 404;
+}
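
A side note on the `${HUB_ADMINS-balkian,root}`-style defaults in the two compose files. As far as I know, Compose follows the POSIX shell rules here: `-` applies the default only when the variable is unset, `:-` also applies it when the variable is empty, and any extra character after the operator (a stray `?`, say) becomes part of the default. The sketch below reproduces this in plain `sh`; the helper function names are made up for illustration and are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the repository) that mimic how a
# compose file would expand a default for DOMAIN.

# "-" form: the default is used only when DOMAIN is unset.
hub_url_dash() {
    printf 'https://hub.%s/hub/oauth_callback\n' "${DOMAIN-todevnull.com}"
}

# ":-" form: the default is used when DOMAIN is unset OR empty.
hub_url_colon_dash() {
    printf 'https://hub.%s/hub/oauth_callback\n' "${DOMAIN:-todevnull.com}"
}

# Everything after the operator is taken literally, so a stray "?"
# ends up inside the default value.
hub_url_stray_char() {
    printf 'https://hub.%s/hub/oauth_callback\n' "${DOMAIN-?todevnull.com}"
}

# Case 1: DOMAIN is not set at all.
unset DOMAIN
unset_dash=$(hub_url_dash)
unset_stray=$(hub_url_stray_char)

# Case 2: DOMAIN is set but empty (e.g. a bare "DOMAIN=" line in .env).
DOMAIN=""
empty_dash=$(hub_url_dash)
empty_colon_dash=$(hub_url_colon_dash)

# Case 3: DOMAIN has a real value, so no default is applied.
DOMAIN="example.org"
set_dash=$(hub_url_dash)

printf '%s\n' "$unset_dash" "$unset_stray" "$empty_dash" "$empty_colon_dash" "$set_dash"
```

If this reading is right, a bare `DOMAIN=` line in `.env` is not equivalent to leaving the variable out: with the `-` form it yields an empty host label, while `:-` would still fall back to the default.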