<h1>Zotero</h1> <p>J. Fernando Sánchez, balkian.com, 2014-12-09</p> <p><a class="reference external" href="https://www.zotero.org/">Zotero</a> is an open-source tool that lets you organise your bibliography and sync it with the cloud. Unlike alternatives such as <a class="reference external" href="http://www.mendeley.com">Mendeley</a>, Zotero can upload attachments and data to a private cloud via WebDAV.</p> <p>If you use nginx as your web server, note that although it provides partial WebDAV support, Zotero needs more than that. Hence, you will need another WebDAV server, optionally with nginx proxying to it. This short post covers the basics of getting that set-up working under Debian/Ubuntu.</p> <div class="section" id="setting-up-apache"> <h2>Setting up Apache</h2> <p>First we need to install Apache:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>sudo apt-get install apache2
</pre></div> </td></tr></table><p>Change the head of &quot;/etc/apache2/sites-enabled/000-default&quot; to the following, and make sure Apache listens on that port (e.g. with a <tt class="docutils literal">Listen 880</tt> line in /etc/apache2/ports.conf):</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="nt">&lt;VirtualHost</span> <span class="s">*:880</span><span class="nt">&gt;</span>
</pre></div> </td></tr></table><p>Then, create a file /etc/apache2/sites-available/webdav:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="nb">Alias</span> <span class="sx">/dav</span> <span class="sx">/home/webdav/dav</span>
<span class="nt">&lt;Location</span> <span class="s">/dav</span><span class="nt">&gt;</span>
    <span class="nb">Dav</span> <span class="k">on</span>
    <span class="nb">Order</span> Allow,Deny
    <span class="nb">Allow</span> from <span class="k">all</span>
    <span class="nb">Options</span> +Indexes
    <span class="nb">AuthType</span> Basic
    <span class="nb">AuthName</span> DAV
    <span class="nb">AuthBasicProvider</span> file
    <span class="nb">AuthUserFile</span> <span class="sx">/home/webdav/.htpasswd</span>
    <span class="nb">Require</span> valid-user
<span class="nt">&lt;/Location&gt;</span>
</pre></div> </td></tr></table><p>Ideally, you want your WebDAV folders to be private, so add authentication to them. To do so, create the webdav and zotero users and add their passwords to an htpasswd file. Although a single user would work, you will be configuring several clients with your credentials, so I encourage you to create the zotero user as well. This way you can always change the zotero password without affecting any other application that uses WebDAV.</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>sudo adduser webdav
sudo htpasswd -c /home/webdav/.htpasswd webdav
sudo htpasswd /home/webdav/.htpasswd zotero
sudo mkdir -p /home/webdav/dav/zotero
</pre></div> </td></tr></table><p>Enable the required modules and the site, then restart Apache:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>sudo a2enmod dav
sudo a2enmod dav_fs
sudo a2ensite webdav
sudo service apache2 restart
</pre></div> </td></tr></table><p>At this point everything should be working at http://&lt;your_host&gt;:880/dav/zotero</p> </div> <div class="section" id="setting-up-nginx"> <h2>Setting up NGINX</h2> <p>After the Apache side is working, we can use nginx as a proxy to get cleaner URIs.
In your desired site/location, add this:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="k">location</span> <span class="s">/dav</span> <span class="p">{</span>
    <span class="kn">client_max_body_size</span> <span class="s">20M</span><span class="p">;</span>
    <span class="kn">proxy_set_header</span> <span class="s">X-Real-IP</span> <span class="nv">$remote_addr</span><span class="p">;</span>
    <span class="kn">proxy_set_header</span> <span class="s">X-Forwarded-For</span> <span class="nv">$remote_addr</span><span class="p">;</span>
    <span class="kn">proxy_set_header</span> <span class="s">Host</span> <span class="nv">$host</span><span class="p">;</span>
    <span class="kn">proxy_pass</span> <span class="s">http://127.0.0.1:880</span><span class="p">;</span>
<span class="p">}</span>
</pre></div> </td></tr></table><p>Now just reload nginx:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>sudo service nginx force-reload
</pre></div> </td></tr></table></div> <div class="section" id="extras"> <h2>Extras</h2> <ul class="simple"> <li><a class="reference external" href="http://zoteroreader.com/">Zotero Reader</a> - HTML5 client</li> <li><a class="reference external" href="https://github.com/ajlyon/zandy">Zandy</a> - Android Open Source client</li> </ul> </div> <h1>Proxies with Apache and python</h1> <p>J. Fernando Sánchez, 2014-10-09</p> <p>This is a quick note on proxying a local Python application (e.g. Flask) to a subdirectory in Apache. It assumes that the file wsgi.py contains a WSGI application with the name <em>application</em>.
Hence the <tt class="docutils literal">wsgi:application</tt> argument used in the commands below.</p> <div class="section" id="gunicorn"> <h2>Gunicorn</h2> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="nt">&lt;Location</span> <span class="s">/myapp/</span><span class="nt">&gt;</span>
    <span class="nb">ProxyPass</span> http://127.0.0.1:8888/myapp/
    <span class="nb">ProxyPassReverse</span> http://127.0.0.1:8888/myapp/
    <span class="nb">RequestHeader</span> set SCRIPT_NAME <span class="s2">&quot;/myapp/&quot;</span>
<span class="nt">&lt;/Location&gt;</span>
</pre></div> </td></tr></table><p><strong>Important</strong>: <em>SCRIPT_NAME</em> and the end of the <em>ProxyPass</em> URL <strong>MUST BE THE SAME</strong>. Otherwise, Gunicorn will fail miserably.</p> <p>Try it with:</p> <div class="highlight"><pre>venv/bin/gunicorn -w <span class="m">4</span> -b 127.0.0.1:8888 --log-file - --access-logfile - wsgi:application
</pre></div> </div> <div class="section" id="uwsgi"> <h2>UWSGI</h2> <p>This is a very simple configuration.
I will try to upload one with more options for uwsgi (in a .ini file).</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="nt">&lt;Location</span> <span class="s">/myapp/</span><span class="nt">&gt;</span>
    <span class="nb">SetHandler</span> uwsgi_handler
    <span class="nb">uWSGISocket</span> <span class="m">127.0.0.1</span>:8888
<span class="nt">&lt;/Location&gt;</span>
</pre></div> </td></tr></table><p>Try it with:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>uwsgi --socket 127.0.0.1:8888 -w wsgi:application
</pre></div> </td></tr></table><div class="section" id="extra-supervisor"> <h3>Extra: Supervisor</h3> <p>If everything went as expected, you can wrap your command in a supervisor config file and let it handle the server for you.</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="k">[unix_http_server]</span>
<span class="na">file</span><span class="o">=</span><span class="s">/tmp/supervisor.sock ; path to your socket file (must match serverurl below)</span>

<span class="k">[supervisord]</span>
<span class="na">logfile</span> <span class="o">=</span> <span class="s">%(here)s/logs/supervisor.log</span>
<span class="na">childlogdir</span> <span class="o">=</span> <span class="s">%(here)s/logs/</span>

<span class="k">[rpcinterface:supervisor]</span>
<span class="na">supervisor.rpcinterface_factory</span> <span class="o">=</span> <span class="s">supervisor.rpcinterface:make_main_rpcinterface</span>

<span class="k">[supervisorctl]</span>
<span class="na">logfile</span> <span class="o">=</span> <span class="s">%(here)s/logs/supervisorctl.log</span>
<span class="na">serverurl</span><span class="o">=</span><span
class="s">unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket</span>

<span class="k">[program:myapp]</span>
<span class="na">command</span> <span class="o">=</span> <span class="s">venv/bin/gunicorn -w 4 -b 0.0.0.0:5000 --log-file %(here)s/logs/gunicorn.log --access-logfile - wsgi:application</span>
<span class="na">directory</span> <span class="o">=</span> <span class="s">%(here)s</span>
<span class="na">environment</span> <span class="o">=</span> <span class="s">PATH=%(here)s/venv/bin/</span>
<span class="na">logfile</span> <span class="o">=</span> <span class="s">%(here)s/logs/myapp.log</span>
</pre></div> </td></tr></table></div> </div> <h1>Publishing in PyPI</h1> <p>J. Fernando Sánchez, 2014-09-27</p> <p>Developing a Python module and publishing it on GitHub is cool, but most of the time you want others to be able to download and use it easily. That is the role of PyPI, the Python package repository. In this post I show you how to publish your package in less than 10 minutes.</p> <div class="section" id="choose-a-fancy-name"> <h2>Choose a fancy name</h2> <p>If you haven't done so yet, take a minute or two to think about this. To publish on PyPI you need a name for your package that isn't taken. What's more, a catchy and unique name will help people remember your module and feel more inclined to at least try it.</p> <p>The package name should hint at what your module does, but that's not always the case. That's your call.
I personally put uniqueness and memorability over describing the functionality.</p> </div> <div class="section" id="create-a-pypirc-configuration-file"> <h2>Create a .pypirc configuration file</h2> <pre class="code cfg literal-block">
<span class="c1"># this tells distutils what package indexes you can push to:</span>
<span class="c1"># pypi is the live PyPI, pypitest is the test PyPI</span>
<span class="k">[distutils]</span>
<span class="na">index-servers</span> <span class="o">=</span><span class="s">
    pypi
    pypitest</span>

<span class="c1"># authentication details for live PyPI</span>
<span class="k">[pypi]</span>
<span class="na">repository</span> <span class="o">=</span> <span class="s">https://pypi.python.org/pypi</span>
<span class="na">username</span> <span class="o">=</span> <span class="s">{ your_username }</span>
<span class="c1"># the password line is not necessary</span>
<span class="na">password</span> <span class="o">=</span> <span class="s">{ your_password }</span>

<span class="c1"># authentication details for test PyPI</span>
<span class="k">[pypitest]</span>
<span class="na">repository</span> <span class="o">=</span> <span class="s">https://testpypi.python.org/pypi</span>
<span class="na">username</span> <span class="o">=</span> <span class="s">{ your_username }</span>
</pre> <p>As you can see, you need to register on both the <a class="reference external" href="https://pypi.python.org/pypi?%3Aaction=register_form">main PyPI repository</a> and the <a class="reference external" href="https://testpypi.python.org/pypi?%3Aaction=register_form">testing server</a>.
The usernames and passwords can be different; that is up to you!</p> </div> <div class="section" id="prepare-your-package"> <h2>Prepare your package</h2> <pre class="literal-block">
root-dir/   # Any name you want
    setup.py
    setup.cfg
    LICENSE.txt
    README.md
    mypackage/
        __init__.py
        foo.py
        bar.py
        baz.py
</pre> <div class="section" id="setup-cfg"> <h3>setup.cfg</h3> <pre class="code cfg literal-block">
<span class="k">[metadata]</span>
<span class="na">description-file</span> <span class="o">=</span> <span class="s">README.md</span>
</pre> <p>The Markdown README is the <em>de facto</em> standard on GitHub, but you can also use rST (reStructuredText), the standard in the Python community.</p> </div> <div class="section" id="setup-py"> <h3>setup.py</h3> <pre class="code python literal-block">
from distutils.core import setup

setup(
    name='mypackage',
    packages=['mypackage'],  # this must be the same as the name above
    version='{ version }',
    description='{ description }',
    author='{ name }',
    author_email='{ email }',
    url='https://github.com/{user}/{package}',  # URL to the github repo
    download_url='https://github.com/{user}/{repo}/tarball/{version}',
    keywords=['websockets', 'display', 'd3'],  # list of keywords that represent your package
    classifiers=[],
)
</pre> <p>You might notice that the download_url points to a GitHub URL. We could host our package anywhere, but GitHub is a convenient option.
To create the tarball and the zip packages, you only need to create a tag in your repository and push it to GitHub:</p> <pre class="literal-block">
git tag {version} -m &quot;{ Description of this tag/version}&quot;
git push --tags origin master
</pre> </div> </div> <div class="section" id="push-to-the-testing-main-pypi-server"> <h2>Push to the testing/main pypi server</h2> <p>It is advisable that you try your package on the test repository and fix any problems first. The process is simple:</p> <pre class="literal-block">
python setup.py register -r {pypitest/pypi}
python setup.py sdist upload -r {pypitest/pypi}
</pre> <p>If everything went as expected, you can now install your package through pip and browse your package's page. For instance, check my senpy package: <a class="reference external" href="https://pypi.python.org/pypi/senpy">https://pypi.python.org/pypi/senpy</a></p> <pre class="literal-block">
pip install senpy
</pre> </div> <h1>Updating EuroLoveMap</h1> <p>J. Fernando Sánchez, 2014-03-27</p> <p>As part of the <a class="reference external" href="http://www.opener-project.org/2013/07/18/opener-hackathon-in-amsterdam/">OpeNER hackathon</a> we decided to build a prototype that would allow us to compare how different countries feel about several topics. We used the OpeNER pipeline to get the sentiment from a set of newspaper articles we gathered from media in several languages. Then we aggregated those articles by category and country (using the source of the article or the language it was written in), obtaining the &quot;overall feeling&quot; of each country about each topic. Finally, we used some fancy JavaScript to make sense of the raw information.</p> <p>It didn't go too badly: it turns out <a class="reference external" href="http://eurosentiment.eu/wp-content/uploads/2013/07/BOLv9qnCIAAJEek.jpg">we won</a>.</p> <p>Now, it was time for a face-lift.
I used this opportunity to play with new technologies and improve it:</p> <ul class="simple"> <li>Using Flask, this time with Python 3.3 and Bootstrap 3.0</li> <li>Cool HTML5+JS cards (thanks to <a class="reference external" href="http://pastetophone.com">pastetophone</a>)</li> <li>Automatic generation of fake personal data to test the interface</li> <li>Obfuscation of personal emails</li> </ul> <div class="section" id="publishing-a-python-3-app-on-heroku"> <h2>Publishing a Python 3 app on Heroku</h2> <p>The result can be <a class="reference external" href="http://eurolovemap.herokuapp.com/">seen here</a>.</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>mkvirtualenv -p /usr/bin/python3.3 eurolovemap
</pre></div> </td></tr></table><p>Heroku uses Python 2.7 by default, so we have to tell it which version we want (it supports Python 3.4 as well). I couldn't get Python 3.4 working using the <a class="reference external" href="https://launchpad.net/~fkrull/+archive/deadsnakes">deadsnakes</a> PPA, so I used Python 3.3 instead, which works fine but is not officially supported. Just create a file named <em>runtime.txt</em> in your project root, with the Python version you want to use:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>python-3.3.1
</pre></div> </td></tr></table><p>Don't forget to freeze your dependencies so Heroku can install them: <tt class="docutils literal">pip freeze &gt; requirements.txt</tt></p> </div> <div class="section" id="publishing-personal-emails"> <h2>Publishing personal emails</h2> <p>There are really sophisticated and effective ways to obfuscate personal emails so that spammers cannot easily grab yours. However, this time I needed something really simple to hide our emails from the simplest form of crawlers.
Most of the team are in academia somehow, so in the end all our emails are available on sites like Google Scholar. Anyway, nobody likes getting spammed, so I settled for a custom <a class="reference external" href="http://en.wikipedia.org/wiki/Caesar_cipher">Caesar cipher</a>. Please don't use it for any serious application if you are concerned about being spammed.</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="k">def</span> <span class="nf">blur_email</span><span class="p">(</span><span class="n">email</span><span class="p">):</span>
    <span class="c1"># Shift every character 5 code points up</span>
    <span class="k">return</span> <span class="s">&quot;&quot;</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="nb">chr</span><span class="p">(</span><span class="nb">ord</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="o">+</span><span class="mi">5</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">email</span><span class="p">])</span>
</pre></div> </td></tr></table><p>And this is the client side:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre><span class="nb">window</span><span class="p">.</span><span class="nx">onload</span> <span class="o">=</span> <span class="kd">function</span><span class="p">(){</span>
    <span class="kd">var</span> <span class="nx">elems</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">getElementsByClassName</span><span class="p">(</span><span class="s1">&#39;profile-email&#39;</span><span class="p">);</span>
    <span class="k">for</span><span class="p">(</span><span class="kd">var</span> <span class="nx">e</span> <span class="k">in</span> <span class="nx">elems</span><span
class="p">){</span>
        <span class="kd">var</span> <span class="nx">blur</span> <span class="o">=</span> <span class="nx">elems</span><span class="p">[</span><span class="nx">e</span><span class="p">].</span><span class="nx">innerHTML</span><span class="p">;</span>
        <span class="kd">var</span> <span class="nx">email</span> <span class="o">=</span> <span class="s2">&quot;&quot;</span><span class="p">;</span>
        <span class="k">for</span><span class="p">(</span><span class="kd">var</span> <span class="nx">s</span> <span class="k">in</span> <span class="nx">blur</span><span class="p">){</span>
            <span class="c1">// Undo the +5 shift applied on the server side</span>
            <span class="kd">var</span> <span class="nx">a</span> <span class="o">=</span> <span class="nx">blur</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="nx">s</span><span class="p">);</span>
            <span class="nx">email</span> <span class="o">=</span> <span class="nx">email</span><span class="o">+</span><span class="nb">String</span><span class="p">.</span><span class="nx">fromCharCode</span><span class="p">(</span><span class="nx">a</span><span class="o">-</span><span class="mi">5</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="nx">elems</span><span class="p">[</span><span class="nx">e</span><span class="p">].</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="nx">email</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</pre></div> </td></tr></table><p>Unfortunately, this approach does not hide your email from anyone using a headless browser such as <a class="reference external" href="http://phantomjs.org/">PhantomJS</a>, <a class="reference external" href="http://zombie.labnotes.org/">ZombieJS</a> or similar. For that, other approaches like generating a picture with the address would be necessary.
Nevertheless, that would be overkill for a really simple ad-hoc application with custom formatting and just a handful of emails that could easily be grabbed manually.</p> </div> <div class="section" id="generation-of-fake-data"> <h2>Generation of fake data</h2> <p>To test the contact section of the site, I wanted to populate it with fake data. <a class="reference external" href="https://github.com/joke2k/faker">Fake-Factory</a> is an amazing library that can generate fake data of almost any kind: emails, association names, acronyms... It even lets you localise the results (get Spanish names, for instance) and generate factories for certain classes (à la Django).</p> <p>But I also wanted pictures. Enter <a class="reference external" href="http://lorempixel.com/">Lorem Pixel</a>. With its API you can generate pictures of almost any size, for different topics (e.g. nightlife, people) and with a custom text. You can even use an index, so it will always show the same picture.</p> <p>For instance, the picture below is served through Lorem Pixel.</p> <div class="figure"> <img alt="This picture is generated with Lorem Pixel" src="http://lorempixel.com/400/200/nightlife/" /> <p class="caption">This picture is generated with Lorem Pixel</p> </div> <p>By the way, if you only want cat pictures, take a look at <a class="reference external" href="http://placekitten.com/">Placekitten</a>. And for NSFW text, there's the <a class="reference external" href="http://slipsum.com/">Samuel L. Jackson Ipsum</a>.</p> </div> <h1>Remove git files with globbing</h1> <p>J. Fernando Sánchez, 2013-08-22</p> <p>A simple trick.
If you want to remove all the '.swp' files from a git repository, just use the command below (the <tt class="docutils literal"><span class="pre">--cached</span></tt> flag removes them from the index only and leaves the files on disk):</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>git rm --cached <span class="s1">&#39;**.swp&#39;</span>
</pre></div> </td></tr></table> <h1>Creating my web</h1> <p>J. Fernando Sánchez, 2013-08-22</p> <p>Finally, I've decided to set up a decent personal page. I have settled for github-pages because I like the idea of keeping my site in a repository and having someone else host and deploy it for me. The site will be really simple, mostly static files. Thanks to GitHub, <a class="reference external" href="http://jekyllrb.com">Jekyll</a> will automatically generate static pages for my posts every time I commit anything new to this repository.</p> <p>But Jekyll can be used independently, so if I ever choose to host the site myself, I can do it quite easily. Another thing that I liked about this approach is that the generated HTML files can be used in the future, and I will not need Jekyll to serve them. Jekyll is really simple and most things are written in plain HTML. That means that everything could be easily reused if I ever choose to change to another blogging framework (e.g. Pelican). But, for the time being, I like the fact that GitHub takes care of the compilation as well, so I can simply modify or add files through the web interface should I need to.</p> <p>I hadn't played with HTML and CSS for a while, so I also wanted to use this site as a playground. At some point, I realised I was doing almost everything in plain HTML and CSS, and decided to keep it like that for as long as possible. As of this writing, I haven't included any JavaScript code in the page.
I will probably use some to add my <a class="reference external" href="http://gist.github.com/balkian">gists</a> and <a class="reference external" href="http://github.com/balkian">repositories</a>, but we will see about that.</p> <p>I think the code speaks for itself, so you can check out <a class="reference external" href="http://github.com/balkian/balkian.github.com">my repository on GitHub</a>. You can clone and deploy it easily like this:</p> <table class="highlighttable"><tr><td class="code"><div class="highlight"><pre>git clone https://github.com/balkian/balkian.github.com
<span class="nb">cd </span>balkian.github.com
jekyll serve -w
</pre></div> </td></tr></table><p>I will keep updating this post with information about:</p> <ul class="simple"> <li>Some Jekyll plugins that might be useful</li> <li>What CSS tricks I learnt</li> <li>The webfonts I used</li> <li>The badge on the left side of the page</li> </ul>