<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Programming on J. Fernando Sánchez</title><link>https://balkian.com/categories/programming/</link><description>Recent content in Programming on J. Fernando Sánchez</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Wed, 26 Feb 2025 23:22:59 +0100</lastBuildDate><atom:link href="https://balkian.com/categories/programming/index.xml" rel="self" type="application/rss+xml"/><item><title>Bridging RDF, JSON-LD and Dataclasses</title><link>https://balkian.com/p/bridging-rdf-json-ld-and-dataclasses/</link><pubDate>Wed, 26 Feb 2025 23:22:59 +0100</pubDate><guid>https://balkian.com/p/bridging-rdf-json-ld-and-dataclasses/</guid><description><p>In the RDF world, data is expressed as a collection of triples.
These triples can contain IRIs that may or may not be accessible or valid,
and the way these IRIs are used may or may not adhere to a vocabulary.
Checking the validity of the IRIs and the semantics of the triples is an additional step.</p>
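<p>To make that extra step concrete, here is a minimal sketch in plain Python (standard library only). It treats a graph as a set of (subject, predicate, object) tuples and runs two separate checks: a purely syntactic IRI test, and membership in a hand-picked set of expected vocabulary terms. The example data and the <code>KNOWN_TERMS</code> set are hypothetical, and the IRI test is deliberately naive (it rejects URNs, for instance). It is not how <code>rdflib</code> or senpy validate anything.</p>

```python
from urllib.parse import urlparse

# Hypothetical data: a graph is just a set of (subject, predicate, object) triples
GRAPH = {
    ("http://example.org/doc1", "http://purl.org/dc/terms/title", "My document"),
    ("http://example.org/doc1", "http://purl.org/dc/terms/tittle", "A typo"),
    ("doc2", "http://purl.org/dc/terms/creator", "Someone"),
}

# Hypothetical, hand-picked subset of the terms we expect to use
KNOWN_TERMS = {
    "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/creator",
}


def is_absolute_iri(value: str) -> bool:
    """Naive syntactic check: require a scheme and an authority (host)."""
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def check_graph(graph: set, known_terms: set) -> list:
    """Report malformed subject IRIs and predicates outside the vocabulary."""
    problems = []
    for subject, predicate, _obj in sorted(graph):
        if not is_absolute_iri(subject):
            problems.append(f"invalid subject IRI: {subject!r}")
        if predicate not in known_terms:
            problems.append(f"unknown predicate: {predicate!r}")
    return problems


for problem in check_graph(GRAPH, KNOWN_TERMS):
    print(problem)
```

<p>Nothing in the triples themselves prevents either mistake: both the relative subject and the misspelled predicate are perfectly storable, which is why validation has to be a separate step.</p>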
<h2 id="the-rdflib-way">The <code>rdflib</code> way
</h2><p><code>rdflib</code> only models IRIs, values and namespaces.
Developers need to be aware of the URIs they use and of the vocabularies those URIs belong to.
Prior to version 2.0, senpy followed a very similar model.
It had a base class to represent a generic node.
Each instance got its own automatically generated id and behaved like a normal dictionary, whose keys and values were serialized as a JSON-LD object.
Multiple subclasses were also included to model specific types of node, mostly to provide convenience methods for each subtype.
Here is an example using one of those subclasses, <code>Entry</code>.</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">senpy.models</span> <span class="kn">import</span> <span class="n">Entry</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">entry</span> <span class="o">=</span> <span class="n">Entry</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">entry</span><span class="p">[</span><span class="s1">&#39;vocab:property&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">25</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="n">entry</span><span class="o">.</span><span class="n">jsonld</span><span class="p">())</span>
</span></span></code></pre>
</div>
</div><p>This would print something like:</p>
<div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;@id&#34;</span><span class="p">:</span> <span class="s2">&#34;:Entry_202505....&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;@type&#34;</span><span class="p">:</span> <span class="s2">&#34;prefix:Entity&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">&#34;vocab:property&#34;</span><span class="p">:</span> <span class="mi">25</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre>
</div>
</div><p>Producing correct triples with this model requires using the vocabularies and URIs properly, with little to no tooling to enforce it.
This poses a big problem for a tool like Senpy, which aims to make it easier for professionals without a background in RDF to build and consume semantic NLP services.
If an attribute is neither a URI nor included in the global JSON-LD context, it will not generate a triple in the final graph.
Moreover, there is no way to enforce that the vocabularies and the &hellip;</p>
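<p>The silent-drop behaviour can be illustrated with a deliberately simplified model of how JSON-LD expands keys. This is a sketch under stated assumptions, not a JSON-LD processor: it only handles terms defined in the context, compact IRIs such as <code>vocab:property</code> whose prefix is defined, and absolute IRIs, and it ignores features such as <code>@vocab</code>. The context, the node and the helper names (<code>expand_key</code>, <code>to_triples</code>) are hypothetical.</p>

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical context: one prefix and one plain term
CONTEXT = {
    "vocab": "http://example.org/vocab#",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
}


def expand_key(key: str, context: dict) -> str | None:
    """Return the full IRI for a key, or None if it would be dropped."""
    if key in context:  # a term defined in the context
        return context[key]
    prefix, sep, suffix = key.partition(":")
    if sep and prefix in context:  # a compact IRI like "vocab:property"
        return context[prefix] + suffix
    parsed = urlparse(key)
    if parsed.scheme and parsed.netloc:  # already an absolute IRI
        return key
    return None


def to_triples(node: dict, context: dict) -> list[tuple]:
    """Turn a flat, dict-based node into (subject, predicate, object) triples."""
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords (@id, @type, ...) are skipped here
        predicate = expand_key(key, context)
        if predicate is not None:
            triples.append((subject, predicate, value))
    return triples


node = {
    "@id": "http://example.org/entries/1",
    "vocab:property": 25,
    "vocab_property": 26,  # typo: no error, just no triple
    "label": "An entry",
}
for triple in to_triples(node, CONTEXT):
    print(triple)
```

<p>The misspelled <code>vocab_property</code> key raises no error; it simply never becomes a triple, which is exactly the kind of mistake that is hard to spot without extra tooling.</p>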
<p>Pros:</p>
<ul>
<li>Flexible/extensible.</li>
<li>Lightweight. This is mostly JSON-LD in Python&rsquo;s clothing.</li>
<li>Maps naturally to both <code>rdflib</code> and JSON-LD.</li>
</ul>
<p>Cons:</p>
<ul>
<li>Discoverability. Documentation and examples are needed to know which attributes to use.</li>
<li>Error-prone. It is easy to misuse a property or introduce typos.</li>
<li>Tight coupling with semantics/RDF. One needs to know a thing or two about RDF, especially if new vocabularies or annotations need to be used.</li>
</ul>
<h2 id="the-object-oriented-way">The object-oriented way
</h2><p>An obvious alternative in an object-oriented language like Python is to use classes to represent our data model.
These classes can define the specific attributes available, and type annotations can serve both as a guide for the developer and as a means to automatically validate objects at runtime.
There are tools like <a class="link" href="https://pydantic.dev/" target="_blank" rel="noopener"
>pydantic</a> that make this process very simple.
Then, we only need to define how our models should be serialized into JSON-LD.
We can thoroughly test this serialization to ensure that the resulting object is correct and produces the right RDF graph.
Going back to our previous example, we could define an <code>Entry</code> class as a dataclass, and define all the possible types of annotations as attributes.</p>
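<p>A minimal sketch of that idea, using only the standard library's <code>dataclasses</code> rather than pydantic, could look like the following. The field names, the <code>"iri"</code> metadata key, the <code>@type</code> value and the property names (taken loosely from NIF, Dublin Core and Marl) are illustrative assumptions, not senpy's actual model, and only the <code>text</code> field is type-checked at runtime.</p>

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Entry:
    """An annotated piece of text with a fixed, typed set of attributes."""

    id: str
    # Each field records the (assumed) RDF property it maps to
    text: str = field(metadata={"iri": "nif:isString"})
    language: Optional[str] = field(default=None, metadata={"iri": "dc:language"})
    polarity: Optional[float] = field(
        default=None, metadata={"iri": "marl:hasPolarityValue"}
    )

    def __post_init__(self) -> None:
        # Dataclasses do not validate types; pydantic would do this for us
        if not isinstance(self.text, str):
            raise TypeError("text must be a string")

    def to_jsonld(self) -> dict:
        """Serialize declared attributes, skipping unset optional ones."""
        doc = {"@id": self.id, "@type": "nif:Context"}  # assumed type
        for f in fields(self):
            iri = f.metadata.get("iri")
            value = getattr(self, f.name)
            if iri is not None and value is not None:
                doc[iri] = value
        return doc


entry = Entry(id="http://example.org/entries/1", text="I love it", polarity=0.9)
print(json.dumps(entry.to_jsonld(), indent=2))
```

<p>With this approach a misspelled attribute fails loudly: <code>Entry(id="x", text="hi", polarty=0.5)</code> raises a <code>TypeError</code> for the unexpected keyword, instead of silently producing no triple.</p>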
<p>This model works great when all the possible attributes are known ahead of time.
But it starts to break when the provided model is not comprehensive enough, or when consumers of your library need to provide their own ad-hoc annotations/attributes.
This could be solved by encouraging consumers of our library to define their own subclasses whenever they need to add new attributes.
This works perfectly fine for serialization, but it breaks if your library needs to automatically deserialize these subclasses.
It also breaks if different parts of the code need to add their own attributes to the same data at the same time.
This was precisely the case of <code>senpy</code>, where entities are annotated by different plugins, each providing a different set of annotations.</p>
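<p>The deserialization problem can be sketched with plain dataclasses. In this hypothetical example, two plugins subclass the library's <code>Entry</code>, and the library's own loader (a made-up <code>load_entry</code> helper) only knows about the classes it ships, so plugin annotations are lost on the way back. Real libraries may fail differently (for example, by raising an error instead of dropping fields), but the root cause is the same.</p>

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Entry:
    """Shipped by the (hypothetical) main library."""

    text: str


@dataclass
class SentimentEntry(Entry):
    """Defined by a hypothetical sentiment plugin."""

    polarity: float = 0.0


@dataclass
class EmotionEntry(Entry):
    """Defined by a hypothetical emotion plugin."""

    emotion: str = "neutral"


def load_entry(data: dict) -> Entry:
    """Library-side deserializer: it only knows the fields of Entry."""
    known = {f.name for f in fields(Entry)}
    return Entry(**{k: v for k, v in data.items() if k in known})


serialized = asdict(SentimentEntry(text="Great movie", polarity=0.9))
restored = load_entry(serialized)
print(restored)  # Entry(text='Great movie'): the polarity annotation is gone

# An entry annotated by both plugins would need yet another class that
# inherits from SentimentEntry and EmotionEntry, defined ahead of time.
```

<p>Each plugin's subclass works fine in isolation; the trouble only appears when the library has to rebuild objects it did not define, or when several plugins annotate the same entry.</p>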
<p>Pros:</p>
<ul>
<li>Discoverability. All possible attributes are known ahead of time, including their possible types.</li>
<li>Decoupling from RDF. Developers only need to know about the dataclasses provided. The mapping to the RDF world is already encoded in the dataclass.</li>
</ul>
<p>Cons:</p>
<ul>
<li>Rigidity. Adding new types of annotations requires modifying the models in the main module.</li>
<li>Polymorphism. Custom subclasses are hard to deserialize automatically, and annotations from several sources cannot easily be combined on the same object.</li>
</ul>
<h2 id="a-hybrid-approach">A hybrid approach
</h2><p>Whichever solution is chosen in the end, it needs to:</p>
<ul>
<li>Make it easy and error-proof to add the most common types of annotations.</li>
<li>Allow for additional annotations/attributes to be added.</li>
<li>Allow for future upgrades, e.g., converting the most common custom annotations into built-in ones.</li>
<li>Allow for deserialization of custom types.</li>
<li>Allow multiple consumers to add their own annotations.</li>
</ul></description></item><item><title>uv - One rust tool to rule all pythons</title><link>https://balkian.com/p/uv-one-rust-tool-to-rule-all-pythons/</link><pubDate>Mon, 17 Feb 2025 23:02:47 +0100</pubDate><guid>https://balkian.com/p/uv-one-rust-tool-to-rule-all-pythons/</guid><description><img src="https://balkian.com/img/uv.png" alt="Featured image of post uv - One rust tool to rule all pythons" /><p>Long story short: I&rsquo;m now using <a class="link" href="https://github.com/astral-sh/uv" target="_blank" rel="noopener"
>uv</a>, and so should you.
It is a great replacement for pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.</p>
<h2 id="context">Context
</h2><p>For years, my strategy to manage Python projects has been a mix of a custom <code>setup.py</code>, several hand-crafted <code>requirements.txt</code> files (through <code>pip freeze</code>), a custom virtualenv per project, and multiple tools to upload to PyPI.
Although this works, this setup has many drawbacks:</p>
<ul>
<li>It requires user intervention (creating a venv, sourcing it, handling new deps). This isn&rsquo;t ideal if you want new (probably inexperienced) users to use your projects.</li>
<li>On a similar note, the whole process needs to be well documented if you want other users to contribute to or maintain the code.</li>
<li>Pinning dependency versions is finicky, and I&rsquo;ve run into problems because of that.</li>
<li>Creating a new project involves a template, or copying files from an older project.</li>
</ul>
<p>Of course, this is nothing new.
There is a whole site dedicated to <a class="link" href="https://packaging.python.org/en/latest/" target="_blank" rel="noopener"
>packaging your Python project</a>.
A plethora of different projects have come and gone, with varying degrees of success.</p>
<h2 id="alternatives-poetry">Alternatives (poetry)
</h2><p>About a year before trying <code>uv</code>, I tried to catch up with the ecosystem and get to know the &ldquo;blessed new way&rdquo;.
However, the task proved more difficult than expected, as the landscape is filled with a myriad of alternatives, each with its own set of drawbacks and detractors.
Packaging has historically been a weak spot, in ironic contradiction to the Zen of Python&rsquo;s &ldquo;There should be one&ndash; and preferably only one &ndash;obvious way to do it&rdquo;.</p>
<p>I eventually settled on <a class="link" href="https://python-poetry.org/" target="_blank" rel="noopener"
>poetry</a>, mostly because it seemed to be the most popular alternative.</p>
<p>There are many things I liked about it.
First of all, having a convention for dependencies (<code>pyproject.toml</code>) and a tool that properly handles them was nice.
It also removed the need to remember specific incantations to build and publish my Python projects.
Lastly, I combined it with <code>poetry2nix</code> to create reproducible Python environments using Nix.
This makes for a very powerful experience.</p>
<p>However, there were multiple hiccups.
First of all, it took me some time to figure out which specific fields to use (each tool can define ad-hoc properties in the <code>pyproject.toml</code> file), and some of them seemed redundant with the more generic ones.
Full disclosure: this specific point might have been a mistake on my side, and I do not remember the details.
The second hiccup was speed.
(Re-)creating an environment took a non-negligible amount of time.</p>
<h2 id="enter-light-uv">Enter <del>light</del> <code>uv</code>
</h2><p>According to its repository, <code>uv</code> can replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.
Not only that, but it also claims to do so 10-100 times faster than pip.
I must admit that being written in Rust was another selling point for me, as I&rsquo;m looking for excuses to contribute to a decently-sized Rust project.</p>
<p>Installing it is dead simple: just download the binary (e.g., with curl) or run <code>pip install uv</code>.
You won&rsquo;t need much more: <code>uv</code> seems to just do the right thing out of the box.
And it does it really, really fast.
The rest of the time, it gets out of the way.</p>
<p>My only gripe so far is that I can&rsquo;t seem to find a built-in command to drop into a shell, but that is nothing that <code>uv run $SHELL</code> cannot fix.</p>
<h2 id="common-operations">Common operations
</h2><h3 id="initialize-a-repository">Initialize a repository
</h3><div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">uv init
</span></span></code></pre>
</div>
</div><h3 id="adding-dependencies">Adding dependencies
</h3><div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">uv add senpy
</span></span></code></pre>
</div>
</div><h3 id="running-commands-inside-the-environment">Running commands inside the environment
</h3><div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">uv run &lt;COMMAND&gt;
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"># e.g., run a shell using your Python version and dependencies
</span></span><span class="line"><span class="cl">uv run $SHELL
</span></span></code></pre>
</div>
</div><h3 id="dependency-tree">Dependency tree
</h3><div class="highlight"><div class="chroma">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">uv tree
</span></span><span class="line"><span class="cl">Resolved 44 packages in 1ms
</span></span><span class="line"><span class="cl">my-project v0.1.0
</span></span><span class="line"><span class="cl">├── fastapi[standard] v0.115.8
</span></span><span class="line"><span class="cl">│   ├── pydantic v2.10.6
</span></span><span class="line"><span class="cl">│   │   ├── annotated-types v0.7.0
</span></span><span class="line"><span class="cl">│   │   ├── pydantic-core v2.27.2
</span></span><span class="line"><span class="cl">│   │   │   └── typing-extensions v4.12.2
</span></span><span class="line"><span class="cl">│   │   └── typing-extensions v4.12.2
</span></span><span class="line"><span class="cl">│   ├── starlette v0.45.3
</span></span><span class="line"><span class="cl">│   │   └── anyio v4.8.0
</span></span><span class="line"><span class="cl">│   │       ├── exceptiongroup v1.2.2
</span></span><span class="line"><span class="cl">│   │       ├── idna v3.10
</span></span><span class="line"><span class="cl">│   │       ├── sniffio v1.3.1
</span></span><span class="line"><span class="cl">│   │       └── typing-extensions v4.12.2
</span></span><span class="line"><span class="cl">│   ├── typing-extensions v4.12.2
</span></span><span class="line"><span class="cl">│   ├── email-validator v2.2.0 (extra: standard)
</span></span><span class="line"><span class="cl">│   │   ├── dnspython v2.7.0
</span></span><span class="line"><span class="cl">...
</span></span></code></pre>
</div>
</div></description></item><item><title>Python</title><link>https://balkian.com/cheatsheet/python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://balkian.com/cheatsheet/python/</guid><description><img src="https://balkian.com/img/python.png" alt="Featured image of post Python" /><h2 id="interesting-libraries">Interesting libraries
</h2><h3 id="tqdm"><a class="link" href="https://github.com/tqdm/tqdm" target="_blank" rel="noopener"
>TQDM</a>
</h3><p>From tqdm&rsquo;s GitHub repository:</p>
<blockquote>
<p>tqdm means &ldquo;progress&rdquo; in Arabic (taqadum, تقدّم) and an abbreviation for &ldquo;I love you so much&rdquo; in Spanish (te quiero demasiado).</p></blockquote>
<p><img src="https://raw.githubusercontent.com/tqdm/tqdm/master/images/tqdm.gif"
loading="lazy"
alt="TQDM in action"
></p>
<h2 id="tools">Tools
</h2><h3 id="uv"><a class="link" href="https://github.com/astral-sh/uv" target="_blank" rel="noopener"
>uv</a>
</h3><p>🚀 A single tool to replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.
⚡️ 10-100x faster than pip.</p>
<ul>
<li>Provides comprehensive project management, with a universal lockfile.</li>
<li>Runs scripts, with support for inline dependency metadata.</li>
<li>Installs and manages Python versions.</li>
<li>Runs and installs tools published as Python packages.</li>
<li>Includes a pip-compatible interface for a performance boost with a familiar CLI.</li>
<li>Supports Cargo-style workspaces for scalable projects.</li>
<li>Disk-space efficient, with a global cache for dependency deduplication.</li>
<li>Installable without Rust or Python via curl or pip.</li>
<li>Supports macOS, Linux, and Windows.</li>
</ul></description></item></channel></rss>