<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Programming on J. Fernando Sánchez</title><link>https://balkian.com/categories/programming/</link><description>Recent content in Programming on J. Fernando Sánchez</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Wed, 26 Feb 2025 23:22:59 +0100</lastBuildDate><atom:link href="https://balkian.com/categories/programming/index.xml" rel="self" type="application/rss+xml"/><item><title>Bridging RDF, JSON-LD and Dataclasses</title><link>https://balkian.com/p/bridging-rdf-json-ld-and-dataclasses/</link><pubDate>Wed, 26 Feb 2025 23:22:59 +0100</pubDate><guid>https://balkian.com/p/bridging-rdf-json-ld-and-dataclasses/</guid><description>&lt;p>In the RDF world, data is expressed as a collection of triples.
These triples can contain IRIs that may or may not be accessible or valid.
And the use of these IRIs may or may not adhere to a vocabulary.
Checking the validity of the IRIs and the semantics of the triples is an additional step.&lt;/p>
&lt;h2 id="the-rdflib-way">The &lt;code>rdflib&lt;/code> way
&lt;/h2>&lt;p>&lt;code>rdflib&lt;/code> only models IRIs, values and namespaces.
Developers need to be cognisant of the URIs they are using, and the vocabularies being used.
Prior to version 2.0, senpy followed a very similar model.
It had a base class to represent a generic node.
Each instance got its own automatically generated id and behaved like a normal dictionary, whose keys and values were serialized as a JSON-LD dictionary.
Multiple subclasses were also included to model specific types of node, mostly to provide convenience methods for the given subtype.
Here is an example using one such subclass, &lt;code>Entry&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt">1
&lt;/span>&lt;span class="lnt">2
&lt;/span>&lt;span class="lnt">3
&lt;/span>&lt;span class="lnt">4
&lt;/span>&lt;span class="lnt">5
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="n">entry&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">Entry&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">entry&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;vocab:property&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="mi">25&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">entry&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">jsonld&lt;/span>&lt;span class="p">())&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div>&lt;p>This would print something like the following:&lt;/p>
&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt">1
&lt;/span>&lt;span class="lnt">2
&lt;/span>&lt;span class="lnt">3
&lt;/span>&lt;span class="lnt">4
&lt;/span>&lt;span class="lnt">5
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-json" data-lang="json">&lt;span class="line">&lt;span class="cl">&lt;span class="p">{&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;@id&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;:Entry_202505....&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;@type&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="s2">&amp;#34;prefix:Entity&amp;#34;&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nt">&amp;#34;vocab:property&amp;#34;&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="mi">25&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="p">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div>&lt;p>Producing correct triples using this model requires using the vocabularies and URIs properly, with little to no tooling to enforce it.
This poses a big problem for a tool like Senpy, which aims to make it easier for professionals without a background in RDF to build and consume semantic NLP services.
If an attribute is not a URI and is not included in the global JSON-LD context, it will not generate a triple in the final graph.
Moreover, there is no way to enforce that the vocabularies and the&lt;/p>
&lt;p>Pros:&lt;/p>
&lt;ul>
&lt;li>Flexible/extensible&lt;/li>
&lt;li>Lightweight. This is mostly JSON-LD in Python&amp;rsquo;s clothing.&lt;/li>
&lt;li>Maps naturally to both &lt;code>rdflib&lt;/code> and hand-written &lt;code>json-ld&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>Cons:&lt;/p>
&lt;ul>
&lt;li>Discoverability. Documentation and examples are needed to know which attributes to use&lt;/li>
&lt;li>Error-prone. It is easy to misuse a property, or introduce typos&lt;/li>
&lt;li>Tight coupling with semantics/RDF. One needs to know a thing or two about RDF, especially if new vocabularies or annotations need to be used.&lt;/li>
&lt;/ul>
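&lt;p>To make the dictionary-based model more concrete, here is a minimal sketch of what such a base class could look like. It is not senpy code: the class names, the &lt;code>prefix:&lt;/code> types and the counter-based ids are assumptions made purely for illustration.&lt;/p>

```python
import itertools
import json

# Hypothetical sketch, not the actual senpy implementation.
_counter = itertools.count(1)


class Node(dict):
    """A generic node: a plain dict with an auto-generated id and a type."""

    rdf_type = "prefix:Node"  # assumed placeholder type

    def __init__(self):
        super().__init__()
        # Every instance gets its own identifier.
        self["@id"] = f":{type(self).__name__}_{next(_counter)}"
        self["@type"] = self.rdf_type

    def jsonld(self):
        """Serialize the keys and values as a JSON-LD dictionary."""
        return json.dumps(self, indent=2)


class Entry(Node):
    rdf_type = "prefix:Entry"  # assumed placeholder type


entry = Entry()
entry["vocab:property"] = 25
# Nothing stops a typo from slipping in: this key is accepted silently.
entry["vocab:proprety"] = 26
print(entry.jsonld())
```

&lt;p>The misspelled key ends up in the output next to the correct one, which is the error-prone behaviour listed in the cons above.&lt;/p>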
&lt;h2 id="the-object-oriented-way">The object-oriented way
&lt;/h2>&lt;p>An obvious solution to this problem in an object-oriented language like Python is to use classes to represent our data model.
These classes can define the specific attributes available, and typing annotations can serve both as a guide for the developer, and as a means to automatically
validate objects at runtime.
There are tools like &lt;a class="link" href="https://pydantic.dev/" target="_blank" rel="noopener"
>pydantic&lt;/a> that make this process very simple.
Then, we only need to define how our models should be serialized into JSON-LD.
We can thoroughly test this serialization to ensure that the resulting object is correct and produces the right RDF graph.
Going back to our previous example, we could define an Entry class as a dataclass, and define all the possible types of annotations as attributes.&lt;/p>
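&lt;p>A minimal sketch of that idea, using only the standard library instead of pydantic so that it runs as-is. The attribute names (&lt;code>text&lt;/code>, &lt;code>language&lt;/code>, &lt;code>sentiments&lt;/code>) and the vocabulary terms they map to are hypothetical, chosen only for illustration.&lt;/p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """An entry with a fixed, discoverable set of attributes (all hypothetical)."""

    id: str
    text: str
    language: Optional[str] = None
    sentiments: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Hand-written runtime check; a tool like pydantic would derive
        # this kind of validation from the annotations instead.
        if not isinstance(self.text, str):
            raise TypeError("text must be a string")

    def to_jsonld(self):
        """Map the Python attributes onto (assumed) vocabulary terms."""
        doc = {"@id": self.id, "@type": "prefix:Entry", "vocab:text": self.text}
        if self.language is not None:
            doc["vocab:language"] = self.language
        if self.sentiments:
            doc["vocab:sentiments"] = list(self.sentiments)
        return doc


entry = Entry(id=":Entry_1", text="uv is fast", language="en")
print(entry.to_jsonld())
```

&lt;p>With this shape, a misspelled attribute such as &lt;code>Entry(id=":Entry_2", text="hi", langauge="en")&lt;/code> fails immediately with a &lt;code>TypeError&lt;/code>, and the mapping to vocabulary terms lives in a single method that can be tested on its own.&lt;/p>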
&lt;p>This model works great when all the possible attributes are known ahead of time.
But it starts to break when the model provided is not comprehensive enough, or consumers of your library need to provide their own ad-hoc annotations / attributes.
This could be solved by encouraging consumers of our library to define their own subclasses whenever they need to add new attributes.
This works perfectly fine for serialization, but it breaks if your library needs to automatically deserialize these subclasses.
It also breaks if different parts of the code need to add their own attributes on the same data at the same time.
This was precisely the case of &lt;code>senpy&lt;/code>, where entities are annotated by different plugins, each providing a different set of annotations.&lt;/p>
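&lt;p>The deserialization problem can be illustrated with a small sketch. It assumes a library that looks classes up in a registry keyed by &lt;code>@type&lt;/code>; the registry, the class names and the type strings are all hypothetical, and a real library may resolve types differently.&lt;/p>

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Entry:
    id: str
    text: str
    rdf_type = "prefix:Entry"  # no annotation, so this is not a dataclass field

    def to_jsonld(self):
        return {"@type": self.rdf_type, **asdict(self)}


# Types the library knows about ahead of time (hypothetical registry).
KNOWN_TYPES = {"prefix:Entry": Entry}


def from_jsonld(doc):
    """Rebuild an object from its serialized form using the registry."""
    cls = KNOWN_TYPES.get(doc["@type"])
    if cls is None:
        raise LookupError(f"unknown type: {doc['@type']}")
    return cls(**{f.name: doc[f.name] for f in fields(cls)})


# A consumer of the library adds an attribute through a subclass.
@dataclass
class EmotionEntry(Entry):
    emotion: str = "neutral"
    rdf_type = "custom:EmotionEntry"


doc = EmotionEntry(id=":e1", text="I like uv", emotion="joy").to_jsonld()
print(doc)  # serialization works without any extra effort

try:
    from_jsonld(doc)
except LookupError as error:
    # The library has never heard of the consumer-defined type.
    print("cannot deserialize:", error)
```

&lt;p>Serializing the subclass needs no cooperation from the library, but reading it back does: the library has to learn about &lt;code>EmotionEntry&lt;/code> somehow, for example through explicit registration.&lt;/p>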
&lt;p>Pros:&lt;/p>
&lt;ul>
&lt;li>Discoverability. All possible attributes are known ahead of time, including their possible types.&lt;/li>
&lt;li>Decoupling from RDF. Developers only need to know about the dataclasses provided. The mapping to the RDF world is already encoded in the dataclass.&lt;/li>
&lt;/ul>
&lt;p>Cons:&lt;/p>
&lt;ul>
&lt;li>Rigidity. Adding new types of annotations requires modifying the models, in the main module.&lt;/li>
&lt;li>Polymorphism.&lt;/li>
&lt;/ul>
&lt;h2 id="a-hybrid-approach">A hybrid approach
&lt;/h2>&lt;p>Whichever solution is chosen in the end, it needs to:&lt;/p>
&lt;ul>
&lt;li>Make it easy and error-proof to add the most common types of annotations&lt;/li>
&lt;li>Allow for additional annotations/attributes to be added&lt;/li>
&lt;li>Allow for upgrades in the future, i.e., converting the most common custom annotations into built-in ones&lt;/li>
&lt;li>Allow for deserialization of custom types&lt;/li>
&lt;li>Allow multiple consumers to add their own annotations&lt;/li>
&lt;/ul></description></item><item><title>uv - One rust tool to rule all pythons</title><link>https://balkian.com/p/uv-one-rust-tool-to-rule-all-pythons/</link><pubDate>Mon, 17 Feb 2025 23:02:47 +0100</pubDate><guid>https://balkian.com/p/uv-one-rust-tool-to-rule-all-pythons/</guid><description>&lt;img src="https://balkian.com/img/uv.png" alt="Featured image of post uv - One rust tool to rule all pythons" />&lt;p>Long story short: I&amp;rsquo;m now using &lt;a class="link" href="https://github.com/astral-sh/uv" target="_blank" rel="noopener"
>uv&lt;/a>, and so should you.
It is a great replacement for pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.&lt;/p>
&lt;h2 id="context">Context
&lt;/h2>&lt;p>For years, my strategy to manage python projects has been a mix of a custom &lt;code>setup.py&lt;/code>, several hand-crafted &lt;code>requirements.txt&lt;/code> files (through &lt;code>pip freeze&lt;/code>), a custom virtualenv per project, and multiple tools to upload to PyPI.
Although this works, this setup has many drawbacks:&lt;/p>
&lt;ul>
&lt;li>It requires user intervention (creating a venv, sourcing it, handling new deps). This isn&amp;rsquo;t ideal if you want new (probably inexperienced) users to use your projects.&lt;/li>
&lt;li>On a similar note, the whole process needs to be well documented if you want other users to contribute or maintain the code.&lt;/li>
&lt;li>Pinning dependency versions is finicky, and I&amp;rsquo;ve run into problems because of that.&lt;/li>
&lt;li>Creating a new project involves a template, or copying files from an older project.&lt;/li>
&lt;/ul>
&lt;p>Of course, this is nothing new.
There is a whole site dedicated to &lt;a class="link" href="https://packaging.python.org/en/latest/" target="_blank" rel="noopener"
>packaging your Python project&lt;/a>.
A plethora of different projects have come and gone, with varying degrees of success.&lt;/p>
&lt;h2 id="alternatives-poetry">Alternatives (poetry)
&lt;/h2>&lt;p>About a year before trying &lt;code>uv&lt;/code>, I tried to catch up with the ecosystem and get to know the &lt;code>blessed new way&lt;/code>.
However, the task proved to be a little more difficult, as the landscape is filled with a myriad of alternatives, each with their own set of drawbacks and detractors.
Packaging has historically been a weak spot, in ironic contrast to the Zen of Python&amp;rsquo;s &amp;ldquo;There should be one&amp;ndash; and preferably only one &amp;ndash;obvious way to do it&amp;rdquo;.&lt;/p>
&lt;p>I eventually settled on &lt;a class="link" href="https://python-poetry.org/" target="_blank" rel="noopener"
>poetry&lt;/a>.
Mostly because it seemed like the most popular alternative.&lt;/p>
&lt;p>There are many things I liked about it.
First of all, having a convention for dependencies (&lt;code>pyproject.toml&lt;/code>) and a tool that properly handles them was nice.
It also removed the need to remember specific incantations to build and publish my Python projects.
Lastly, I combined it with &lt;code>poetry2nix&lt;/code> to create reproducible Python environments using nix.
This makes for a very powerful experience.&lt;/p>
&lt;p>However, there were multiple hiccups.
First of all, it took me some time to figure out which specific fields to use (each tool can define ad-hoc properties in the &lt;code>pyproject.toml&lt;/code> file), and some of them seemed redundant with the more generic ones.
Full disclosure, this specific point might be a mistake on my side, and I do not remember the details.
The second one is speed.
(Re-)creating an environment took a non-negligible amount of time.&lt;/p>
&lt;h2 id="enter-light-uv">Enter &lt;del>light&lt;/del> &lt;code>uv&lt;/code>
&lt;/h2>&lt;p>According to its repository, &lt;code>uv&lt;/code> can replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.
Not only that, but it also claims to do that 10-100 times faster than pip.
I must admit that it being written in Rust was another selling point for me, as I&amp;rsquo;m looking for excuses to collaborate on a decently-sized Rust project.&lt;/p>
&lt;p>Installing it is dead simple: download the binary (e.g., with curl) or run &lt;code>pip install uv&lt;/code>.
You won&amp;rsquo;t need much more: &lt;code>uv&lt;/code> seems to just do the right thing out of the box.
And it does it really, really fast.
The rest of the time it gets out of the way.&lt;/p>
&lt;p>My only gripe so far is that I can&amp;rsquo;t seem to find a built-in command to drop into a shell, but that is nothing that &lt;code>uv run $SHELL&lt;/code> cannot fix.&lt;/p>
&lt;h2 id="common-operations">Common operations
&lt;/h2>&lt;h3 id="initialize-a-repository">Initialize a repository
&lt;/h3>&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt">1
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">uv init
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div>&lt;h3 id="adding-dependencies">Adding dependencies
&lt;/h3>&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt">1
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">uv add senpy
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div>&lt;h3 id="running-commands-inside-the-environment">Running commands inside the environment
&lt;/h3>&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt">1
&lt;/span>&lt;span class="lnt">2
&lt;/span>&lt;span class="lnt">3
&lt;/span>&lt;span class="lnt">4
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">uv run &amp;lt;COMMAND&amp;gt;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"># e.g., run a shell using your python version and dependencies
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">uv run $SHELL
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div>&lt;h3 id="dependency-tree">Dependency tree
&lt;/h3>&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt"> 1
&lt;/span>&lt;span class="lnt"> 2
&lt;/span>&lt;span class="lnt"> 3
&lt;/span>&lt;span class="lnt"> 4
&lt;/span>&lt;span class="lnt"> 5
&lt;/span>&lt;span class="lnt"> 6
&lt;/span>&lt;span class="lnt"> 7
&lt;/span>&lt;span class="lnt"> 8
&lt;/span>&lt;span class="lnt"> 9
&lt;/span>&lt;span class="lnt">10
&lt;/span>&lt;span class="lnt">11
&lt;/span>&lt;span class="lnt">12
&lt;/span>&lt;span class="lnt">13
&lt;/span>&lt;span class="lnt">14
&lt;/span>&lt;span class="lnt">15
&lt;/span>&lt;span class="lnt">16
&lt;/span>&lt;span class="lnt">17
&lt;/span>&lt;span class="lnt">18
&lt;/span>&lt;span class="lnt">19
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">uv shell
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">Resolved 44 packages in 1ms
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">my-project v0.1.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">├── fastapi[standard] v0.115.8
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── pydantic v2.10.6
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ ├── annotated-types v0.7.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ ├── pydantic-core v2.27.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ │ └── typing-extensions v4.12.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ └── typing-extensions v4.12.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── starlette v0.45.3
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ └── anyio v4.8.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ ├── exceptiongroup v1.2.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ ├── idna v3.10
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ ├── sniffio v1.3.1
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ └── typing-extensions v4.12.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── typing-extensions v4.12.2
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ ├── email-validator v2.2.0 (extra: standard)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">│ │ ├── dnspython v2.7.0
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div></description></item><item><title>Python</title><link>https://balkian.com/cheatsheet/python/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://balkian.com/cheatsheet/python/</guid><description>&lt;img src="https://balkian.com/img/python.png" alt="Featured image of post Python" />&lt;h2 id="interesting-libraries">Interesting libraries
&lt;/h2>&lt;h3 id="tqdm">&lt;a class="link" href="https://github.com/tqdm/tqdm" target="_blank" rel="noopener"
>TQDM&lt;/a>
&lt;/h3>&lt;p>From tqdm&amp;rsquo;s GitHub repository:&lt;/p>
&lt;blockquote>
&lt;p>tqdm means &amp;ldquo;progress&amp;rdquo; in Arabic (taqadum, تقدّم) and an abbreviation for &amp;ldquo;I love you so much&amp;rdquo; in Spanish (te quiero demasiado).&lt;/p>&lt;/blockquote>
&lt;p>&lt;img src="https://raw.githubusercontent.com/tqdm/tqdm/master/images/tqdm.gif"
loading="lazy"
alt="TQDM in action"
>&lt;/p>
&lt;h2 id="tools">Tools
&lt;/h2>&lt;h3 id="uv">&lt;a class="link" href="https://github.com/astral-sh/uv" target="_blank" rel="noopener"
>uv&lt;/a>
&lt;/h3>&lt;p>🚀 A single tool to replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.
⚡️ 10-100x faster than pip.&lt;/p>
&lt;ul>
&lt;li>Provides comprehensive project management, with a universal lockfile.&lt;/li>
&lt;li>Runs scripts, with support for inline dependency metadata.&lt;/li>
&lt;li>Installs and manages Python versions.&lt;/li>
&lt;li>Runs and installs tools published as Python packages.&lt;/li>
&lt;li>Includes a pip-compatible interface for a performance boost with a familiar CLI.&lt;/li>
&lt;li>Supports Cargo-style workspaces for scalable projects.&lt;/li>
&lt;li>Disk-space efficient, with a global cache for dependency deduplication.&lt;/li>
&lt;li>Installable without Rust or Python via curl or pip.&lt;/li>
&lt;li>Supports macOS, Linux, and Windows.&lt;/li>
&lt;/ul></description></item></channel></rss>