[{"content":"A big part of my research has been around vocabularies and semantic annotation. And, to be honest, I\u0026rsquo;ve grown increasingly dissatisfied with the field. To the point where I dread having to work on it. Some day I will write about it at length, but today I\u0026rsquo;ve stumbled upon a post that covers the topic quite well: The Semantic Web is Dead - Long Live the Semantic Web (styling mine).\nIn particular, this section has really resonated with me:\nAcademics and Industry The political economy of academia and its interaction with industry is the origin of our current lack of a functional Semantic Web.\nAcademia is structured in a way that there is very little incentive for anyone to build usable software. Instead, you are elevated for rapidly throwing together an idea, a tiny proof of concept, and to iterate on microscopic variations of this thing to produce as many papers as possible.\nIn engineering, the devil is in the detail. You really need to get into the weeds before you can know what the right thing to do is. This is simultaneously a devastating situation for industry and academia. Nobody is going to wait around for a team of engineers to finish building a system to write about it in Academia. You’ll be passed immediately by legions of paper pushers. And in industry, you can’t just be mucking about with a system that you might have to throw away.\nTags: [semantic web]","date":"2025-03-07T10:24:52+01:00","permalink":"https://balkian.com/p/rdf-is-dead/","title":"RDF Is Dead"},{"content":"Background TL;DR I work in academia. This post focuses on advice I\u0026rsquo;d give a younger me to be a more effective supervisor and project lead.\nMy role in my research group has evolved from individual contributor to project lead that manages a team of multiple students. 
This often involves coordinating with other senior researchers and their teams.\nThis post is a collection of advice I would have given myself back when I started this journey. It is also an excuse to reflect on these ideas I\u0026rsquo;ve been implicitly applying every day, and maybe learn a few things more in the process.\nIn my field, projects are often tied to a specific grant or some sort of public funding. This means that the main concern of the lead is to ensure that the results at the end of the project match the description in the project proposal. It also typically means maximizing the number of publications related to the project and their overall impact.\nTo do so, most projects rely on three types of staff: a) senior researchers (post-doc); b) junior researchers (PhD students); and c) interns doing their bachelor\u0026rsquo;s or master\u0026rsquo;s thesis. The level of contribution is generally inversely proportional to the level of mastery of the contributor: PhD students design and develop the main contributions (both software and experimental) under the supervision of senior researchers (advisor/supervisor), and interns take care of tasks that are narrow in scope and not crucial to any academic contribution. For instance, a PhD student may develop a new model for text classification, and an intern will wrap that model in an HTTP service with a nice UI. When the service and UI part is intricate and has some potential academic merit, that task may be conducted by a PhD student as part of their thesis. That was precisely the case with Senpy, which was part of my PhD thesis, and it has since been used by dozens of students to develop services in the context of other research projects.\nReasons to form a team In my opinion, a team has two advantages over a single contributor. 
The first one is that collaboration often generates synergies, leading to surprising and enriching results (a team is greater than the sum of its parts). Carefully selecting your team members and creating an environment that is conducive to these synergies is a topic on its own, and I will not get too deep into it here.\nThe second advantage can be summarized as concurrency: tasks can be tackled by more than one member. This often implies some sort of parallelism. Tasks tend to be split between different members, in hopes of speeding up the process. But it can also be beneficial to have concurrency even in the absence of parallelism: different members can take turns solving the same problem. This is common when the task requires exclusive access to a resource (e.g., performing an experiment on an expensive machine).\nChallenges of teams (of students) Just like in computer science, coordination in a concurrent project involves a non-negligible overhead. The objective is to minimize this overhead.\nI\u0026rsquo;ve really struggled with managing teams in the past. Given my context, I often attributed the failures to lack of time (having to juggle teaching and research), lack of training and preparation on the intern\u0026rsquo;s side (they\u0026rsquo;re often undergrads), lack of appeal or definition of the task (too academic-y), or any other external factors.\nWhile all those aspects play an important role, some of them are out of our control as a project lead in academia: grants require a certain type of project and workflow, and the quality of our interns is bounded by the quality of our degrees. So I think it is more constructive to focus on things that we can control. In other words: we have to play the hand we\u0026rsquo;re given.\nBesides, there is no merit in achieving good results with excellent students/engineers. They would succeed on their own even if you weren\u0026rsquo;t there. 
The real test for a good leader is succeeding with a subpar team.\nIn that vein, I\u0026rsquo;ve reflected on my mistakes as a leader, and the inefficiencies of the teams around me. I\u0026rsquo;d classify my failures in the following areas:\nDelegation (and lack thereof). Piling up too many tasks and blocking progress. Communication. Not having a coherent view of the state of affairs, the details of specific tasks, the general processes to follow, or the priorities of different tasks within a project. Direction (or purpose). Not having a common direction. Delegation I have a tendency to become a bottleneck in any project I am involved in: many tasks end up depending on me, either directly or indirectly. In the concurrency metaphor, I become a lock for many tasks, and a single executor for the rest. This issue was not that apparent early on, when most of my work was as an individual contributor or I had the bandwidth to supervise and complete my tasks.\nI think it is quite common to feel like delegating a task in these scenarios means:\nDefining the task in advance Choosing an assignee for the task Setting a deadline for the task Explaining the task and the relevant context Replying to multiple questions by e-mail or in person. Extra points when the questions make you wonder if any part of the explanation was ever clear. Reviewing the results after the deadline Realizing the assignee misunderstood the task or delivered something not even close to what you agreed upon Going back to point 3. When you\u0026rsquo;re unlucky or short on time: giving up and doing the task yourself Many times, it felt like delegating tasks only led to frustration and wasted time. Especially when compared to the alternative:\nDefining the task Setting a deadline Finishing the task Profit Luckily, some students and projects were an exception to this. They worked autonomously and delivered something beyond the minimum requirements. 
This reinforced my helplessness and the feeling that the problem was not being able to work with experienced engineers.\nHowever, I now believe that the truth lies somewhere in between. Sometimes your circumstances make it quite hard or inefficient to delegate tasks. And some tasks may not be good candidates for delegation. But most of the time you can take advantage of having an extra pair of hands; you just have to do it effectively.\nCommunication Small teams rely on implicit knowledge more than they realize. Even more so if the team is made up of highly specialized people that have worked in the same environment with mostly the same people for years.\nCommunication is a broad term. It includes technical and concrete things such as how a certain task should be done. But it also includes broader things like etiquette, organizational values, and who is more willing to help you out on certain topics on a Friday afternoon.\nHere, I would take a page out of Python\u0026rsquo;s zen and recommend that \u0026ldquo;explicit is better than implicit\u0026rdquo;. Implicit (or tacit) knowledge comes with a whole set of drawbacks:\nIt makes onboarding new users harder. Without a common knowledge base, all knowledge transfer has to rely on personal interactions. Even worse, those interactions are probably organized on the spot, and are likely to miss important points. It makes you heavily reliant on your current members (and their memory). It impedes proper evaluation and progress, since they are not written anywhere. It increases the likelihood of misunderstandings when two members have conflicting beliefs, and makes it harder to detect them until it is too late. It makes contradictions and (unknowingly) changing your mind much more likely. It can happen to the best of us, especially if you are involved in too many projects. When contradictions happen often, your colleagues will learn not to rely on your opinion. On the other hand, communication has to go both ways. 
This means that your newer members need to be able to communicate when something is going wrong or can be improved (backpressure). They should also feel free to talk about their motivation, state of mind, and feelings, when appropriate. That last part is quite subjective, of course. Try to find your - and your organization\u0026rsquo;s - middle ground between \u0026ldquo;I don\u0026rsquo;t care how you feel, just do your job\u0026rdquo; and \u0026ldquo;sure, you can go to the Maldives on short notice. Oh, and don\u0026rsquo;t worry about not having met a deadline in months, I\u0026rsquo;m sure you\u0026rsquo;re stressed and can use some vacation but will work remotely if we need you\u0026rdquo;.\nI personally feel a sweet spot is treating your coworkers like people, being empathetic and compassionate. Part of being a good coworker is fulfilling the duties and obligations you accepted when signing your contract. First and foremost, because not fulfilling them means someone else will have to work harder to make up for it. And, secondly, because doing our part is the only way to move the organization (and research) forward.\nDirection By failure in direction I mean not keeping a consistent and shared set of general goals, principles and values in your organization. In order to really take part in any enterprise, you need to have a clear understanding of the objectives and motivation. When it comes to specific tasks, what you\u0026rsquo;re doing is often not as important as the why you\u0026rsquo;re doing it. In fact, there may be times where you aren\u0026rsquo;t truly sure of what exactly it is that you are doing, but you trust the process and the motivation behind the task.\nI\u0026rsquo;ve seen two failure modes in this regard. The first one is to not have a clear direction. The end result is that members of the team are not really that committed. If no other why is provided, we are only left with because they pay me to do it. 
And academia is not known to pay particularly well, to be honest.\nThe other mode is to provide contradicting or incompatible directions. This can be in a short period of time, leading to the impression that there isn\u0026rsquo;t really any conviction in the message. But it can also be done over a longer period of time. That can be perfectly acceptable, provided that the change in direction is justified and compatible with the principles of the organization.\nFailure in direction is somewhat related to communication, but it is subtly different. An organization can excel at communication, but change their direction constantly. Arguably, a thorough communication strategy makes radical changes in direction less likely. On the one hand, a change in direction needs to be documented, which can be a pain. On the other hand, a written change is easier to spot and more likely to generate complaints.\nRules I\u0026rsquo;d argue that the path to successfully managing a research team lies in roughly the following key goals:\nFostering autonomy Avoiding miscommunication Optimizing your contribution The remainder of the post will be a series of tasks or rules to achieve these goals.\nMost of these ideas probably generalize well to collaboration outside of academia, but I hesitate to make more general claims.\nFostering autonomy The tips here are aimed at avoiding supervision overhead and training future leads.\nProvide a (simplified version of the) bigger picture Try to paint the bigger picture, even for menial tasks within large projects. For you, this may be the nth project you\u0026rsquo;re involved in this year, but the new intern may not have even heard about European projects before. Going back to the idea of direction, it is easier to work on something if you know the context of your work.\nHaving a general idea of the project and the context of your task will also help you make decisions on your own. 
For instance, if I am told to develop a shiny new API for text classification, I may have to ask many questions: 1) what will the input look like?; 2) what should the parameters be?; 3) am I using POST or GET requests?; 4) should I return a JSON object or an XML?\u0026hellip; What if, instead of that, I am also told this API will be used in the context of project X, that our organization will be the only consumers of the API, and they also give me a link to the project\u0026rsquo;s docs. I may be able to figure out some of those answers on my own (e.g., by finding examples on the project\u0026rsquo;s website), or decide that some questions are not vital at this point (e.g., if we are the only consumers, we can much more easily change from GET to POST if we need to).\nOne caveat here is that a link to the documentation or some vague words about the project do not constitute proper context. You are responsible for summarizing the important bits of the context, providing instructions on how to navigate the reference materials, and being open to answer questions that may arise in the exchange.\nDo not discuss implementation details unless strictly necessary There is a fine line between discussing a non-trivial implementation detail and bikeshedding for hours about class names and code best practices. For that reason, you should try to prioritize discussing high-level parts of the problem and the assignment, and trust the student to figure out the details on their own, or come back to you for clarification.\nIt is very common that students focus on very specific details when they are sharing their progress with their supervisors. They will generally try to start by showing snippets of code and their results. I find it helpful to remind them to explain their problems top-to-bottom, starting with a sentence or two about the context of their project, the description and motivation of the specific task they were doing, and the relationship with previous (and future) tasks. 
That usually helps figure out the level of understanding of the student, whether there are any conceptual errors, and whether the specific block or problem is really worth discussing during the meeting.\nSome technical problems will warrant a discussion in detail, either due to their complexity or their importance to the project. In those cases, always limit the time you will spend on that specific issue ahead of time, and make sure to allow for some time at the end of the meeting to go back to any important high-level details.\nIf there are other students that worked on similar projects, do not hesitate to refer your new student to them. It can be an opportunity for them to collaborate, and for the original student to work on explaining and teaching technical issues.\nProvide feedback Make a point of evaluating the results of each student on every level, and provide constructive and actionable feedback to them. Even if no technical issues arise during the project, try to review the code and give some tips (e.g., formatting, code structure, DRY). Try to focus on bigger issues and enforcing best practices before nitpicking and giving feedback on small subjective improvements.\nMake it clear when your feedback is objective/best practice (e.g., a function is deprecated) and when it is a matter of preference. If it is the latter, try to provide more than one alternative, to encourage them to think about it and make an educated decision.\nTake documentation and knowledge transfer seriously Taking the time to write down basic documentation can save a lot of time in the long run. Besides, most of the job of mentoring a new student is lost when that student finishes their degree and leaves to find a job in industry. Good documentation can remain in your organization and be extended long after the intern is gone.\nThis is very obvious for specific tools, whether internal or public. 
Good documentation means any new member can check the tool and use it without much assistance. Even better documentation helps newcomers contribute to the tool. Make sure to make it clear who to approach if something is missing from the documentation, and make it easier to do so than to make assumptions and use the tool incorrectly.\nWriting documentation can be very time consuming, and sometimes it is hard to know exactly what things to focus on when writing the docs. You need to anticipate the needs of the future user. If you are short on time, a good strategy is to delegate the writing of the documentation. Instead of going into details, you can write a very barebones version and train a new user to use and contribute to your tool. Then, leave it up to the new user to extend the documentation, including more details and pitfalls. As a bonus, reading and fixing the docs will give you a better sense of how well that new user understands the tool, as well as possible improvements.\nThis tip also applies to more general areas such as machine learning, graph neural networks, or simulation. Just remember you do not need to reinvent the wheel in those cases. A simple summary and a list of references to expand on the topic could be more than enough. Make sure to also include any specifics that apply to your organization. For instance, point to repositories on GitHub (public or private) that can be used to explore the topic, examples of similar projects in the domain, etc.\nIdentify what information is important for any new hire and present it to them as clearly as possible. Part of that information should be where and how common knowledge is stored and shared, should they need more information in the future. Make this documentation as easy to discover and consume as possible. 
Centralizing this common information in the form of a wiki is often a good idea.\nLastly, make it easy for any member of your organization to update this common documentation, and encourage them to do so. Whenever a member asks you something useful that is not documented, don\u0026rsquo;t just answer the question. Take the time to add this information yourself (e.g., by copy-pasting your response) or task that member with expanding the documentation themselves once they find an answer. If your organization\u0026rsquo;s culture does not encourage using these docs, they will quickly get outdated and fall out of use.\nOne example of taking this documentation approach really seriously is Oxide (computer company). They have a process they call Request For Discussion (RFD), which they use to discuss and document both technical and organizational decisions. For instance, they have RFDs on why they record every meeting, RFDs about their choice of database, and even a meta-RFD that discusses the motivation for RFDs and how the process should work.\nTrust your teammate\u0026rsquo;s ability to learn I\u0026rsquo;ve been bitten by this way too many times. Your students are probably more capable of learning than you think, especially if you have set up your documentation right. What they lack in experience, they make up for with free time, (more) neuroplasticity and determination.\nSure, they will make mistakes (see the next section) and need some feedback (two sections above), but that is how we all learnt.\nUse tools wisely Your students probably have little experience with code versioning, reviewing processes, time management, etc. A good choice of tools and some training can go a long way and make your life much easier in the long run. It will also give your students a taste of what working in a bigger/real company feels like and a head start.\nFor instance, using git makes it easier to collaborate on code. 
It also ensures that your results will not be lost if your student\u0026rsquo;s laptop gets stolen.\nUsing GitLab CI or GitHub Actions to deploy public services will provide more autonomy to your students. It will force them to commit working code, and it will make it easier to check their results and discuss the end result.\nUsing overleaf for theses has most of the advantages for collaboration as something like google docs, while being much more flexible and making it easier to produce well-formatted results. You may also use something like latex on a shared folder (e.g., nextcloud), although the chance of conflicts is higher, so be careful with documents that require live collaboration. In both cases, make sure to make the getting started experience as simple as possible: provide a sensible template, and only focus on simple features at first.\nAlso, on a related note, make sure every team member has a proper development setup. It does not matter which tool they use (VSCode, emacs, Jetbrains), as long as they are comfortable with it and they are able to focus on actual work. It helps to have a sensible default for your organization that is easy to set up and use, especially because most students do not have enough experience or skill with any particular tool.\nEncourage cooperation Do not become the center of every conversation. If a topic can be discussed between two students, let them handle it on their own and get back to you if they need anything.\nThe ability to discuss with your peers and report only when needed will be extremely important for them in the future. They are also likely to discuss the topic more openly and more relaxed than with you (no matter how approachable you are). 
That might lead to valuable insights and improvements for your team and project.\nMoreover, this attitude of open collaboration will help create those synergies we mentioned before, and make future projects easier and more enjoyable.\nReward proactivity The whole point of this section is to get your team to work independently when possible. Be explicit about this goal to make sure it is clear to everyone. And encourage behavior that aligns with this goal, even on a small scale.\nFor instance, show interest when a student has shown initiative and researched something on their own, or when they go beyond the minimum requirements. Sometimes, you will notice that this research was not completely well oriented or it was not a very efficient use of time. Do not jump straight to criticize it. Compliment the attitude regardless, try to find the value in the results, and be gentle when providing feedback on why other topics or tasks were higher priority or a better choice.\nDon\u0026rsquo;t be a perfectionist Perfect is the enemy of done. It is also the enemy of a happy co-worker.\nTry to remember that you are dealing with students, and you were probably no better at their age. Besides, you probably delegated the task because you did not have any spare time to do it yourself. FIXME is often better than TODO.\nTake the opportunity to provide some feedback and teach them something useful. Some mistakes are also worth adding to your documentation, or presenting to other students in a presentation.\nAvoiding miscommunication A common source of wasted effort and unnecessary back-and-forth is miscommunication. These are some tips to help keep everyone on the team informed and aligned.\nMake priorities clear All team members should understand the general priorities (project-wise) as well as the specific priorities of their assigned tasks. 
This will help inform their decisions when some other tasks inevitably come up, or the urgency of a task changes.\nDefine boundaries (and abstractions) Once again, the goal is generally to achieve some sort of parallelism between your team members. In order to do that, they need to know how they will interact with each other.\nOn a more general level, this means knowing the responsibilities and scope of your work.\nOn a more specific level, it means knowing their dependency graph. In other words, whether the progress of one team member will depend on the results of another one. Whenever there is a dependency, the interface should be made very clear. This often takes the form of an API, a file with a given format, or a section of a document.\nTake some time to define the boundary as precisely as needed at that point in the project. I would suggest having specific examples that you can discuss and modify. It is hard to discuss in the abstract, especially for inexperienced contributors. When in doubt, default to the simplest option (e.g., a common file vs using a database). Do not dwell too much on specific structural/representation details (e.g., which OWL vocabulary to use), but make sure that all the necessary bits are there. Converting a document or querying a document store (e.g., elasticsearch) instead of your file system is relatively easy, but making up non-existing data can be a challenge.\nOne type of failure I\u0026rsquo;ve seen quite frequently in this area is to be too fuzzy about the expected results from a team (or contributor), and refusing to discuss or provide examples. That tends to result in multiple iterations, each of them not-quite-what-you-wanted, and frustration on both sides.\nBe approachable Did I write a whole section about autonomy? Yes. Is the end goal to do more and talk less? Also yes. Thing is, no process is perfect, and misunderstandings are bound to happen at some point. 
If your only response to questions is a grumpy face or a \u0026ldquo;read the freaking docs\u0026rdquo;, your students will not alert you when something really needs your attention, and you will find out too late. For instance, the documentation may be unclear, or your processes may be inadvertently alienating new members and making new hires harder.\nAnother way to be approachable is to be clear about your shortcomings, and whether something you are saying is negotiable and/or debatable. My rule of thumb is to err on the side of negotiable, and only be strict when it is really necessary (e.g., time constraints or an unproductive student). We are all more likely to finish our tasks if we feel they are ours, if we have a say in how and when to perform them.\nJust to be clear, approachable does not mean you have to be their confidant or their best friend. It also does not mean that it is okay to challenge or question you continuously. Sometimes it is okay to simply say \u0026ldquo;just do as I say\u0026rdquo;.\nReview frequently One type of review is individual. It involves reviewing code on github, or reading deliverables and papers on overleaf. It can help catch misunderstandings, and measure the true rate of progress in the individual tasks. The other type of review is done as a group, by going through the key progress and action points. This type of review helps everyone stay on the same page, and catch any general drifts in the project, such as misaligned priorities.\nThe frequency of each type of review depends on the specific nature of the project, the types of tasks being performed by the student, and your confidence in the student\u0026rsquo;s abilities.\nOptimizing your contribution Tips on optimizing your contribution to the team.\nPrioritize, prioritize, prioritize Part of your job as a project lead is to identify the main goals in a project and to prioritize the tasks that will lead you there. 
On the other hand, you are part of a research group, and you should be actively involved in its health and future. Lastly, you are also in charge of the life-long project that is your research career.\nIn all these cases, your goal should be to identify the long term goals, come up with a sound strategy, and prioritize the tasks that will lead you and your group there. Keeping your priorities straight will help you make steady progress, and avoid bikeshedding and changing goalposts. It will also help you steer your progress in the right direction, since we all have limited time and effort and can\u0026rsquo;t do everything at once.\nThe fact that your time is limited also means that you will need to decide how to prioritize these three roles. I\u0026rsquo;ve listed them in increasing level of importance for me. It means that it is okay to focus on a specific project for a while, but if progress in your career is stalled - usually through publication - you need to reevaluate and concentrate your efforts on that.\nBe okay with (short-term) inefficiencies I\u0026rsquo;ve personally struggled with delegating tasks that will take me orders of magnitude less work than they will a student. Thing is, most tasks will fall under this category, and your time is limited, so you have to delegate if you want to have time for more important matters. If you never delegate any tasks, you are not allowing your team to learn and catch up on whatever technical skills are required. Besides, you are not improving yourself on the managerial side of things. It turns out delegating is hard: it requires a whole set of non-technical skills. I suspect this is oftentimes the reason we don\u0026rsquo;t delegate in the first place: delegating is hard, and technical tasks are usually more straightforward, so we just don\u0026rsquo;t want to do the work.\nDon\u0026rsquo;t neglect training You are a senior researcher. You probably know how to solve problems in your domain quite efficiently. 
In my case, that means processing data and developing code.\nThat means I could dedicate my days to processing data and developing new code for my group. That code would likely be used in multiple projects. However, there is a hard limit to how much code I can push out in a day, especially if you take into account other obligations such as teaching.\nA wiser strategy would be to set aside some of that coding time to instead help students become better programmers. Firstly, because those students will be thankful and more motivated to work than when they are left to learn on their own without much guidance. Secondly, because those students will then be more prepared to help me out if I delegate a task to them. And, lastly, because these students have a whole life in front of them. A life full of big projects of their own, and contributions to society. That little training time can have a compounding effect in the future.\nSet a time limit for your interactions in advance Really long slots can easily lead to bikeshedding and going unnecessarily deep into implementation details, which is clearly an inefficient use of your time. Even worse, our attention span and memory are finite, so longer and dense meetings can lead to fatigue and to missing or diluting important points in the conversation.\nFor these reasons, be very clear about these time limits, and do not extend these meetings unless it is strictly necessary. You can always schedule a new meeting, but be sure to provide enough time in between to process the results of the meeting, reflect and prioritize.\nBeyond your team The previous points and rules focus mostly on actions that can be applied within your team, and that you can fully control. 
But teams rarely work in isolation. In order to be effective, you also need to coordinate with other teams/groups, and more generally work on your organization\u0026rsquo;s culture and sense of belonging.\nMany of the aspects I talked about in the team section apply here. For instance, the obsession with documentation can - and should - be applied organization-wise. The same goes for defining boundaries and using concrete examples when collaborating with other teams. For most intents and purposes, you can treat other teams as another contributor to your team. Just one that will be more costly and slow to interact with.\nIf possible, I\u0026rsquo;d try to apply the rule about focusing on the big picture, and limit the attendance of most meetings to those who strictly need to be involved. Avoid involving whole teams in discussions when the broad strokes have not been defined yet. The responsibilities will be diluted in a bigger group, it will be harder to avoid misunderstandings and easier to bikeshed.\nOn the organization\u0026rsquo;s side, I would suggest having an honest conversation about your core principles. I really liked Bryan Cantrill\u0026rsquo;s talk about principles of technology leadership. He goes deep into the effects that principles have had on well known companies, and how to go about defining your company\u0026rsquo;s principles. I think that writing down your principles forces you to be conscious about their trade-offs, and to be explicit about your choices and attitudes.\nMore generally, try to define (light) processes that reward and facilitate behaviors you find positive, such as writing documentation and being proactive. And try to discourage the opposite type of behavior as soon as possible, to make correcting it easier. 
Apply the ideas of frequent evaluation and feedback, openness and honesty in every aspect of your organization.\nTags: [team management]","date":"2025-03-05T09:25:54+01:00","permalink":"https://balkian.com/p/efficient-collaboration/","title":"Tips for efficient collaboration"},{"content":"In the RDF world, data is expressed as a collection of triples. These triples can contain IRIs that may or may not be accessible or valid. And the use of these IRIs may or may not adhere to a vocabulary. Checking the validity of the IRIs and the semantics of the triples is an additional step.\nThe rdflib way rdflib only models IRIs, values and namespaces. Developers need to be cognisant of the URIs they are using, and the vocabularies being used. Prior to version 2.0, senpy followed a very similar model. It had a base class to represent a generic node. Each instance then gets its own automatically generated id, and will act like a normal dictionary, whose keys and values will be serialized as a JSON-LD dictionary. Multiple subclasses were also included to model specific types of node, mostly to provide convenience methods for the given subtype. Here is an example of a subclass, Entity.\n1 2 3 4 5 entry = Entry() entry[\u0026#39;vocab:property\u0026#39;] = 25 print(entry.jsonld()) Would print something like this:\n1 2 3 4 5 { \u0026#34;@id\u0026#34;: \u0026#34;:Entry_202505....\u0026#34;, \u0026#34;@type\u0026#34;: \u0026#34;prefix:Entity\u0026#34;, \u0026#34;vocab:property\u0026#34;: 25 } Producing correct triples using this model requires using the vocabularies and URIs properly, with little to no tooling to enforce it. This poses a big problem for a tool like Senpy, which aims to make it easier for professionals without a background in RDF to build and consume semantic NLP ser If an attribute is not a URI and is not included in the global JSON-LD context, it will not generate a triple in the final graph. 
Moreover, there is no way to enforce that the vocabularies and the\nPros:\nFlexible/extensible Lightweight. This is mostly JSON-LD in Python\u0026rsquo;s clothing. Naturally maps to both rdflib and writing json-ld Cons:\nDiscoverability. Documentation and examples are needed to know which attributes to use Error-prone. It is easy to misuse a property, or introduce typos Tight coupling with semantics/RDF. One needs to know a thing or two about RDF, especially if new vocabularies or annotations need to be used. The object-oriented way An obvious alternative in an object-oriented language like python is to use classes to represent our data model. These classes can define the specific attributes available, and typing annotations can serve both as a guide for the developer, and as a means to automatically validate objects at runtime. There are tools like pydantic that make this process very simple. Then, we only need to define how our models should be serialized into JSON-LD. We can thoroughly test this serialization to ensure that the resulting object is correct and produces the right RDF graph. Going back to our previous example, we could define an Entry class as a dataclass, and define all the possible types of annotations as attributes.\nThis model works great when all the possible attributes are known ahead of time. But it starts to break when the model provided is not comprehensive enough, or consumers of your library need to provide their own ad-hoc annotations / attributes. This could be solved by encouraging consumers of our library to define their own subclasses whenever they need to add new attributes. This works perfectly fine for serialization, but it breaks if your library needs to automatically deserialize these subclasses. It also breaks if different parts of the code need to add their own attributes on the same data at the same time. 
This was precisely the case of senpy, where entities are annotated by different plugins, each providing a different set of annotations.\nPros:\nDiscoverability. All possible attributes are known ahead of time, including their possible types. Decoupling from RDF. Developers only need to know about the dataclasses provided. The mapping to the RDF world is already encoded in the dataclass. Cons:\nRigidity. Adding new types of annotations requires modifying the models, in the main module. Polymorphism. A hybrid approach Whichever solution is chosen in the end, it needs to:\nMake it easy and error-proof to add the most common types of annotations Allow for additional annotations/attributes to be added Allow for upgrades in the future. i.e., converting the most common custom annotations into built-in ones Allow for deserialization of custom types Allow multiple consumers to add their own annotations Tags: [rdf json-ld pydantic python]","date":"2025-02-26T23:22:59+01:00","permalink":"https://balkian.com/p/bridging-rdf-json-ld-and-dataclasses/","title":"Bridging RDF, JSON-LD and Dataclasses"},{"content":"Long story short: I\u0026rsquo;m now using uv, and so should you. It is a great replacement for pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.\nContext For years, my strategy to manage python projects has been a mix of a custom setup.py, several hand-crafted requirements.txt files (through pip freeze), a custom virtualenv per project, and multiple tools to upload to PyPI. Although this works, this setup has many drawbacks:\nIt requires user intervention (creating a venv, sourcing it, handling new deps). This isn\u0026rsquo;t ideal if you want new (probably inexperienced) users to use your projects. On a similar note, the whole process needs to be well documented if you want other users to contribute or maintain the code. Pinning dependency versions is finicky, and I\u0026rsquo;ve run into problems because of that. 
Creating a new project involves a template, or copying files from an older project. Of course, this is nothing new. There is a whole site dedicated to packaging your Python project. A plethora of different projects have come and gone, with varying degrees of success.\nAlternatives (poetry) About a year before trying uv, I tried to catch up with the ecosystem and get to know the blessed new way. However, the task proved to be more difficult than expected, as the landscape is filled with a myriad of alternatives, each with their own set of drawbacks and detractors. Packaging has historically been a weak spot, in ironic contradiction to the Zen of Python\u0026rsquo;s \u0026ldquo;There should be one\u0026ndash; and preferably only one \u0026ndash;obvious way to do it\u0026rdquo;.\nI eventually settled on poetry. Mostly because it seemed like the most popular alternative.\nThere are many things I liked about it. First of all, having a convention for dependencies (pyproject.toml) and a tool that properly handles them was nice. It also removed the need to remember specific incantations to build and publish my Python projects. Lastly, I combined it with poetry2nix to create reproducible python environments using nix. This makes for a very powerful experience.\nHowever, there were multiple hiccups. First of all, it took me some time to figure out which specific fields to use (each tool can define ad-hoc properties in the pyproject.toml file), and some of them seemed redundant with the more generic ones. Full disclosure, this specific point might be a mistake on my side, and I do not remember the details. The second one is speed. (Re-)creating an environment took a non-negligible amount of time.\nEnter light uv According to its repository, uv can replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more. Not only that, but it also claims to do that 10-100 times faster than pip. 
I must admit that it being written in rust was another selling point for me, as I\u0026rsquo;m looking for excuses to collaborate in a decently-sized rust project.\nInstalling it is dead simple: download the binary (e.g., with curl) or run pip install uv. You won\u0026rsquo;t need much more: uv seems to just do the right thing out of the box. And it does it really, really fast. The rest of the time it gets out of the way.\nMy only gripe so far is that I don\u0026rsquo;t seem to find a built-in command to drop into a shell, but that is nothing that uv run $SHELL cannot fix.\nCommon operations Initialize a repository uv init Adding dependencies uv add senpy Running commands inside the environment uv run \u0026lt;COMMAND\u0026gt; # e.g., run a shell using your python version and dependencies uv run $SHELL Dependency tree uv tree Resolved 44 packages in 1ms my-project v0.1.0 ├── fastapi[standard] v0.115.8 │ ├── pydantic v2.10.6 │ │ ├── annotated-types v0.7.0 │ │ ├── pydantic-core v2.27.2 │ │ │ └── typing-extensions v4.12.2 │ │ └── typing-extensions v4.12.2 │ ├── starlette v0.45.3 │ │ └── anyio v4.8.0 │ │ ├── exceptiongroup v1.2.2 │ │ ├── idna v3.10 │ │ ├── sniffio v1.3.1 │ │ └── typing-extensions v4.12.2 │ ├── typing-extensions v4.12.2 │ ├── email-validator v2.2.0 (extra: standard) │ │ ├── dnspython v2.7.0 ... 
Tags: [python]","date":"2025-02-17T23:02:47+01:00","image":"https://balkian.com/img/uv.png","permalink":"https://balkian.com/p/uv-one-rust-tool-to-rule-all-pythons/","title":"uv - One rust tool to rule all pythons"},{"content":"This is a quick and easy recipe to add a default.nix to any Python project with a requirements.txt file:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 with import \u0026lt;nixpkgs\u0026gt; { }; let pythonPackages = python311Packages; in pkgs.mkShell rec { name = \u0026#34;impurePythonEnv\u0026#34;; venvDir = \u0026#34;./.venv\u0026#34;; buildInputs = [ # A python interpreter including the \u0026#39;venv\u0026#39; module is required to bootstrap # the environment. pythonPackages.python # This execute some shell code to initialize a venv in $venvDir before # dropping into the shell pythonPackages.venvShellHook # Those are dependencies that we would like to use from nixpkgs, which will # add them to PYTHONPATH and thus make them accessible from within the venv. pythonPackages.numpy pythonPackages.requests # In this particular example, in order to compile any binary extensions they may # require, the python modules listed in the hypothetical requirements.txt need # the following packages to be installed locally: taglib openssl git libxml2 libxslt libzip zlib ]; # Now we can execute any commands within the virtual environment. # This is optional and can be left out to run pip manually. postShellHook = \u0026#39;\u0026#39; pip install -r requirements.txt \u0026#39;\u0026#39;; } Now, you will get a clean environment by running:\n1 nix-shell Tags: [nix python]","date":"2023-11-13T18:21:46+01:00","permalink":"https://balkian.com/p/nix-recipe-for-python-projects/","title":"Nix Recipe for Python Projects"},{"content":"Kanata is a software keyboard remapper that aims to improve keyboard comfort and usability with advanced customization. 
Keyboard remappers are a good alternative to running a custom keyboard with QMK/ZMK, and have two main advantages: they work on any keyboard, and you can configure them to launch any command or program you want, not just key presses. On the other hand, you need to configure them on every PC/OS you\u0026rsquo;re using your keyboard with, and all the processing is done on software on top of the OS, so there may be glitches and performance issues.\nThe project was inspired by the more popular KMonad, and the author cites some of the differences. Both projects use a very similar configuration format based on lisp. The configuration consists of a set of general options, a base key configuration, a series of layers, and macros that can be used within those layers. Here\u0026rsquo;s a very complete config that serves as documentation.\nOne big disadvantage of the lispy configuration is that you need to specify your hardware layout/all your keys, and repeat that every time you define a new layer. The result visually maps to your keyboard, but can be very verbose/big if you need really few changes.\nKeyd is another alternative with a more declarative configuration format, which might lend itself to smaller.\nFor now I\u0026rsquo;m just trying it out, and getting a feel for using fewer keys before I build my own ZMK keyboard. I particularly like the option of using mod-keys on the home row (e.g., having A work as a CTRL when held). Mod-tap, tap-dancing and the like are very common techniques in sub-40% layouts, where there simply aren\u0026rsquo;t enough keys for all the letters and symbols. In a regular-sized keyboard, these techniques can also help you stay on the home row and type more comfortably. At least, that\u0026rsquo;s the idea. 
We\u0026rsquo;ll see if I like it enough to stick with it.\nFor now, here\u0026rsquo;s my very simple config:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 (defcfg ;; Your keyboard device will likely differ from this. linux-dev /dev/input/by-id/usb-Logitech_USB_Receiver-if02-event-mouse ;; Windows doesn\u0026#39;t need any input/output configuration entries; however, there ;; must still be a defcfg entry. You can keep the linux-dev entry or delete ;; it and leave it empty. ) (defsrc grv 1 2 3 4 5 6 7 8 9 0 - = bspc tab q w e r t y u i o p [ ] caps a s d f g h j k l ; \u0026#39; ret lsft \\ z x c v b n m , . / rsft lctl lmet lalt spc ralt rmet rctl ) (deflayer qwerty grv _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ @warrows _ _ _ _ _ _ _ _ _ _ lctrl @alctrl @slsft @dlalt @flmet _ _ @jrmet @kralt @lrsft @;rctrl _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ @smartspace _ _ _ ) (deflayer arrows _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ @flmet _ left down up rght _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ @smartspace _ _ _ ) (deflayer colemak grv XX XX XX XX XX XX XX XX XX XX XX XX _ tab q w f p b j l u y ; [ ] lctrl @alctrl @rlsft @slalt @tlmet g m @nrmet @eralt @irsft @orctrl \u0026#39; ret lsft XX z x c d v k h , . 
/ rsft XX XX XX @smartspace XX XX XX ) (deflayer magic _ @clmk @qwerty _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ tab A-tab _ _ _ _ bspc esc _ ret _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ) (defalias warrows (tap-hold 200 200 w (layer-toggle arrows)) alctrl (tap-hold 200 200 a lctrl) slsft (tap-hold 200 200 s lsft) dlalt (tap-hold 200 200 d lalt) flmet (tap-hold 200 200 f lmet) jrmet (tap-hold 200 200 j rmet) kralt (tap-hold 200 200 k ralt) lrsft (tap-hold 200 200 l rsft) ;rctrl (tap-hold 200 200 ; rctrl) rlsft (tap-hold 200 200 r lsft) slalt (tap-hold 200 200 s lalt) tlmet (tap-hold 200 200 t lmet) nrmet (tap-hold 200 200 n rmet) eralt (tap-hold 200 200 e ralt) irsft (tap-hold 200 200 i rsft) orctrl (tap-hold 200 200 o rctrl) clmk (layer-switch colemak) qwerty (layer-switch qwerty) smartspace (tap-dance 200 ( (tap-hold 300 300 spc (layer-toggle magic)) (tap-hold 300 300 (one-shot 300 lalt) spc) a )) ) Tags: [linux logitech keyboard layout rust]","date":"2023-01-20T18:11:00Z","permalink":"https://balkian.com/p/kanata-advanced-keyboard-configuration/","title":"Kanata: advanced keyboard configuration"},{"content":"As a follow-up to my last post, I\u0026rsquo;ve decided to also configure my mk850 combo (k850 + m720 triathlon).\nSome notes:\nThe keyboard is usually connected to this PC through bluetooth. Since this is a change I usually do in the system for every keyboard, I added a rule for any bus (usb, bluetooth, etc) The mouse has an additional button that registers as a keyboard. Every press maps to three key events. I\u0026rsquo;ve disabled two of them and mapped the action to F19, in case I want to use it in my DE/WM. 
evdev:input:* KEYBOARD_KEY_70039=leftctrl # bind capslock to left ctrl evdev:input:b0005v046DpB015* KEYBOARD_KEY_700e0=f19 KEYBOARD_KEY_700e2=unknown KEYBOARD_KEY_7002b=unknown After that, simply run:\nsudo udevadm hwdb --update \u0026amp;\u0026amp; sudo udevadm trigger Make sure the settings have been applied by running evemu-describe:\nsudo /sbin/evemu-describe /dev/input/event\u0026lt;id of your device\u0026gt; | grep KEY_ Tags: [linux logitech keyboard mouse]","date":"2021-10-30T00:00:01Z","permalink":"https://balkian.com/p/logitech-mb850-combi-in-linux/","title":"Logitech MB850 combi in linux"},{"content":"I recently got a Logitech MX Keys for Mac keyboard at work. The German version, to be more precise. This version was three times cheaper than the Windows equivalent with either US or ES layout. Since I touch type anyway, I thought it was a bargain.\nAs soon as I plugged it in, I realized there were some glaring issues with the keyboard. First of all, the Meta/Super and Alt keys are reversed in this keyboard. In the normal/full version of this keyboard, Logitech gives an option to choose between Mac, Windows and iOS host, and that changes the behavior of the keys. In this version, tho, only iOS and Mac are available.\nBesides that, there\u0026rsquo;s the issue of the grave (tilde) and angle keys being switched as well.\nSwitching these keys around would be very easy with Xorg, but Wayland once again complicates things\u0026hellip;\nThese issues almost made me return the keyboard. 
Luckily, tho, there is another option: configuring the keys one level lower than wayland (and X11), through hwdb.\nLong story short, this will configure any Logitech keyboard with the same product id (0x4092) to use a saner configuration:\n1 2 3 4 5 6 7 8 9 10 11 #File: /etc/udev/hwdb.d/90-logitech-keyboard.hwdb evdev:input:b0003v046Dp4092* KEYBOARD_KEY_700e2=leftmeta KEYBOARD_KEY_700e3=leftalt KEYBOARD_KEY_70039=leftctrl KEYBOARD_KEY_70064=102nd KEYBOARD_KEY_70035=grave KEYBOARD_KEY_700e7=rightalt KEYBOARD_KEY_700e6=rightmeta KEYBOARD_KEY_7006d=compose After that, simply run:\n1 sudo udevadm hwdb --update \u0026amp;\u0026amp; sudo udevadm trigger Tags: [linux logitech keyboard]","date":"2021-10-29T00:00:01Z","permalink":"https://balkian.com/p/logitech-mx-keys-for-mac-on-linux/","title":"Logitech MX Keys for Mac on Linux"},{"content":"Believe it or not, Surface tablets have pretty good linux support, except for the webcams in newer models. These are some useful notes to get Ubuntu installed in your surface go, as of Summer 2019.\nInstalling the kernel 1 2 git clone --depth 1 https://github.com/jakeday/linux-surface.git ~/linux-surface cp -a ~/linux-surface /media/\u0026lt;your usb\u0026gt; 1 2 3 cp -a /media/\u0026lt;your usb\u0026gt;/linux-surface ~/ cd ~/linux-surface/ sudo sh setup.sh Booting ubuntu first Switch out of Windows S mode.\nBoot into the \u0026ldquo;Command Prompt\u0026rdquo;.\nFrom Windows go to \u0026ldquo;change advanced startup options\u0026rdquo; and select \u0026ldquo;restart now\u0026rdquo;.\nWhen it reboots, choose the \u0026ldquo;Troubleshoot\u0026rdquo; option, then choose the \u0026ldquo;Advanced options\u0026rdquo; option, and finally choose the \u0026ldquo;Command Prompt\u0026rdquo; option.\nAfter the device reboots, login to the command prompt and then you should see a terminal with X:\\windows\\system32\u0026gt;\nAt the prompt, check your UEFI entries:\n1 bcdedit /enum firmware Copy UEFI entry of \u0026ldquo;Windows Boot 
Manager\u0026rdquo; to create a new entry for Ubuntu: bcdedit /copy {bootmgr} /d \u0026ldquo;Ubuntu\u0026rdquo;\nCopy the printed GUID number including the braces {} using Ctrl+C\nSet file path for the new Ubuntu entry. Replace {guid} with the returned GUID of the previous command (Ctrl+V). bcdedit /set {guid} path \\EFI\\ubuntu\\grubx64.efi\nSet Ubuntu as the first/ entry in the boot sequence. Again replace {guid} with the returned GUID of the copy command.\n1 bcdedit /set {fwbootmgr} displayorder {guid} /addfirst Check your UEFI entries again: bcdedit /enum firmware You should see something like this:\n1 2 3 4 5 6 7 8 9 10 Firmware Boot Manager --------------------- identifier {fwbootmgr} displayorder {3510232e-f8eb-e811-95ce-9ecab3f9d1c4} {bootmgr} {2148799b-f8eb-e811-95ce-9ecab3f9d1c4} {312e8a67-c2f6-e811-95ce-3c1ab3f9d1de} {312e8a68-c2f6-e811-95ce-3c1ab3f9d1de} timeout 0 Make sure the GUID you copied is the first one listed in displayorder. Then type exit, turn off the PC and turn it back on. After this my surface go is automatically booting to the grub bootloader which lets me choose between Windows and Ubuntu but defaults to Ubuntu after ten seconds.\nTags: [linux surface go config]","date":"2019-06-01T00:00:01Z","permalink":"https://balkian.com/p/linux-on-the-microsoft-surface-go/","title":"Linux on the Microsoft Surface Go"},{"content":"This is a short tutorial on connecting a zigbee device (an Aqara cube) to an MQTT server, so you can control your zigbee devices from the network.\nIf you\u0026rsquo;re anything like me, you\u0026rsquo;re probably a sucker for IoT devices. For a long time, I\u0026rsquo;ve been using WiFi-enabled lights, and Amazon dash buttons to control them. To keep these (cheap Chinese) internet enabled devices away from your network and their respective cloud services, you\u0026rsquo;ll probably want to set up a dedicated network in your router (more on this on a future post, maybe). 
Another disadvantage of WiFi devices is that they\u0026rsquo;re relatively power hungry.\nA popular alternative is using ZigBee for communication. It is a dedicated protocol similar to bluetooth (BLE), with lower power requirements and bitrate.\nTake the (super cute) aqara cube as an example. It is a small cube that detects rotation on all of its axes, and tapping events. Here\u0026rsquo;s a video:\nTo connect to zigbee devices you will need a zigbee enabled gateway (a.k.a. hub), which connects to your WiFi network and your zigbee devices. Once again, this means adding an internet-enabled device to your home, and probably a couple of cloud services.\nAs an alternative, you can set up your own zigbee gateway, and control it to your home automation platform of choice (e.g. home assistant). We will cover how to set up a zigbee2mqtt gateway that is also connected to an MQTT server, so you can use MQTT to control your devices and get notifications.\nWhat you need:\nAqara cube. CC2531 zigbee sniffer. CC-debugger. You will need to flash your sniffer. For that, you only need to follow the instructions from the zigbee2mqtt documentation.\nOnce you\u0026rsquo;re done flashing, you\u0026rsquo;re ready to set up the zigbee2mqtt server. 
For convenience, I wrote a simple docker-compose to deploy a zigbee2mqtt server and a test mosquitto server:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 version: \u0026#39;2.1\u0026#39; services: zigbee2mqtt: image: koenkk/zigbee2mqtt container_name: zigbee2mqtt restart: always volumes: - ./z2m-data/:/app/data/ devices: - \u0026#34;/dev/ttyACM0\u0026#34; networks: - hass mqtt: image: eclipse-mosquitto ports: - 1883:1883 - 9001:9001 networks: - hass volumes: - ./mosquitto.conf:/mosquitto/config/mosquitto.conf networks: hass: driver: overlay You can test your installation with:\n1 2 3 4 5 6 ❯ mosquitto_sub -h localhost -p 1883 -t \u0026#39;zigbee2mqtt/#\u0026#39; online {\u0026#34;battery\u0026#34;:17,\u0026#34;voltage\u0026#34;:2925,\u0026#34;linkquality\u0026#34;:149,\u0026#34;action\u0026#34;:\u0026#34;rotate_right\u0026#34;,\u0026#34;angle\u0026#34;:12.8} {\u0026#34;battery\u0026#34;:17,\u0026#34;voltage\u0026#34;:2925,\u0026#34;linkquality\u0026#34;:141,\u0026#34;action\u0026#34;:\u0026#34;slide\u0026#34;,\u0026#34;side\u0026#34;:2} {\u0026#34;battery\u0026#34;:17,\u0026#34;voltage\u0026#34;:2925,\u0026#34;linkquality\u0026#34;:120} {\u0026#34;battery\u0026#34;:17,\u0026#34;voltage\u0026#34;:2925,\u0026#34;linkquality\u0026#34;:141,\u0026#34;action\u0026#34;:\u0026#34;wakeup\u0026#34;} zigbee2mqtt supports the following events for the aqara cube: shake, wakeup, fall, tap, slide, flip180, flip90, rotate_left and rotate_right. 
Every event has additional information, such as the sides involved, or the degrees turned.\nNow you are ready to set up home assistant support in zigbee2mqtt following this guide.\nTags: [mqtt iot zigbee]","date":"2019-01-06T10:00:00Z","permalink":"https://balkian.com/p/controlling-zigbee-devices-with-mqtt/","title":"Controlling Zigbee devices with MQTT"},{"content":"tqdm is a nice way to add progress bars in the command line or in a jupyter notebook.\n1 2 3 4 5 from tqdm import tqdm import time for i in tqdm(range(100)): time.sleep(1) Tags: [python]","date":"2016-09-28T18:47:00Z","permalink":"https://balkian.com/p/progress-bars-in-python/","title":"Progress bars in python"},{"content":"Today\u0026rsquo;s post is half a quick note, half public shaming. In other words, it is a reminder to be very careful with OAuth tokens and passwords.\nAs part of moving to emacs, I starting using the incredibly useful gh.el. When you first use it, the extension saves either your password or an OAuth token in your .gitconfig file. This is cool and convenient, unless you happen to be publishing your .gitconfig file in a public repo.\nSo, how can you still share your gitconfig without sharing your password/token with the rest of the world? Since Git 1.7.0, you can include other files in your gitconfig.\n1 2 [include] path = ~/.gitconfig_secret And now, in your .gitconfig_secret file, you just have to add this:\n1 2 3 [github] user = balkian token = \u0026#34;\u0026lt; Your secret token \u0026gt;\u0026#34; Tags: [github git dotfiles]","date":"2015-04-10T17:47:00Z","permalink":"https://balkian.com/p/sharing-dotfiles/","title":"Sharing dotfiles"},{"content":"Zotero is an Open Source tool that lets you organise your bibliography, syncing it with the cloud. 
Unlike other alternatives such as Mendeley, Zotero can upload the attachments and data to a private cloud via WebDav.\nIf you use nginx as your web server, know that even though it provides partial support for webdav, Zotero needs more than that. Hence, you will need another webdav server, and optionally let nginx proxy to it. This short post provides the basics to get that set-up working under Debian/Ubuntu.\nSetting up Apache First we need to install Apache:\n1 sudo apt-get install apache2 Change the head of \u0026ldquo;/etc/apache2/sites-enabled/000-default\u0026rdquo; to:\n1 \u0026lt;VirtualHost *:880\u0026gt; Then, create a file /etc/apache2/sites-available/webdav:\n1 2 3 4 5 6 7 8 9 10 11 12 13 Alias /dav /home/webdav/dav \u0026lt;Location /dav\u0026gt; Dav on Order Allow,Deny Allow from all Dav On Options +Indexes AuthType Basic AuthName DAV AuthBasicProvider file AuthUserFile /home/webdav/.htpasswd Require valid-user \u0026lt;/Location\u0026gt; Ideally, you want your webdav folders to be private, adding authentication to them. So you need to create the webdav and zotero users and add the passwords to an htpasswd file. Even though you could use a single user, since you will be configuring several clients with your credentials I encourage you to create the zotero user as well. This way you can always change the password for zotero without affecting any other application using webdav.\n1 2 3 4 sudo adduser webdav sudo htpasswd -c /home/webdav/.htpasswd webdav sudo htpasswd /home/webdav/.htpasswd zotero sudo mkdir -p /home/webdav/dav/zotero Enable the site and restart apache:\n1 2 3 4 sudo a2enmod webdav sudo a2enmod dav_fs sudo a2ensite webdav sudo service apache2 restart At this point everything should be working at http://\u0026lt;your_host\u0026gt;:880/dav/zotero\nSetting up NGINX After the Apache side is working, we can use nginx as a proxy to get cleaner URIs. 
In your desired site/location, add this:\n1 2 3 4 5 6 7 location /dav { client_max_body_size 20M; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $remote_addr; proxy_set_header Host $host; proxy_pass http://127.0.0.1:880; } Now just reload nginx:\n1 sudo service nginx force-reload Extras Zotero Reader - HTML5 client Zandy - Android Open Source client Tags: [zotero webdav nginx apache]","date":"2014-12-09T12:12:12Z","permalink":"https://balkian.com/p/zotero/","title":"Zotero"},{"content":"This is a quick note on proxying a local python application (e.g. flask) to a subdirectory in Apache. This assumes that the file wsgi.py contains a WSGI application with the name application. Hence, wsgi:application.\nGunicorn 1 2 3 4 5 \u0026lt;Location /myapp/\u0026gt; ProxyPass http://127.0.0.1:8888/myapp/ ProxyPassReverse http://127.0.0.1:8888/myapp/ RequestHeader set SCRIPT_NAME \u0026#34;/myapp/\u0026#34; \u0026lt;/Location\u0026gt; Important: SCRIPT_NAME and the end of ProxyPass URL MUST BE THE SAME. Otherwise, Gunicorn will fail miserably.\nTry it with:\n1 venv/bin/gunicorn -w 4 -b 127.0.0.1:8888 --log-file - --access-logfile - wsgi:application UWSGI This is a very simple configuration. 
I will try to upload one with more options for uwsgi (in a .ini file).\n\u0026lt;Location /myapp/\u0026gt; SetHandler uwsgi-handler uWSGISocket 127.0.0.1:8888 \u0026lt;/Location\u0026gt; Try it with:\nuwsgi --socket 127.0.0.1:8888 -w wsgi:application Extra: Supervisor If everything went as expected, you can wrap your command in a supervisor config file and let it handle the server for you.\n[unix_http_server] file=/tmp/myapp.sock ; path to your socket file [supervisord] logfile = %(here)s/logs/supervisor.log childlogdir = %(here)s/logs/ [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface [supervisorctl] logfile = %(here)s/logs/supervisorctl.log serverurl=unix:///tmp/myapp.sock ; must match the socket file in [unix_http_server] [program:myapp] command = venv/bin/gunicorn -w 4 -b 0.0.0.0:5000 --log-file %(here)s/logs/gunicorn.log --access-logfile - wsgi:application directory = %(here)s environment = PATH=%(here)s/venv/bin/ logfile = %(here)s/logs/myapp.log Tags: [python apache proxy gunicorn uwsgi]","date":"2014-10-09T10:00:00Z","permalink":"https://balkian.com/p/proxies-with-apache-and-python/","title":"Proxies with Apache and python"},{"content":"Developing a python module and publishing it on Github is cool, but most of the time you want others to download and use it easily. That is the role of PyPi, the python package repository. In this post I show you how to publish your package in less than 10 minutes.\nChoose a fancy name If you haven\u0026rsquo;t done so yet, take a minute or two to think about this. To publish on PyPi you need a name for your package that isn\u0026rsquo;t taken. What\u0026rsquo;s more, a catchy and unique name will help people remember your module and feel more inclined to at least try it.\nThe package name should hint at what your module does, but that\u0026rsquo;s not always the case. That\u0026rsquo;s your call. 
I personally put uniqueness and memorability over describing the functionality.\nCreate a .pypirc configuration file 1 2 3 4 5 6 7 8 9 10 11 12 13 [distutils] # this tells distutils what package indexes you can push to index-servers = pypi # the live PyPI pypitest # test PyPI [pypi] # authentication details for live PyPI repository = https://pypi.python.org/pypi username = { your_username } password = { your_password } # not necessary [pypitest] # authentication details for test PyPI repository = https://testpypi.python.org/pypi username = { your_username } As you can see, you need to register both in the main pypi repository and the testing server. The usernames and passwords might be different, that is up to you!\nPrepare your package This should be the structure:\n1 2 3 4 5 6 7 8 9 10 root-dir/ # Any name you want setup.py setup.cfg LICENSE.txt README.md mypackage/ __init__.py foo.py bar.py baz.py setup.cfg 1 2 [metadata] description-file = README.md The markdown README is the de facto standard in Github, but you can also use rST (reStructuredText), the standard in the python community.\nsetup.py 1 2 3 4 5 6 7 8 9 10 11 12 from distutils.core import setup setup(name = \u0026#39;mypackage\u0026#39;, packages = [\u0026#39;mypackage\u0026#39;], # this must be the same as the name above version = \u0026#39;{ version }\u0026#39;, description = \u0026#39;{ description }\u0026#39;, author = \u0026#39;{ name }\u0026#39;, email = \u0026#39;{ email }\u0026#39;, url = \u0026#39;https://github.com/{user}/{package}\u0026#39;, # URL to the github repo download_url = \u0026#39;https://github.com/{user}/{repo}/tarball/{version}\u0026#39;, keywords = [\u0026#39;websockets\u0026#39;, \u0026#39;display\u0026#39;, \u0026#39;d3\u0026#39;], # list of keywords that represent your package classifiers = [], ) You might notice that the download_url points to a Github URL. We could host our package anywhere, but Github is a convenient option. 
To create the tarball and the zip packages, you only need to create a tag in your repository and push it to Github:\n1 2 git tag {version} -m \u0026#34;{ Description of this tag/version}\u0026#34; git push --tags origin master Push to the testing/main pypi server It is advisable that you try your package on the test repository and fix any problems first. The process is simple:\n1 python setup.py register -r {pypitest/pypi} python setup.py sdist upload -r {pypitest/pypi} If everything went as expected, you can now install your package through pip and browse your package\u0026rsquo;s page. For instance, check my senpy package: https://pypi.python.org/pypi/senpy\n1 pip install senpy Tags: [github python pypi]","date":"2014-09-27T10:00:00Z","permalink":"https://balkian.com/p/publishing-on-pypi/","title":"Publishing on PyPi"},{"content":"As part of the OpeNER hackathon we decided to build a prototype that would allow us to compare how different countries feel about several topics. We used the OpeNER pipeline to get the sentiment from a set of newspaper articles we gathered from media in several languages. Then we aggregated those articles by category and country (using the source of the article or the language it was written in), obtaining the \u0026ldquo;overall feeling\u0026rdquo; of each country about each topic. Then, we used some fancy JavaScript to make sense out of the raw information.\nIt didn\u0026rsquo;t go too badly: it turns out we won.\nNow, it was time for a face-lift. 
I used this opportunity to play with new technologies and improve it:\nUsing Flask, this time using python 3.3 and Bootstrap 3.0 Cool HTML5+JS cards (thanks to pastetophone) Automatic generation of fake personal data to test the interface Obfuscation of personal emails The result can be seen here.\nPublishing a Python 3 app on Heroku 1 mkvirtualenv -p /usr/bin/python3.3 eurolovemap Since Heroku uses python 2.7 by default, we have to tell it which version we want, although it supports python 3.4 as well. I couldn\u0026rsquo;t get python 3.4 working using the deadsnakes ppa, so I used python 3.3 instead, which works fine but is not officially supported. Just create a file named runtime.txt in your project root, with the python version you want to use:\n1 python-3.3.1 Don\u0026rsquo;t forget to freeze your dependencies so Heroku can install them: pip freeze \u0026gt; requirements.txt\nPublishing personal emails There are really sophisticated and effective ways to obfuscate personal emails so that spammers cannot easily grab yours. However, this time I needed something really simple to hide our emails from the simplest form of crawlers. Most of the team are in academia somehow, so in the end all our emails are available on sites like Google Scholar. Anyway, nobody likes getting spammed so I settled for a custom Caesar cipher. 
Please, don\u0026rsquo;t use it for any serious application if you are concerned about being spammed.\n1 2 def blur_email(email): return \u0026#34;\u0026#34;.join([chr(ord(i)+5) for i in email]) And this is the client side:\n1 2 3 4 5 6 7 8 9 10 11 12 window.onload = function(){ elems = document.getElementsByClassName(\u0026#39;profile-email\u0026#39;); for(var e in elems){ var blur = elems[e].innerHTML; var email = \u0026#34;\u0026#34;; for(var s in blur){ var a = blur.charCodeAt(s) email = email+String.fromCharCode(a-5); } elems[e].innerHTML = email; } } Unfortunately, this approach does not hide your email from anyone using PhantomJS, ZombieJS or similar. For that, other approaches like generating a picture with the address would be necessary. Nevertheless, it is overkill for a really simple ad-hoc application with custom formatting and just a bunch of emails that would easily be grabbed manually.\nGeneration of fake data To test the contact section of the site, I wanted to populate it with fake data. Fake-Factory is an amazing library that can generate fake data of almost any kind: emails, association names, acronyms\u0026hellip; It even lets you localise the results (get Spanish names, for instance) and generate factories for certain classes (à la Django).\nBut I also wanted pictures, enter Lorem Pixel. With its API you can generate pictures of almost any size, for different topics (e.g. nightlife, people) and with a custom text. You can even use an index, so it will always show the same picture.\nFor instance, the picture below is served through Lorem Pixel.\nBy the way, if you only want cat pictures, take a look at Placekitten. And for NSFW text, there\u0026rsquo;s the Samuel L. Jackson Ipsum\nTags: [javascript python heroku]","date":"2014-03-27T14:00:00Z","permalink":"https://balkian.com/p/updating-eurolovemap/","title":"Updating EuroLoveMap"},{"content":"A simple trick. 
If you want to remove all the \u0026lsquo;.swp\u0026rsquo; files from a git repository, just use:\n1 git rm --cached \u0026#39;**.swp\u0026#39; Tags: [git]","date":"2013-08-22T23:14:00Z","permalink":"https://balkian.com/p/remove-git-files-with-globbing/","title":"Remove git files with globbing"},{"content":"I\u0026rsquo;ve finally decided to set up a decent personal page. I have settled on github-pages because I like the idea of keeping my site in a repository and having someone else host and deploy it for me. The site will be really simple, mostly static files. Thanks to Github, Jekyll will automatically generate static pages for my posts every time I commit anything new to this repository.\nBut Jekyll can be used independently, so if I ever choose to host the site myself, I can do it quite easily. Another thing that I liked about this approach is that the generated html files can be used in the future, and I will not need Jekyll to serve them. Jekyll is really simple and most of the things are written in plain html. That means that everything could be easily reused if I ever choose to change to another blogging framework (e.g. Pelican). But, for the time being, I like the fact that Github takes care of the compilation as well, so I can simply modify or add files through the web interface should I need to.\nI hadn\u0026rsquo;t played with HTML and CSS for a while now, so I also wanted to use this site as a playground. At some point, I realised I was doing mostly everything in plain HTML and CSS, and decided to keep it like that for as long as possible. As of this writing, I haven\u0026rsquo;t included any JavaScript code in the page. Probably I will use some to add my gists and repositories, but we will see about that.\nI think the code speaks for itself, so you can check out my repository on Github. 
You can clone and deploy it easily like this:\n1 2 3 git clone https://github.com/balkian/balkian.github.com cd balkian.github.com jekyll serve -w I will keep updating this post with information about:\nSome Jekyll plugins that might be useful What CSS tricks I learnt The webfonts I used The badge on the left side of the page Tags: [starters javascript ruby github git]","date":"2013-08-22T14:14:22Z","permalink":"https://balkian.com/p/creating-my-web/","title":"Creating my web"},{"content":" 1 (font-lock-mode) Tags: [emacs org productivity lisp snippet]","date":"0001-01-01T00:00:00Z","permalink":"https://balkian.com/p/emacs-show-plain-text-version/","title":"Emacs: show plain text version"},{"content":"Use this config to avoid HDMI flickering/intermittent blanking on RPI with a 1400x1050 VGA monitor.\n1 2 3 4 5 6 hdmi_drive=2 hdmi_group=2 hdmi_mode=42 disable_overscan=1 config_hdmi_boost=7 Tags: [rpi snippet]","date":"0001-01-01T00:00:00Z","image":"https://balkian.com/img/rpi.png","permalink":"https://balkian.com/p/fixing-hdmi-flickering/","title":"Fixing HDMI flickering"}]
						
						
					
				
				
					