Raven's eyrie

Most of my projects start as small tools to solve problems I run into. Some stay small, some grow into libraries.


awesome-arr

A collection of *arrs and related stuff

[GitHub]

Essentially my first project on GitHub, which didn’t involve any code whatsoever.

By the time I made this, I was already deep into self-hosting: Sonarr, Radarr, Plex, etc. Around then, I came across a cool repository, rustyshackleford36/locatarr, that collected *arr-family apps. I used to check it every now and then, but at some point the repository disappeared, so I decided to make my own while greatly expanding on the original collection. To my surprise, it somehow ended up with over 3k stars on GitHub.

juicenet-cli

CLI tool designed to simplify the process of uploading files to Usenet

[GitHub] [PyPI] [Docs]

For all intents and purposes, this was the first piece of code I ever wrote that was more complex than Fibonacci. It started off as a single-file script that you couldn’t even install from PyPI because I had no idea how to package anything.

If you don’t know much about uploading to Usenet: you can’t upload folders, only individual files, and you should obfuscate them. The typical workflow is that your uploader splits the file into pieces called “articles” (if you’re familiar with torrenting, these are similar to torrent pieces), uploads them, and records them in an NZB file. Downloaders then use the NZB to grab each article and reconstruct the file. If a single article is lost, deleted, or corrupted, everything breaks, so parity files called PAR2 are generated for a given data file and uploaded alongside it, allowing the client to repair the file up to a certain threshold.
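
As a toy illustration of the splitting step (real uploaders also yEnc-encode each article and assign it a unique Message-ID; the 700 KB article size below is just an arbitrary example):

```python
def split_into_articles(data: bytes, article_size: int = 700_000) -> list[bytes]:
    """Split raw file data into fixed-size chunks ("articles").

    Real uploaders also yEnc-encode each chunk and give it a unique
    Message-ID before posting; this toy version only does the splitting.
    """
    return [data[i : i + article_size] for i in range(0, len(data), article_size)]


# A 1.5 MB payload becomes three articles: 700k + 700k + 100k.
articles = split_into_articles(b"x" * 1_500_000)
print(len(articles))
```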

Now, pretty much all existing Usenet uploaders wrap your file or folder in split, obfuscated RAR archives, which then get split into articles. They then write this obfuscated nonsense into the NZB file, which means you can no longer statically parse it for any metadata. And obfuscating the NZB is worse than useless: you only really want to obfuscate the data, because once you have the NZB you can download the file regardless. The RAR step may have served some purpose 20 years ago, but it’s entirely wasteful now. It’s probably a mix of history, people blindly following existing practice, and the misconception that RARing is necessary for obfuscation and/or preserving a folder.

And you know what else PAR2 files can do? They can rename the file to its original name, which you can use to reconstruct the folder structure. So we no longer need RARs for that. RAR is also unnecessary for obfuscation, since that already happens at the article level. While testing this setup, I also found a bug in SABnzbd where it failed to correctly reconstruct the folder structure from PAR2 alone, which got fixed pretty quickly.

Now, I’m not the first one to notice any of this. Everything I’ve said here I learned from @animetosho’s extremely well-written article, Stop RAR Uploads, which goes into more detail. They also happen to maintain the best Usenet tooling there is: Nyuu as the uploader, and ParPar as the PAR2 generator. Nyuu doesn’t generate PAR2 files by default and ParPar doesn’t preserve anything but the basename by default, which makes sense, as they can’t really assume what the user wants.

But I can.

So the next step was familiarizing myself with Nyuu and ParPar. Mostly ParPar, since I had to figure out how to preserve folders with ParPar’s --filepath-format option. After that, I wrote a tiny script that calls ParPar and Nyuu with the correct arguments and called it Juicenet.

The quality of the code in Juicenet hasn’t aged well. It’s very much a novice’s first attempt and it shows.

Still, it has gained more users than I ever expected, likely because to this day it’s one of the few, if not the only, high-level uploaders that does everything without RAR archives. I could definitely do way better if I rewrote it from scratch, because that’s probably the only way I’d get rid of every architectural mistake I made, but the fact that it has real users means a rewrite would break them, so it’s not exactly an easy choice.

pyanilist

Python wrapper for the AniList API

[GitHub] [PyPI] [Docs]

I use the AniList API in a lot of scripts to manage my self-hosted collection. For a while I stuck with existing libraries, but issues kept cropping up and I was never really happy with them - most lacked proper type hints, structured objects, or both. Some also seemed entirely unmaintained. After enough TypeErrors and expressions like data["Media"][0]["title"]["english"], I finally gave up and started writing my own.

How hard can an API wrapper be anyway? Just wrap the endpoints and be done with it. Except AniList is GraphQL, with a pretty large surface area, circular relationships, and some awkward response shapes. That probably explains why nobody seemed eager to maintain an AniList library. After thinking about it, I decided I would rather have an ergonomic API than try to cover everything AniList exposes, so I narrowed things down to what I actually needed and focused on making that simple.

I also ended up post-processing responses because AniList is not particularly consistent. Sometimes “empty” nested objects have every field set to null, sometimes the entire object is just null, and arrays occasionally contain null elements. The wrapper normalizes these cases so the returned values match the type hints and require fewer None checks.
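
Roughly the kind of normalization involved (a simplified sketch of the idea, not pyanilist’s actual code):

```python
from typing import Any


def normalize(value: Any) -> Any:
    """Collapse inconsistent "empty" shapes in a JSON-like structure.

    - dicts whose values are all None become None
    - None elements inside lists are dropped
    - nested structures are handled recursively
    """
    if isinstance(value, dict):
        cleaned = {key: normalize(val) for key, val in value.items()}
        return None if all(val is None for val in cleaned.values()) else cleaned
    if isinstance(value, list):
        return [normalize(val) for val in value if val is not None]
    return value


print(normalize({"title": {"english": None, "romaji": None}, "synonyms": [None, "Frieren"]}))
```

With this, an object whose fields are all null and an object that is itself null look the same to the caller.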

The first version of this library released with 9 required dependencies. This wasn’t a problem for me at the time, but as I worked on more projects I started developing a stronger stance on unnecessary dependencies. For example, a dependency that’s slow to update blocks my entire project from updating to the next Python version. So as I worked on this further, I slowly removed most of them, to the point where the latest version only depends on two things: a networking stack (httpx) and a data validation library (msgspec).

On a slight tangent, this is also a project where I really wish Python had some form of None-aware operator.

pynyaa

Turn nyaa.si torrent pages into neat Python objects

[GitHub] [PyPI] [Docs]

AniList metadata wasn’t the only thing my scripts needed, but Nyaa does not offer any API. I looked around for existing libraries but didn’t find anything satisfactory.

So I wrote a small function that parsed the page for the few fields I needed. That worked for a while, but it kept growing as I needed more metadata from Nyaa. At some point it started hitting quirks of how the site represents things (a release can be both trusted and a remake, but since the panel only has a single color, it just ends up red), and eventually it got messy enough that I decided to turn it into a standalone library.

The first release tried to do too much and ended up with about ten dependencies. It handled caching, parsed torrent files, used pydantic despite already validating everything by hand, and pulled in lxml when the standard library would have been enough.

That made it harder to use in different contexts, since it forced those decisions onto the user, so I started stripping those pieces out. These days it’s a lightweight library that just focuses on parsing Nyaa pages, and only depends on two packages: a network stack (httpx) and an HTML parser (beautifulsoup4).

At this point it covers pretty much every field Nyaa exposes and has completely replaced my original scraping code. It returns type-safe, structured objects and works in both sync and async code. Since I rely on it heavily in my own scripts, it has also ended up fairly battle-tested against real-world cases.

archivefile

Unified interface for tar, zip, sevenzip, and rar files

[GitHub] [PyPI] [Docs]

I was dealing with archive files of various formats and scripting against them quickly became an annoying dance of if-else branches to handle the fact that tarfile, zipfile, py7zr, and rarfile all behave differently despite doing the same basic things. So I wrote archivefile.

On a high level, the ArchiveFile class is defined by a protocol with an API that covers the common functionality. It’s somewhat inspired by pathlib, which I think is great. I then implement this for the aforementioned formats. At runtime, it does a single check to dispatch the correct handler, and I no longer have to write boilerplate just to read a single file from an archive.
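
A stripped-down sketch of that pattern (hypothetical adapter names, stdlib formats only, and a much smaller surface than the real ArchiveFile):

```python
from __future__ import annotations

import tarfile
import zipfile
from pathlib import Path
from typing import Protocol


class ArchiveAdapter(Protocol):
    """The common surface every format handler implements."""

    def namelist(self) -> list[str]: ...


class ZipAdapter:
    def __init__(self, file: Path) -> None:
        self._zip = zipfile.ZipFile(file)

    def namelist(self) -> list[str]:
        return self._zip.namelist()


class TarAdapter:
    def __init__(self, file: Path) -> None:
        self._tar = tarfile.open(file)

    def namelist(self) -> list[str]:
        return self._tar.getnames()


def open_archive(file: Path) -> ArchiveAdapter:
    """Single runtime check that dispatches to the right handler."""
    if zipfile.is_zipfile(file):
        return ZipAdapter(file)
    if tarfile.is_tarfile(file):
        return TarAdapter(file)
    raise ValueError(f"Unsupported archive: {file}")
```

Callers only ever see the protocol, so the format-specific differences stay contained in the adapters.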

In fact, thanks to this approach, every method on ArchiveFile has a one-line body:

```python
def get_member(self, member: StrPath | ArchiveMember) -> ArchiveMember:
    """
    <Docstring redacted for brevity>
    """
    return self._adapter.get_member(member)
```

It also has a fair amount of tests to ensure the handlers produce the same output regardless of the underlying format.

Developing this led me to find various bugs and missing functionality in py7zr, which I fixed upstream, becoming the second highest committer on the project with 15 or so merged PRs. Thanks to the py7zr maintainer, @miurahr, for being great to work with and accepting my PRs.

nzb

A spec compliant parser and meta editor for NZB files

[GitHub] [PyPI] [Docs]

Possibly one of my favorites. It’s a performant, pure-Python, dependency-free, type-safe, spec-compliant parser and meta editor for NZB files following the “make invalid states unrepresentable” pattern (well, as long as you stick to the public API, because it’s Python after all), so if you successfully construct one, you know it’s a valid NZB structure.
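
The gist of construction-time validation, sketched with a plain dataclass (loosely modeled on an NZB segment; not the library’s actual model, which validates much more):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Validated at construction, so any Segment that exists is a valid one."""

    size: int
    number: int
    message_id: str

    def __post_init__(self) -> None:
        if self.size <= 0:
            raise ValueError("size must be positive")
        if self.number <= 0:
            raise ValueError("number must be positive")
        if not self.message_id:
            raise ValueError("message_id must be non-empty")
```

Because the instance is frozen and checked up front, downstream code never has to re-validate it.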

Before working on this, I was convinced XML was a scary format and even added a dependency, xmltodict, to avoid dealing with it, but as you can imagine, the resulting dict isn’t pleasant to work with; XML just doesn’t lend itself well to that kind of transformation. After working on the Rust implementation (spoilers), which forced me to deal with XML directly because there wasn’t an equivalent to xmltodict, I realized XML isn’t scary at all. So I dropped xmltodict here as well and switched to the stdlib’s xml.etree.ElementTree, which is significantly faster and easier to work with.
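
For a taste of why ElementTree is pleasant here, a minimal example against a made-up NZB document (the namespace is the one from the NZB spec; every other value is invented):

```python
import xml.etree.ElementTree as ET

# A minimal, made-up NZB document.
NZB = b"""<?xml version="1.0" encoding="UTF-8"?>
<nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
  <file poster="uploader@example.com" date="1700000000" subject="example [1/1]">
    <groups><group>alt.binaries.test</group></groups>
    <segments>
      <segment bytes="1024" number="1">abc123@example</segment>
    </segments>
  </file>
</nzb>"""

# NZB documents live in a namespace, so every query needs a prefix mapping.
NS = {"nzb": "http://www.newzbin.com/DTD/2003/nzb"}

root = ET.fromstring(NZB)
for file in root.findall("nzb:file", NS):
    print(file.get("subject"))
    for segment in file.findall("nzb:segments/nzb:segment", NS):
        print(segment.get("number"), segment.get("bytes"), segment.text)
```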

Beyond that, it also provides various ergonomic methods for introspecting itself. I’ve used it extensively on real world files, so I think I can confidently claim it’s the best NZB parser in Python, however niche that might be.

seadex

Python wrapper for the SeaDex API

[GitHub] [PyPI] [Docs]

Pretty much what it says on the tin. It’s a fairly standard API wrapper for SeaDex. There’s not much interesting to talk about here. By this point it’s my third API wrapper, so I had a decent idea of how to approach it. Development ended up being fairly uneventful, in a good way.

stringenum

A small, dependency-free library offering additional enum.StrEnum subclasses and a backport for older Python versions

[GitHub] [PyPI]

This was mostly a toy project because I wanted to play with metaclasses. It backports features from newer Python versions down to 3.9 and provides a bunch of other string enums with different properties and guarantees.

The somewhat interesting part here is that the invariants are enforced at construction time, so you can always rely on them, and it’s fully typed. Typing metaclasses was pretty confusing at first, but in the end I got it and the library plays well with typecheckers.
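
A tiny example of the kind of metaclass trick involved - case-insensitive member lookup - though this is just the flavor of it, not stringenum’s actual implementation:

```python
import enum


class CaseInsensitiveEnumMeta(enum.EnumMeta):
    """Metaclass that makes member lookup by name case-insensitive."""

    def __getitem__(cls, name: str):
        try:
            return super().__getitem__(name)
        except KeyError:
            for member in cls:
                if member.name.casefold() == name.casefold():
                    return member
            raise


class Fruit(str, enum.Enum, metaclass=CaseInsensitiveEnumMeta):
    APPLE = "apple"
    BANANA = "banana"


print(Fruit["ApPlE"].name)  # APPLE
```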

I wouldn’t recommend depending on this library. You should just copy-paste the relevant parts if you really need them.

nzb-rs

A spec compliant parser for NZB files

[GitHub] [crates.io] [Docs]

Rust had been on my radar for a while, so one day I finally sat down and read the Rust book. Great tooling (cargo), null safety, a powerful type system, pattern matching - there was a lot going for it. Naturally, I needed a project, and parsers seemed like exactly the kind of thing Rust excels at. My NZB parser was a perfect candidate: it deals with XML, a format with a long history of security pitfalls that safe Rust largely avoids by design.

Moving from Python to Rust felt pretty natural, likely thanks to Rust’s zero-cost abstractions that are just as expressive as many Python constructs. Static typing wasn’t new to me either since I was already using type hints in Python.

Traits feel like a more powerful mix of dunder methods, abc.ABC, and typing.Protocol. I miss Rust’s enums every time I write Python, and Option<T> would make any PEP 505 fan jealous (myself included).

The borrow checker rarely got in my way. The rules are straightforward: a value has a single owner, and at any given time you can have either one mutable reference or any number of shared references. I also barely had to think about lifetimes, since the compiler inferred most of them. Things only got rough when I experimented with async and started running into more cryptic borrow checker errors, but that is a story for another time.

The project evolved alongside my understanding of Rust. The first version of the parser was a fairly direct port of the Python implementation with an almost identical API. Even so, the Rust version ended up being several times faster. Some of that came from Rust itself, but the process also highlighted inefficiencies in the Python implementation. Taking those lessons back, I was able to remove quite a bit of slower code and narrow the performance gap. Later versions of the Rust library evolved into a more idiomatic API. Ironically, the Python implementation was eventually updated to be closer to the Rust API wherever it made sense.

rnzb

Python bindings to the nzb-rs library - a spec compliant parser for NZB files, written in Rust

[GitHub] [PyPI]

A third NZB parser? Yes. At this point I might have a problem.

After writing one in Python and another in Rust, I had a thought: “Wouldn’t it be cool to use the Rust implementation directly from Python and get the best of both worlds?” Spoiler alert: yes.

My goal was simple: a drop-in replacement so existing Python code could immediately benefit from the Rust parser.

After a bit of research I found PyO3, which powers Pydantic and many other Rust-powered Python extensions. This was my first time writing an extension module, but thankfully PyO3 takes care of most of the nitty gritty details and lets me write Rust like nothing’s changed. Honestly, the maintainers have done an amazing job making this so easy I could hardly believe it. There were a few small hiccups along the way, but the docs and maintainers were incredibly helpful whenever I had questions.

Once everything worked and the tests passed, the next step was packaging. Wheels are prebuilt Python packages so users do not have to compile anything during installation. To build wheels for multiple platforms I used pypa/cibuildwheel, which seems to be what most projects rely on.

One interesting detail about extension wheels is Python version compatibility. For example: rnzb-0.6.0-cp314-cp314-win_arm64.whl. The cp314-cp314 tag means the wheel only works on CPython 3.14. On any other version, the installer falls back to the source distribution and tries to build it locally, which usually fails without a Rust toolchain.

To avoid releasing new wheels for every Python version, I built the extension against the Limited C API (abi3), which PyO3 supports. The result looks like this: rnzb-0.6.0-cp39-abi3-win_amd64.whl. The cp39-abi3 tag means the same wheel works on every CPython version starting from Python 3.9.

So yes, the end result was a third NZB parser. But this one is a little different: it brings the speed and safety of the Rust implementation to Python while keeping a familiar drop-in API. You get the Rust parser without changing your code.

atomicwriter

Cross-platform atomic file writer for all-or-nothing operations.

[GitHub] [PyPI] [Docs]

Every now and then I need to write something atomically. There used to be a package for this, python-atomicwrites, but it is unmaintained now.

I am not an expert in file-system behavior across different platforms, but Rust’s tempfile crate handles it well: it already implements atomic writes (and a lot more) across platforms. I also wanted an excuse to use more Rust, so this was a good fit.

Thanks to everything I learned from my previous projects, this one was fairly straightforward. I wrote a thin wrapper around the Rust library, exposed an idiomatic Python API, added some tests, and published it with abi3 wheels.
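
For context, the classic pattern behind atomic writes (a stdlib-only Python sketch of the idea, not what the library actually ships, since that delegates to the Rust crate):

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Write data to path atomically: all-or-nothing.

    Write to a temp file in the same directory (so the final rename never
    crosses filesystems), flush and fsync it, then atomically replace the
    target. Readers see either the old contents or the new, never a
    half-written file.
    """
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```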

PrivateBin

Python library for interacting with PrivateBin’s v2 API (PrivateBin >= 1.3) to create, retrieve, and delete encrypted pastes.

[GitHub] [PyPI] [Docs]

I regularly use an instance of PrivateBin to temporarily and securely store logs from various scripts. Initially I used the PrivateBinAPI package. It had incomplete type hints and I was not a fan of the API, but it worked… until it didn’t.

So I set out to write one from scratch.

One of the defining features of PrivateBin is that it never actually sees your paste. Everything is encrypted client-side before being sent to the server, which only stores and returns the encrypted blob, so the client has to handle both encryption and decryption.

Unfortunately (and maybe this was just me being dumb), the PrivateBin documentation did not help much here. So I started digging through the source code of PrivateBinAPI to figure out how it handled things.

Turns out… it doesn’t. It depends on another library, PBinCLI, to actually talk to PrivateBin.

So I went and read that code instead. After a few hours of trial, error, and carefully extracting pieces of logic, I eventually isolated the core encryption and decryption routines. Once that part was figured out, the rest of the client was fairly straightforward, and it doesn’t depend on another PrivateBin library.

myne

Parser for manga and light novel filenames

[GitHub] [PyPI] [Docs]

Manga and light novel filenames are a mess. Well, mostly. The big release groups tend to follow pretty consistent naming schemes, but everything else is all over the place, especially Korean manhwa releases. There’s no anitopy equivalent for manga and light novels, so I ended up writing my own parser to turn them into structured metadata like title, volume, and chapter.

It’s a Python library written in Rust, and under the hood it is entirely regex-based. It matches the most specific patterns first and removes them from the filename, then moves on to more ambiguous ones. Whatever is left at the end is treated as the title.
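
A toy version of that strip-as-you-match approach (the patterns here are made up and far simpler than myne’s real ones):

```python
import re


def parse_filename(name: str) -> dict:
    """Match the most specific patterns first, strip them from the
    name, and treat whatever survives as the title."""
    name = re.sub(r"\.\w+$", "", name)  # drop the extension
    result: dict = {}

    # Most specific first: volume, then chapter.
    patterns = {
        "volume": re.compile(r"\bv(?:ol\.?)?\s*(\d+)\b", re.IGNORECASE),
        "chapter": re.compile(r"\bc(?:h\.?)?\s*(\d+)\b", re.IGNORECASE),
    }
    for key, pattern in patterns.items():
        if match := pattern.search(name):
            result[key] = int(match.group(1))
            name = name[: match.start()] + name[match.end():]

    result["title"] = re.sub(r"\s{2,}", " ", name).strip(" -_")
    return result


print(parse_filename("Ascendance of a Bookworm v03 ch12.epub"))
```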

The name “myne” comes from the main character of Ascendance of a Bookworm, who really loves books. I’m still pretty proud of that one.

I’ve used it extensively on real-world files, and it has held up well so far.

mkvinfo

Python library for probing matroska files with mkvmerge

[GitHub] [PyPI] [Docs]

I needed a way to introspect MKV files so I could classify them further. mkvmerge can already introspect MKV files and return the result as JSON, but consuming that output directly from Python gets annoying pretty quickly.

This library is basically a thin wrapper around that. It runs mkvmerge -J and turns the JSON output into typed Python objects so I don’t have to deal with it myself.
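
The shape of it, sketched against a made-up, heavily trimmed sample of the JSON (the real mkvmerge -J output has far more fields):

```python
import json
from dataclasses import dataclass

# Hypothetical, trimmed-down sample of probe output.
SAMPLE = """
{
  "container": {"type": "Matroska"},
  "tracks": [
    {"id": 0, "type": "video", "codec": "AV1"},
    {"id": 1, "type": "audio", "codec": "Opus"}
  ]
}
"""


@dataclass(frozen=True)
class Track:
    id: int
    type: str
    codec: str


def probe(raw: str) -> list[Track]:
    """Turn raw JSON track entries into typed objects."""
    data = json.loads(raw)
    return [Track(**track) for track in data["tracks"]]


for track in probe(SAMPLE):
    print(track)
```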

misaki

A fast, asynchronous link checker with optional FlareSolverr support

[GitHub] [crates.io: misaki-core] [crates.io: misaki-cli] [Docs]

A good friend of mine wanted a link checker with FlareSolverr support. A link checker is a pretty easy project, and it was an excuse to try out async Rust.

This was the first time Rust actually felt like a pain. The errors got quite a bit more cryptic, and async support still feels somewhat incomplete.

Some patterns that should be simple end up awkward. For example, there is no built-in way to “yield” values from async code, so you end up relying on third-party crates like async-stream to emulate async generators. There is also no async drop, which makes cleaning up async resources harder than it should be.

It’s not all bad though. After simply .clone()ing my issues away, I ended up with a really fast link checker.

ravencentric.github.io

[GitHub]

Well, that’s this site. My apex domain had been sitting unused for a while (my project docs already live at $domain/$project), and a home page was pretty much the only thing I could put here, so that’s what it is.

It’s a static site built with Hugo and hosted on GitHub Pages. The theme is clente/hugo-bearcub because it’s simple, does everything I need (pretty codeblocks), is JavaScript-free, and comes in under 10KB.

There’s not much else to add here since I didn’t really do anything beyond taking some off-the-shelf pieces and putting them together.