Python packaging: why we can't have nice things, part 1 – The Old Refrain
(zahlman.github.io) 5 points by zahlman 20 hours ago | 11 comments
zahlman 8 hours ago | root | parent |
>It's not the best, but at least it isn't
I think it's good to be able to criticize the things you like best. And thinking that someone else might be having a worse experience doesn't improve my own.
Many common complaints about Python packaging are things I don't find valid. I chose to get that stuff out of the way, before getting into the stuff that does concern me (including things that I don't commonly hear others mention).
zahlman 20 hours ago | prev | next |
Hi HN (and Merry Christmas to those celebrating it),
I've started a series of blog posts talking about the Python packaging ecosystem and the many things that are going wrong with it. I know there are countless pieces out there already about the "mess" (that's in fact just about the first point I make); my goal with this series is to get into the details of why things are so awful, and uncover the underlying faults.
As a bit of background: I came here originally (https://news.ycombinator.com/item?id=41286179) because I saw that this was one of the relatively few places talking about Tim Peters' suspension from Python core development, which in turn was related to my own ban from the Python Discourse forum (https://discuss.python.org). While I joined that forum for other reasons, I found that most of my time there was spent on discussions of the packaging system. I used to not think about that very much - I was already able to make installable packages for simple, pure Python projects and have them work just fine, and I'd learned a few habits to avoid the most common pitfalls. But over the last year and a half or so, I've been making myself an expert on the topic.
In the first post, I go over a grab bag of common complaints that I don't think are major problems, or where people commonly seem to have unrealistic expectations. The Python world is fundamentally different from that of other "modern" languages with an "ecosystem": it's much more common to rely on compiled code in C or another similar language (and thus have to deal with an FFI, as well as figuring out who's going to build the code and when); the language is much older (it was designed in an era when a rich standard library was desirable because you couldn't rely on grabbing code from the Internet) and just plain works differently (the way imports work makes it unfeasible to support multiple versions of the same library in a given environment); etc.
This is of course just setting the stage for being able to rip into people for the things that really are inexcusable ;)
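To illustrate the point above about imports: a minimal sketch, assuming two hypothetical directories that each hold a different "version" of a made-up module named `demo_lib_hypothetical`. It shows that once a module name is loaded, later imports of that name are served from the `sys.modules` cache, so the second version is never seen within the same process.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_lib_hypothetical"  # made-up name, used only for this sketch


def import_from_two_dirs():
    """Try to import two 'versions' of one module in a single process."""
    with tempfile.TemporaryDirectory() as tmp:
        dirs = []
        for label, version in (("v1", "1.0"), ("v2", "2.0")):
            d = Path(tmp) / label
            d.mkdir()
            (d / f"{MODULE_NAME}.py").write_text(f'__version__ = "{version}"\n')
            dirs.append(str(d))
        try:
            sys.path.insert(0, dirs[0])
            first = importlib.import_module(MODULE_NAME)
            # v2 now comes first on sys.path, but the name is already cached
            # in sys.modules, so the import system never searches the path.
            sys.path.insert(0, dirs[1])
            second = importlib.import_module(MODULE_NAME)
            return first.__version__, second.__version__, first is second
        finally:
            # Undo the changes to interpreter-wide state.
            for d in dirs:
                if d in sys.path:
                    sys.path.remove(d)
            sys.modules.pop(MODULE_NAME, None)


print(import_from_two_dirs())
```

Both imports yield the same module object. Since a module name maps to exactly one entry in `sys.modules`, two libraries that need different versions of the same dependency cannot both be satisfied in one environment without extra machinery.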
This piece ran about 4500 words, which is quite a bit more than I planned on (and I ended up splitting the writing across two days). Some of you might recognize things I've said in previous HN comments. For a bit of entertainment value, I've set the section headings to a musical theme.
Future planned topics for the series include, in no particular order:
* A "part 0" to go over basic concepts for people not already familiar with Python packaging
* A guide to a bunch of the tricks I use to mitigate these problems
* Security issues with Pip and its over-eagerness to build sdists (and thus run `setup.py` etc.), and the history of that problem
* The specific history of Setuptools taking over from Distutils
* The absurdly high count of Setuptools downloads from PyPI (https://pypistats.org/packages/setuptools)
* The awful metadata standards, and how legacy support for `setup.py` tricks has been holding things back
* Why wheels don't use better compression or internal symlinks, and the consequences for PyPI's bandwidth
* A bunch of stuff about the internals of Pip and Setuptools, and how awful they are (I'm not sure how this will be organized, but I especially want to cover the quasi-standard status they have despite not being part of the standard library)
* A discussion of why people don't seem to like venvs very much, and why they really aren't that bad (spoilers: a lot of the pain is really Pip's fault! And, very indirectly, Setuptools' fault too.)
* The social issues that are making it harder to develop standards (and get tool authors on board)
* An overview of the build backend I've been writing, `bbbb`
* A design for an application and library installer that I'd like to replace Pip and Pipx, which I'm calling Paper - the Python Application, Package and Environment wRangler
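On the sdist item in the list above: a minimal sketch of one commonly used mitigation, which is to tell Pip to refuse source distributions so that no `setup.py` or build backend runs at install time. The `--only-binary :all:` option is a real Pip flag; the helper name and the requirement string are hypothetical, and this is not necessarily the approach the series will recommend.

```python
import sys


def pip_wheels_only_command(requirement: str) -> list[str]:
    """Build a pip command line that installs from wheels only.

    With --only-binary :all:, pip fails instead of falling back to an
    sdist, so no third-party build code is executed during installation.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary", ":all:",
        requirement,
    ]


# "example-package" is a placeholder, not a real recommendation.
cmd = pip_wheels_only_command("example-package")
print(" ".join(cmd[1:]))
```

The command can be passed to `subprocess.run(cmd, check=True)`. The trade-off is that installation fails outright for any dependency that publishes no compatible wheel.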
KolenCh 14 hours ago | root | parent |
Will your series at some point incorporate conda in the picture?
zahlman 14 hours ago | root | parent |
My goal is not to give an overview of available tools, and I know fairly little about Conda specifically to begin with - my expertise is in building (and even then really only on the Python side) more than specific tools. So I'm not planning a specific critique of it or anything, no.
I will certainly mention Conda in contexts where it makes sense to do so.
KolenCh 14 hours ago | root | parent |
Ok, thank you. It’s a pity for me, as I think having a resource that explains the big picture of both would be great. But I don’t have enough of a big picture to write about it yet.
The thing about conda is that it is not just a tool; it is a completely different ecosystem for handling what you describe in your post. You touched on building & packaging, distribution, environment management, etc., and all of these are different in conda land. To me the answer to the problems you mentioned in the post is conda. But I think it is commonly misunderstood, and some people hate using it. Not to mention that publishing to both PyPI and a conda channel is laborious to maintain.
zahlman 12 hours ago | root | parent |
>it is a completely different ecosystem... all these are different in conda land.
This matches my understanding. But I also understand that Conda doesn't treat Python as anything special; it just provides its own kind of environments for multi-language computing.
Many people do swear by Conda and I should look into it eventually. (In some far future, maybe Paper can work in a Conda environment and avoid the problems that Pip encounters there.) If I'm going to fix problems (and educate about them) on the Python-first side, I need to have priorities, though.
KolenCh 4 hours ago | root | parent |
There’s a chain of things happening in packaging, and starting that chain with PyPI limits what one can do to fix it. Conda starts from a different design for how to build a package and also how to bundle it. In particular, it handles compiled code much better: in the non-pure-Python case a compiler is needed at some point, and that is going to cause problems if you have no control over it. Conda essentially achieves reproducibility by treating the necessary compiler as part of the dependencies. While a certain kind of reproducibility can be achieved with PyPI packages, my understanding is that, e.g. with manylinux, it takes a hope-for-the-best approach and can break in exotic cases, which should never happen with conda packages.
> The genesis of Conda came after Guido van Rossum was invited to speak at the inaugural PyData meetup in 2012; in a Q&A on the subject of packaging difficulties, he told us that when it comes to packaging, "it really sounds like your needs are so unusual compared to the larger Python community that you're just better off building your own"
From https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-mi...
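For context on the manylinux remark above: a wheel records its compatibility claims in its filename, following the `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl` layout from the wheel specification. The sketch below splits a filename into those fields; the filename and helper names are hypothetical. As far as I understand, a tag like `manylinux_2_17_x86_64` only declares a glibc baseline (2.17 here) rather than pinning the toolchain the way a conda package's dependencies can.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its components (simplified sketch)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


# Hypothetical filename, for illustration only.
info = parse_wheel_filename("example_pkg-1.0-cp312-cp312-manylinux_2_17_x86_64.whl")
print(info.platform_tag)         # manylinux_2_17_x86_64
# A platform field may hold several tags joined by "."; split to list them.
print(info.platform_tag.split("."))
```

For real use, the third-party `packaging` library provides a more thorough parser; this version only shows where the platform promise lives.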
smitty1e 19 hours ago | prev |
One of the things I love about Python is its informal Second Mover[1] approach.
Features like enum and async were added after they were fully wrung out elsewhere, and generally fit well.
Is python tooling a flaming dumpster fire at the moment? Sure. Will wisdom precipitate out in its good time? One is confident.
[1] https://en.wikipedia.org/wiki/First-mover_advantage#Second-m...
zahlman 19 hours ago | root | parent |
I hadn't thought of it that way before, but I think you have a good point. The real idea behind "There should be one-- and preferably only one --obvious way to do it." often isn't praise for Guido's time machine - sometimes it's "please just accept the 15th competing standard (https://xkcd.com/927/), we worked really hard on it and it fixes real problems". And where packaging in particular has failed, it has a lot to do with not wanting to leave people behind or force them to switch. (You can still see the scars of the 2->3 migration all over these discussions.)
People often criticize the "move fast and break things" philosophy, but I do think it's important to "keep moving, point where you're going and stop pretending the super-glued cracks aren't there". I do have a post in mind about that topic generally, but not as part of this series.
smitty1e 18 hours ago | root | parent |
Converging on something useful is implicit in the iterative approach.
odie5533 10 hours ago | next |
I will never understand all these posts about Python packaging. It's not the best, but at least it isn't Gradle or Maven or Bundler or Podfile or NPM/Webpack/tsconfig/eslint/babel nonsense. When I install a JavaScript library, I have to edit like 5 different files to get it to start working. And god forbid I upgrade, then I have to read the migration docs every single time because every upgrade choice they make is unique.
Latest headache: eslint completely changed the config file so now you have to convert your old file, oh and your plugins don't work so you need to use the plugin compatibility shims that also don't work.
I think some of the hate for Python packaging is that it's everyone's first foray into the packaging world since it's everyone's first programming language. I think if Python were a more difficult language, people wouldn't complain about the packaging at all.