How to self-host all of Bluesky except the AppView (for now)
(alice.bsky.sh) 138 points by icy 6 days ago | 82 comments
freedomben 5 days ago | root | parent | next |
Same, exactly. I would so much rather be given a docker-compose or k8s yaml along with some other tidbits like how to run migrations and stuff, than get a bash script I can just run. I've been doing this long enough to know that it's not initial setup work that really matters, it's the upgrade and backup/restore story that really matters. If your bash script just pulls and runs a docker container or something then cool, but if it's doing much more than that then that's a big red flag to me.
diggan 5 days ago | root | parent |
Here:
- https://raw.githubusercontent.com/bluesky-social/pds/main/in... has all the expected outside docker-compose setup, you can read it through in like 5 minutes
- Heavy-duty part of the setup is running https://raw.githubusercontent.com/bluesky-social/pds/main/co... which you should be familiar with
I guess the shellscript is for people who want a one-line install, which I wouldn't do myself either, but I guess some people prefer.
zzyzxd 5 days ago | root | parent |
The script even installs docker with apt by itself (which, I think, is the only reason they require Ubuntu as the OS -- to avoid dealing with any other package manager variants)... I mean, why? Just let people install docker however they like! If you don't even trust your users to install a container runtime, who's your target audience really?
It's also overcomplicated: it even tries to handle the race condition of multiple apt processes! What kind of environment do they expect users to have? As the project becomes more popular, the script will need to handle more edge cases. Let's see if it is still a 5-minute read one year later.
> I guess the shellscript is for people who want a one-line install, which I wouldn't do myself either, but I guess some people prefer.
This is the problem in lots of open source projects -- providing a one-liner installer and bragging about how easy the initial setup is, without an easy path for long-term maintenance. Give it some time, and many happy users of the one-liner will be unhappy when they encounter issues.
j45 5 days ago | root | parent | prev | next |
This message is for anyone who might find trying self-hosting intimidating.
Like hosting an application in the cloud, you also will never stop improving how you self-host.
If a software package leaves questions unanswered, the same gaps can often be found in your own self-hosting environment too.
Running this type of an installer is excellent to quickly introduce yourself to any technology - to then start learning about how you want to run it long term.
The questions expressed above are not new. How SREs solve them today can also be different and more complicated than needed.
Easy answer - if they have an install script, it gets run inside a VM or Docker, which by itself provides a baseline for backups and HA if needed.
If generally anything is run inside of a self-hosted hypervisor like Proxmox, it can be set up to automatically back up, mirror, and provide HA as-is, while you figure out what you want. This includes running Docker inside a Proxmox VM; there is no longer a big performance hit for doing this with things that are largely idle most of the time.
There is a big difference between SaaS, PaaS, and IaaS. It's easier and easier to get the benefits from all three by being willing to build up the foundation instead of pointing at the gaps in each package for not filling it for you.
It's encouraging to see things becoming more possible :)
dawnerd 5 days ago | root | parent | prev | next |
I was about to do this as well but their installer sketched me out. Why can't it just be some easy-to-follow docker instructions? They use docker too, but the instructions for setting it up on your own are basically "read the installer script".
Meanwhile mastodon is incredibly easy to self host w/ relays.
j45 5 days ago | root | parent | next |
An installer script is often an early step, and much better than nothing... as well as a step towards docker.
Here's the kicker, the install script could be called from a Dockerfile pretty easily, no? Sure, there might be things to sort out, but it doesn't seem unreasonable.
I agree having a docker image is super handy and can be quick to try, as well as update, and put into a larger self-hosted environment how you need.
benharri 5 days ago | root | parent | prev |
i certainly wouldn't say mastodon is easy to self host
hagbard_c 4 days ago | root | parent |
Pleroma [1] is, and it does the activitypub thing just as well. I installed it to see if it added anything worth keeping together with the other activitypub things I'm running (Peertube, Pixelfed and Lemmy, the latter two only for testing purposes, the first sees real use on several instances) and can vouch for the ease of getting it up and running.
[1] https://docs.pleroma.social/backend/installation/otp_en/
diggan 5 days ago | root | parent | prev |
> How do I secure the webserver and the data? Where is the data on my disk? How to backup and restore? High availability?
I feel like that's kind of out of scope for an article describing the steps for application/protocol-specific infrastructure. You need to look elsewhere for resources and guides on general self-hosting.
For example, if you use TrueNAS SCALE/Unraid/Proxmox or whatever for local self-hosting, you'd set up those things via those platforms. If you use Kubernetes/Nomad/Incus/containers, you'd solve those things via that tooling.
sureglymop 5 days ago | prev | next |
It's great that you wrote this up!
One thing I have found with many open source/selfhostable projects is just how much running them yourself can vary. It can go from a simple compose file with everything included to having to dig for obscure services and piece together how they all form the whole.
For example, I recently looked into self-hosting Zotero. It is so underdocumented and complex that there is almost no way one could self-host it (even for just one user) without that being one's job. So one needs to make a distinction between something being open source and being feasible to use/maintain.
In the end I gave up with Zotero. Even though it could have replaced Obsidian Notes, Calibre and Syncthing all at once for me.
diggan 5 days ago | root | parent | next |
> For example, I recently looked into self hosting Zotero. It is so under documented and complex that there is almost no way one could self host that
I've come across this a lot too. But what I've found is that it mostly applies to open source projects that offer a hosted paid version, so it kind of makes sense they'll make the experience slightly worse than it could be (consciously or subconsciously), as it pushes people to their hosted solution. I don't particularly like it though.
Doesn't seem to be the case for Zotero specifically, but your comment reminded me that I've noticed this more often lately.
sbarre 5 days ago | root | parent |
Yeah I tend to use ease of install for community editions of hosted paid open source projects as the leading indicator of how seriously they invest in (and support) their free/community version..
elashri 5 days ago | root | parent | prev | next |
> For example, I recently looked into self hosting Zotero. It is so under documented and complex that there is almost no way one could self host that (even for just one user) without that being ones job. So one needs to make a distinction between something being open source and being feasible to use/maintain
Just for the benefit of anyone who wants to go down this rabbit hole: you cannot self-host Zotero. In theory you can, but in practice it is not feasible. If you find their free storage limiting, then store your files on WebDAV (all clients support that).
The Zotero team explicitly said that they don't see this as a priority [1], and with the release of Zotero 7 and the transition, it is not realistic to think they ever will.
[1] https://github.com/zotero/dataserver/issues/105#issuecomment...
apitman 5 days ago | root | parent |
This is why it's not enough for software to be open source. It also has to be forkable.
__justplaying 5 days ago | root | parent | prev |
Self-hosting/mirroring all these Bluesky components is currently a mixed bag as well, though honestly the only outlier is the Relay, which is a beast. I currently have my copy of the PLC, a Jetstream with 2 days of data, and a clone of the app on my laptop that I play with sometimes and/or change things in for an elaborate shitpost of Bluesky Nitro https://bsky.app/profile/alice.mosphere.at/post/3l7bpmmtiop2...
I don't self-host my PDS yet because there is no migration path back yet (but there will be). Though maybe I'll just yolo one day and do it anyways.
jchw 5 days ago | prev | next |
I appreciate this effort. I've definitely been interested in how plausible it would be to, today, run another instance of the Bluesky AppView, mainly because AT proto seems promising, but to really meet its full potential it needs independent operators with different sensibilities.
I've been thinking a lot about the relay, though. 4.5 terabytes is, well... A lot, to say the least. If Bluesky grows 100x larger, running a relay will become pretty insanely expensive. I guess if the Bluesky organization remains fairly neutral about the relay part, it's not a huge deal, but:
- It always eventually becomes hard to stay neutral. Eventually someone will get mad at something going through your network that isn't just obvious network abuse like SPAM.
- It seems like drinking from the firehose itself will eventually become expensive. Will it be possible for something this high-bandwidth to remain freely accessible?
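For a rough sense of scale, here is a back-of-envelope sketch in Python. It uses only figures quoted in this thread: the 4.5 TB relay size, the hypothetical 100x growth, and the roughly $99 per 8 TB drive price mentioned further down. It assumes storage grows linearly with network size and ignores redundancy, bandwidth, and compute, so treat the result as a lower bound on raw disk cost rather than an estimate of what running a relay would cost.

```python
import math

RELAY_SIZE_TB = 4.5      # current relay size quoted in the thread
GROWTH_FACTOR = 100      # hypothetical growth from the comment above
DRIVE_SIZE_TB = 8        # drive size quoted elsewhere in the thread
DRIVE_PRICE_USD = 99     # price for that drive quoted elsewhere in the thread


def projected_storage(size_tb: float, growth: float) -> float:
    """Assume storage scales linearly with network size (a simplification)."""
    return size_tb * growth


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Whole drives required to hold total_tb, with no redundancy."""
    return math.ceil(total_tb / drive_tb)


total_tb = projected_storage(RELAY_SIZE_TB, GROWTH_FACTOR)
drives = drives_needed(total_tb, DRIVE_SIZE_TB)
print(f"{total_tb:.0f} TB -> {drives} drives, about ${drives * DRIVE_PRICE_USD:,} in raw disks")
```

Under these assumptions the disks themselves are not the expensive part; the second bullet's bandwidth concern is not modeled here at all.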
98codes 5 days ago | prev | next |
This is all academic for me until Bluesky gets the functionality to get an account back onto their main network, for DR if not peace of mind that an "undo" is possible.
diggan 5 days ago | root | parent |
Totally understandable. Personally I don't use Bluesky for anything vital, it's just data that the world wouldn't be better/worse without anyways, so I'm gonna go and give it a try even if there is no undo.
I love that people even have the choice, so much better than not even being able to.
__justplaying 6 days ago | prev | next |
author here, should you have questions!
moreati 5 days ago | root | parent | next |
What's in that 4.5 TB? e.g. message metadata? Message text? Media?
What time window does it cover? A rolling N day window? Everything since year dot?
Can it be pruned? e.g. only data of accounts followed or messages interacted with
theschmed 6 days ago | root | parent | prev | next |
Thanks for making yourself available to answer questions! Hopefully this is not a dumb question.
Is plc.directory a single point of failure for Bluesky users who want to take advantage of the benefits of a did:plc? And if so, is that a permanent thing, or will there be multiple interoperating did:plc directories down the road?
__justplaying 6 days ago | root | parent |
yes it's a SPOF. not sure about the second question, but i do know there are plans to transfer its ownership to an independent foundation
pfraze 5 days ago | root | parent |
Transferring to an independent org is what we're talking about now, yes.
The backstory to PLC is that we picked up the DID standard and looked for an existing registry-method that would satisfy requirements¹. None of them really did. We then surveyed mechanisms for decentralized operation: DHTs, open blockchains, permissioned blockchains, and federated databases. Of them, the two blockchain variants seemed perhaps promising, but still premature since (as of 2022) there's cost variability due to load and in some cases bad transaction latency (eg 10 minutes).
We decided the best decision was to create PLC, which matches all of the requirements except for longterm meta governance. The way we designed it was to make the registry mechanics transferrable to a different protocol in the future, so that if for instance we decided (say) a DHT was suitable (it's not) we'd be able to use the same identifiers but change resolution and mutations to a new process. Then we started talking to other SMEs to get their take.
Ultimately the solution that's gotten the most favorable response has been setting up an ICANN-style independent organization to operate it. This can be joined with a couple of interesting systems, such as mirrors which tail a certificate-transparency-style audit log, and which could even serve as transaction witnesses to indicate when the core registry might be rejecting updates ("write censorship").
What can I say, some things take time and stakeholder-building. Look up the history of DNS and Network Solutions Inc for a bit of a wild ride that people have forgotten about. One other thing I should point out is that the DID spec enables multiple registry methods. Atproto currently supports did:web, and if other methods show up which satisfy the requirements then we are interested.
¹ Secure against manipulation by the registry operators, longterm meta governance, highly available, reasonable transaction latency, reliably low cost that's not dogged by token speculation, low ecological impact.
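The mirror idea above can be illustrated with a toy append-only log in Python. This is a generic hash chain, not PLC's actual operation format or API: the entry layout, the field names, and the `append` and `verify_chain` helpers are all hypothetical. It only shows why a mirror that tails such a log can detect history being rewritten; it does not model witnesses or write censorship.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Append a payload, chaining it to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "payload": payload,
                "hash": entry_hash(prev_hash, payload)})


def verify_chain(log: list) -> bool:
    """A mirror recomputes every hash; an edited or dropped entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"did": "did:example:alice", "handle": "alice.example"})
append(log, {"did": "did:example:alice", "handle": "alice.example.org"})
print(verify_chain(log))  # the untouched chain verifies

log[0]["payload"]["handle"] = "mallory.example"  # tamper with history
print(verify_chain(log))  # the recomputed hash no longer matches
```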
jazzyjackson 5 days ago | root | parent | next |
Hey pfraze, forgive my ignorance but what role does DID serve that DNS doesn't? My favorite part about bsky is using TXT record to prove that I control my domain for username purposes, what's the downside to just generating a keypair, and using the fingerprint of the public key as my identity? (Maybe with some affordance for key rotation vis a vis KERI*) Not doubting youall weighed every possibility, just wondering what I'm missing
*Key Event Receipt Infrastructure
steveklabnik 5 days ago | root | parent |
Not Paul, but DID is a stable ID over time, whereas dns is not. This lets you change your handle without the network losing track of who you are. I was @steveklabnik.bsky.social before I was @steveklabnik.com, and when I made the switch, all of my previous stuff was still there.
This is a fun party trick in some sense, but also a real meaningful feature in another. If I ever decide to move from steveklabnik.com to steve.klabnik.com, a thing I have been considering for a few years, my stuff on @proto/Bluesky will be one of the only services that doesn't have the issue that's kept me from pulling the trigger: updating the entire world that that's where I am now.
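A minimal sketch of that indirection, in Python. The `Directory` class and the DID string are made up for illustration and do not reflect how atproto actually stores identities; the point is only that records are keyed by the stable DID, so changing the handle leaves them reachable.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Toy identity directory: handles point at DIDs, records hang off DIDs."""
    handle_to_did: dict = field(default_factory=dict)
    records: dict = field(default_factory=dict)

    def register(self, handle: str, did: str) -> None:
        self.handle_to_did[handle] = did
        self.records.setdefault(did, [])

    def post(self, handle: str, text: str) -> None:
        # Content is stored under the DID, never under the handle.
        self.records[self.handle_to_did[handle]].append(text)

    def change_handle(self, old: str, new: str) -> None:
        # Only the handle -> DID mapping changes; the DID and its records stay put.
        self.handle_to_did[new] = self.handle_to_did.pop(old)

    def posts_for(self, handle: str) -> list:
        return self.records[self.handle_to_did[handle]]


directory = Directory()
directory.register("steveklabnik.bsky.social", "did:example:steve")  # hypothetical DID
directory.post("steveklabnik.bsky.social", "hello")
directory.change_handle("steveklabnik.bsky.social", "steveklabnik.com")
print(directory.posts_for("steveklabnik.com"))
```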
kiitos 5 days ago | root | parent | next |
DIDs are stable only in the context of a specific 'verifiable data registry' as the spec puts it.
https://www.w3.org/TR/did-core/#dfn-verifiable-data-registry
DIDs delegate trust and authority to a data registry, in exactly the same way that DNS delegates trust and authority to ~ICANN.
The system model is exactly the same. The difference is only in the properties of the authoritative entity.
steveklabnik 5 days ago | root | parent |
That's a good point: I was speaking in a more social manner. Because domains are human-readable, they tend to be used for humans. Bluesky could have chosen to just use domains, but I personally prefer that we have the additional layer of indirection. Plus like, you have the ability (at the low level, not really exposed in the UI in any meaningful way) to be multiple people: I can associate multiple domains with my DID.
That said, you're not wrong that a registry is a registry.
kiitos 5 days ago | root | parent |
Yeah, definitely not suggesting domains are a better form of identity!
pfraze 5 days ago | root | parent | prev |
Yes! And if this were not the case then account portability between PDS hosts would be really challenging. Same logic as keeping your phone number when you switch cell carriers
Kye 5 days ago | root | parent | prev | next |
>> "What can I say, some things take time and stakeholder-building."
The ongoing WordPress fiasco is a good sign of what happens when you set up an independent organization too soon. You won't have the people or the commitments from those people to maintain that independence, so the independent thing ends up not being able to do anything to protect the thing that was supposed to be independent from the commercial interests looking to exploit it.
mitochondriaz 5 days ago | root | parent | prev |
Can you say more on why DHTs are not a solution? Are you aware of https://github.com/pubky/pkarr, for example? It seems to be very good!
jervant 5 days ago | root | parent | prev | next |
How are Direct Messages implemented in Bluesky if anyone can access a firehose of all network activity?
__justplaying 5 days ago | root | parent |
DMs are currently 1:1 only and closed source. They are working on/planning to build proper E2EE DMs that support group chats.
mintplant 5 days ago | root | parent | prev |
What's the difference between social-app and the AppView?
pfraze 5 days ago | root | parent |
social-app is the client side, AppView is the backend api surface
ck2 5 days ago | prev | next |
I found it interesting that it's very difficult, almost impossible, to get real Bluesky stats.
This site tries but has limits:
* https://bsky.jazco.dev/stats
They broke 14 million yesterday and it seems to be snowballing now since the election:
* https://bsky.app/profile/jaz.bsky.social/post/3laetwhztdk2x
__justplaying 5 days ago | root | parent |
https://bskycharts.edavis.dev/ is a good starting point for a number of charts
heavensteeth 5 days ago | prev | next |
This site is extremely snappy. Good work.
__justplaying 5 days ago | root | parent |
Thanks! Its code is available at https://github.com/aliceisjustplaying/whtwnd-blog, I intend to turn this into the template as the posts are stored on my PDS, on ATProto, using WhtWnd https://whtwnd.com/
(And all of this is a fork of my friend Samuel's blog, https://mozzius.dev, see https://github.com/mozzius/mozzius.dev)
apitman 5 days ago | root | parent |
Your site is stored on your PDS and available on whtwnd, but also hosted directly on your domain. How are they connected exactly?
mdaniel 5 days ago | prev | next |
Also, yesterday someone posted[1] https://frontpage.fyi/ which seems like it's predominately Bluesky/ATprotocol news but since both of those interest me, if this blog link interests you then so might that link. It logs in with Bsky oauth2 federation
jazzyjackson 5 days ago | prev | next |
Is it feasible to run a bluesky instance "on prem" and "offline" for instance as an airgapped corporate intranet ?
nisten 5 days ago | root | parent | next |
Great, do I have to set up LDAP, OAuth, and troubleshoot corporate-style single sign-on systems for the next 6 months just to get a chat server running now....
elfprince13 5 days ago | root | parent | prev |
I think if you replaced the plc directory with a corporate domain that would be pretty straightforward?
nisten 5 days ago | prev | next |
Is the actual guide just this <400 word thing, or is it all those 15 different links on the post, or only some of them....
Does that... bureaucracy of documentation not infuriate anyone else or is it just me. I guess I'll try and reset my password to bluesky website, assuming it's this .app one, but then it's asking me to maybe select a provider ... of my password.
Does whoever made this user experience not have enough emotional intelligence to realize how infuriating it is?
__justplaying 5 days ago | root | parent | next |
This was a quick and dirty post I put together primarily for people who are already on Bluesky and have dev experience, and peppered with appropriate links where you have actual guides and/or documentation for each bit.
steveklabnik 5 days ago | root | parent | prev |
> I guess I'll try and reset my password to bluesky website, assuming it's this .app one, but then it's asking me to maybe select a provider ... of my password.
It's asking what the host of your data is. If you're not running your own server, then the default value of Bluesky itself is the correct one.
__justplaying 6 days ago | prev | next |
How do I ask the mods to swap out the link to the actual post instead of my blog's front page?
(...also, the title, as the original has the caveat)
Jtsummers 5 days ago | root | parent | next |
It's likely the correct page was submitted. The correct page includes a canonical link in the HTML:
<link rel="canonical" href="https://alice.bsky.sh"/>
HN will replace submission links with the canonical link if it's found.
__justplaying 5 days ago | root | parent |
oh. time to look at the code of my blog...
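To check what a page declares as its canonical URL, something like the following Python sketch works on the raw HTML. It uses only the standard-library `html.parser`; the `CanonicalFinder` class and `find_canonical` function are hypothetical helpers, and this says nothing about how HN itself implements the lookup.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<html><head><link rel="canonical" href="https://alice.bsky.sh"/></head></html>'
print(find_canonical(page))
```

If every post page reports the site root here, as in the tag quoted above, aggregators that honor the canonical link will point at the front page instead of the post.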
paulgb 6 days ago | root | parent | prev | next |
@dang a better URL would be https://alice.bsky.sh/post/3laega7icmi2q
(I can't tell if Dan has an alert set up on his handle or whether he just sees everything, but hopefully that works :))
yorwba 5 days ago | root | parent | next |
dang doesn't have an alert and he doesn't see everything. https://news.ycombinator.com/item?id=41317232 The official way to contact the mods is in the footer, i.e. email hn@ycombinator.com
paulgb 5 days ago | root | parent | next |
Ah thanks, good to know. I guess I've just been lucky with it and developed a superstition that it works.
timerol 5 days ago | root | parent |
He is also extremely active here, so there's a good chance he reads and responds to a random comment without an email. But email is the approved (and fastest) way to go about it
__justplaying 5 days ago | root | parent | prev |
will email, thanks
__justplaying 6 days ago | root | parent | prev |
thanks!
dang 5 days ago | root | parent | prev |
Fixed now!
zxcvbnm69 5 days ago | prev | next |
[dead]
elfprince13 5 days ago | prev | next |
but I thought that Bluesky wasn't meaningfully distributed /s
jazzyjackson 4 days ago | root | parent |
If you thought, past tense, you were probably right, but it's been in the oven for 3 years so it's finally approaching "fully baked"
jonstaab 5 days ago | prev |
[flagged]
timerol 5 days ago | root | parent | next |
I'm sure there are HNers who built desktops with 8TB or 16TB hard drives, and have not (yet) needed the space for as many games and media as expected.
numpad0 5 days ago | root | parent | prev | next |
8TB WD CMR is like $99, 2x48GB of DDR5 is ~$250. Memory and storage are currently way cheaper than many think they are.
__justplaying 5 days ago | root | parent | prev | next |
didn't say it was cheap!
nightpool 5 days ago | root | parent | next |
But why is it required? Do you really need a copy of everyone's data locally? If the only way to self-host bluesky is to have an entire copy of the entire database, that seems like it's really bad from a scaling perspective.
half-kh-hacker 5 days ago | root | parent | next |
What else would "self-hosting all of Bluesky" mean other than a copy of the entire site? If you just want to participate in the network host a PDS, which only stores your own posts.
nightpool 5 days ago | root | parent |
Surely there's some middle ground between only hosting your own data and being reliant on another site to keep track of your following / followers and hosting a duplicate copy of the entire network?
steveklabnik 5 days ago | root | parent | next |
For sure. If you just want to host your own data, you can do that. A PDS for you and maybe some friends is very small and cheap to host.
nightpool 5 days ago | root | parent |
My understanding though is that having a PDS on its own is useless without an AppView to collect the data from the relay? Or am I misunderstanding the architecture here? https://docs.bsky.app/docs/advanced-guides/federation-archit...
steveklabnik 5 days ago | root | parent |
I'm talking about the case where you wanted to run your own PDS and use all of the other infrastructure being run by Bluesky.
If you fully want your own copy of everything, then you'd want to run a copy of everything. But you don't have to. It really depends on what your goals are. That's why the post is about the maximal scenario. "Just your own PDS" is the minimalist scenario. But I think it's the one that makes sense for 95% of users who want to self-host.
nightpool 5 days ago | root | parent |
Right, and I'm saying "surely there must be a middle ground between "using all of Bluesky's infrastructure" and "having a 4.5tb copy of every post ever made on the network""
lisowski 5 days ago | root | parent | next |
What exactly would that be?
I feel like the middle ground you're talking about could be just a feed?
A feed is: a server that consumes the firehose and decides whether to store posts; when loaded in the app, it returns some posts to create a feed
So essentially you only store references to part of the network rather than storing the whole thing
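As a rough Python sketch of that idea: the event shape, the `wants` predicate, and the in-memory store below are all assumptions for illustration, not the real firehose format or the feed generator API. The server keeps only references to posts that match, and returns a slice of them when asked for the feed.

```python
from typing import Callable, Iterable


class ToyFeed:
    """Keeps references (URIs) to matching posts instead of the whole network."""

    def __init__(self, wants: Callable[[dict], bool]) -> None:
        self.wants = wants
        self.post_uris: list = []

    def consume(self, events: Iterable[dict]) -> None:
        # Decide per event whether to remember it; everything else is dropped.
        for event in events:
            if self.wants(event):
                self.post_uris.append(event["uri"])

    def get_feed(self, limit: int = 50) -> list:
        # Newest first, capped at `limit`.
        return list(reversed(self.post_uris))[:limit]


# Hypothetical events; a real firehose delivers a different, richer format.
events = [
    {"uri": "at://did:example:a/post/1", "text": "self-hosting a PDS today"},
    {"uri": "at://did:example:b/post/2", "text": "lunch"},
    {"uri": "at://did:example:c/post/3", "text": "relay needs 4.5 TB"},
]

feed = ToyFeed(wants=lambda e: "pds" in e["text"].lower() or "relay" in e["text"].lower())
feed.consume(events)
print(feed.get_feed())
```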
jonstaab 5 days ago | root | parent | prev |
consider the nostr protocol
half-kh-hacker 5 days ago | root | parent | prev | next |
Your following list is stored in your own repo, so it lives on your PDS. You can theoretically have partial replicas of the network but nobody has bothered yet; if you want to make software like that, a good start would be subscribing to the firehose and filtering down to DIDs you care about / supplying the watched DIDs parameter to a Jetstream instance
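A hedged Python sketch of the second option. The `/subscribe` path and the repeated `wantedDids` query parameter follow my reading of the Jetstream README and should be verified against the instance you use; the host name, the placeholder DIDs, and the assumption that each event carries a top-level `did` field are illustrative only. The code just builds the URL and shows a client-side filter; it does not open a connection.

```python
from urllib.parse import urlencode

# Assumed host; substitute whichever Jetstream instance you actually use.
JETSTREAM_HOST = "jetstream.example.com"


def subscribe_url(watched_dids: list, host: str = JETSTREAM_HOST) -> str:
    """Build a subscribe URL that repeats the wantedDids parameter per DID."""
    query = urlencode([("wantedDids", did) for did in watched_dids])
    return f"wss://{host}/subscribe?{query}"


def keep_event(event: dict, watched_dids: set) -> bool:
    """Client-side fallback filter, assuming each event carries a top-level 'did'."""
    return event.get("did") in watched_dids


watched = ["did:plc:aaaa", "did:plc:bbbb"]  # placeholder DIDs
print(subscribe_url(watched))
print(keep_event({"did": "did:plc:aaaa", "kind": "commit"}, set(watched)))
```

Filtering on the server side keeps bandwidth down; the client-side check is only a fallback for when you consume an unfiltered stream.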
fiatjaf 5 days ago | root | parent | prev |
The middle ground you're looking for is impossible in the AT protocol, it is however what the Nostr protocol is aiming towards.
jazzyjackson 5 days ago | root | parent | prev | next |
"self host an entire copy of all user data" is a pretty cool capability to have, kind of proof that the infrastructure is really open and forkable. you seem to have misunderstood OP's goals. Serving your own data from a personal data server is a much less arduous affair.
galactus 5 days ago | root | parent | prev |
Uh, it is not required. You can run only a PDS if you want to self host your data and everything will work.
But it is indeed very cool that you can actually host a relay if you want (for fun, learning, or whatever reason)
bombcar 5 days ago | root | parent | prev |
Ten terabytes of spinning rust is only $100-$300 or so, that's not bad at all.
jonstaab 5 days ago | root | parent |
My point is not the current size, it's the eventual size if bluesky succeeds. Facebook ingests 100TB/day. Self-hosting a bluesky relay isn't (won't be) a thing.
galactus 5 days ago | root | parent |
It could be a thing. Not for individual tinkerers but for companies. The fact that today, with already 14 million users, it is still possible for an individual to host it is amazing.
zzyzxd 5 days ago | next |
Selfhosting is my hobby but I am also an SRE. I am hesitant to do this because the instruction is "too easy" -- "Simply open your firewall, download and run this installer.sh with sudo on your server and that's it!"[1].
How do I secure the webserver and the data? Where is the data on my disk? How to backup and restore? High availability?
There might be detailed documentation somewhere, or I can even read the code. But these are the important things open source software should tell its users right off the bat.
1: https://github.com/bluesky-social/pds/blob/main/README.md