078: Uncloud, bridging the gap between Docker and Kubernetes

Dominic:

Hey there. Hello, Gophers. I missed the intro. I'm Dominic St-Pierre.

Morten:

Hi, and I'm Morten.

Dominic:

And we have Pasha Sviderski with us today. So hello, hello there from Australia.

Pasha:

Hey, guys.

Dominic:

So you are the creator of Uncloud. I came across the project, I think it was on Bluesky, a couple of weeks ago or something like that. I was not aware of it. So very nice, and we will dive deep into this.

Dominic:

But first of all, can you go ahead and talk about yourself a little bit? Where are you coming from, where did you discover Go? What is your background? Anything that you want to share regarding your path so far in programming?

Pasha:

Yeah. Good. So, yeah, I'm a software developer from Belarus, currently living in Gold Coast, Australia. My background is mainly infra and backend development. My most recent jobs were at Canva and JetBrains.

Pasha:

At Canva I mainly maintained the compute infrastructure, the clusters, first on AWS ECS, then led the adoption of Kubernetes, and then over probably the last year of my work there I helped develop the internal platform, the PaaS or internal developer platform, it goes by different names at different companies. We developed this on top of Kubernetes with a mix of Crossplane and Terraform and other stuff. This is probably when writing Go became my day job. Before that it was mainly Python, some scripting and just operations in the infra space. So I haven't been using Go for long, maybe just two or three years or something, but I have a strong background in software engineering generally.

Pasha:

Cool. Yeah.

Dominic:

JetBrains. Oh, yeah. So I'm curious a little bit about that. I guess that was the Python part?

Pasha:

Yeah. That was the Python part. I worked in a sort of small startup within JetBrains. JetBrains is famous for its IDEs and all the other dev tools, but this was something similar to Coursera or Udemy, a marketplace for online courses.

Pasha:

So I was like fifty-fifty: backend developer in Python, and infra guy.

Dominic:

Nice. Interesting. Okay. And I've heard a lot about Kubernetes and configuration and things like that. So would you say that this kind of sparked the idea for Uncloud at some point?

Dominic:

What happened there? When did you go from, you know, it's nice that we have all of this on a platform as a service at a cloud provider, to thinking, you know what, maybe it's time to build something for self hosting, a PaaS more separate from the cloud?

Pasha:

Yeah, so probably that last job of building a PaaS on top of Kubernetes led me to rethink my life decisions. The real reason is probably that I'm intolerant to unnecessary complexity and over-engineering, and the Kubernetes ecosystem and cloud native landscape are a great example of this. I've built infra for many projects and products of different scale, and in most cases Kubernetes has more problems than it solves. Many people agree with this statement but then ask a fair question: so what instead?

Pasha:

What's the alternative? That is the actual problem. There are no decent alternatives, unfortunately. I'm not talking about cloud vendor solutions, there are plenty of those, right? And Kubernetes undoubtedly won this space, and everyone is just building on top of it, adding more and more abstractions and indirection layers and, again, increasing the complexity.

Pasha:

Perhaps I just wanted to scratch my own itch, as usually happens, and have a simpler tool that basically connects a handful of Docker hosts and deploys my containerized apps to them without going the full Kubernetes path. But regarding alternatives: Cloud Foundry is dead, Mesos is dead, HashiCorp has been sold to IBM and the future of Nomad is not bright. Docker Swarm was sold to Mirantis many years ago and it's in a weird limbo state at the moment. I haven't seen much development or invention on the Swarm side for years, even though it sort of still works. But there is a ton of open issues.

Pasha:

So, yeah, that brought me to, well, let's try to do something about it. I just started experimenting, and Uncloud is some stage of this experiment, which is going pretty well.

Dominic:

Very cool. Are you saying that IBM is not a good... I won't go there. I tried to make a joke, but I won't go there. It's refreshing to hear, I'll be frank.

Dominic:

Myself, I've been moving away from cloud infrastructure for a long time now. But at first, I was all in. Interestingly, I was a .NET developer in another life, and there was this PaaS at first which was called AppHarbor. Their H1 was "Azure done right".

Dominic:

It was when Azure was starting. So they were the Heroku for .NET developers, if you will. Mhmm. But that's interesting to hear. So would you say that, what?

Dominic:

The weight of configuring everything? Can you pinpoint a couple of problems, maybe ones you remember vividly, that you faced that just did not really make sense at some point, and that caused this way of thinking? Anything that you can pinpoint?

Pasha:

Well, you mean, like, pinpoint the problems with Kubernetes specifically? Yeah.

Dominic:

Yeah. Maybe. Or just in general, what did the solution look like? Because you were saying that you were developing an in-house, if you will, platform as a service, which from what I can understand is kind of a huge deal. So I can understand that there have to be some things that happened where you said, if I were to restart, I would not do it this way.

Pasha:

Yes. I guess we can maybe talk a bit more about the push and pull deployment models. That's probably where the distinction, or the pain, comes from. Maybe ten or fifteen years ago we were all deploying by SSHing into the server and uploading binaries. Many people still do that and it's fine.

Pasha:

Or maybe even earlier it was just putting PHP files on FTP, and that was the deployment process. Then at some point we invented Ansible, and this allowed us to automate or script these deployment processes to make them repeatable and reproducible. The push deployment model works best at not-so-big scale, and it has very good visibility: when something fails you can quickly see what went wrong, at what stage or step a command failed. And then with Kubernetes and other container orchestrators we invented, I'm not sure if it's correct to call it a pull model, or maybe a reconciliation-loop model, where you declare your desired state and then the system somehow makes it happen, reconciles to that state. And this model works very well for large scale.

Pasha:

Most of the time. But by large scale I mean thousands of nodes, for example, tens of thousands of containers or applications. It's probably impractical to orchestrate or manage applications in an imperative way at such scale, so an orchestration model like Kubernetes works pretty well there. But the downside is that the visibility, the ability to troubleshoot such a system, is really bad. We have to invent all these monitoring tools, events, logging, just to understand how the system works. You push your manifest with your application to the cluster, then something happened, something failed, and you have no idea where to even start digging into it.

Pasha:

Probably this is what I became sick of. When we built this platform on top of Kubernetes, we added even more abstraction layers on top. We used this reconciliation loop to provision not only Kubernetes applications but also to manage all sorts of infrastructure resources: provision a virtual machine, provision an S3 bucket, provision an IAM policy in AWS, and then work out the dependencies between these resources. And finally, the user interface was sort of: they put a short YAML file with the application and everything just happened. This worked brilliantly, when it worked.

Pasha:

Yeah. But when it doesn't work, it's just a nightmare to troubleshoot. There are so many indirection layers and so much GitOps stuff involved. It's just a nightmare for users, for operators, for the teams who built this thing. So that's where I'm coming from: okay, probably enough of it, I want something simpler.

Pasha:

I also had a Kubernetes cluster for my home network, my home infrastructure. And every time I had to do something with it, I usually started with a big sigh: how do I do this again? You open the manifest and it's like, oh, I already forgot what I put here, and I need to figure out where to start. Probably LLMs help now; I started it right at the beginning, when AI and LLMs were starting to become popular.

Pasha:

But still, the complexity doesn't go away. It just makes it easier to configure and understand these YAML files, but all the other stuff is still there.

Morten:

Totally. Press the magic button and hope it outputs something that works in a big system. It's not always fun. Exactly.

Dominic:

Well, okay. Let's start to dive a little bit into Uncloud. I really like the name, by the way. But why Go? Did you have to make a decision at some point?

Dominic:

Did you maybe consider Rust or anything like that? Why Go?

Pasha:

For me it was an obvious decision. It's just the de facto standard language for infrastructure tools. Most of the tools that I worked with were written in Go, and I used Go over the last couple of years at my previous job. So it was an obvious decision for me.

Pasha:

And I just like the static binary, the way you can just build one binary, drop it on a machine and run it. And then the concurrency part of the language also helps a lot with building what I'm building. We're running an agent, or daemon, on each machine, and it runs several controllers, or control loops. So yeah, this concurrency model with goroutines helps a lot.

Dominic:

Yeah, that sounds good. I was going to ask about the agent. So let's start at the beginning. How would you describe Uncloud for someone who has never heard of it?

Pasha:

So in a nutshell, Uncloud is Docker Compose for production. Let's put it that way; that's the shortest description I came up with recently. Not sure if it describes it well, but you may be familiar with Docker Compose, which runs on one machine; you can use it locally, for local development, for running a bunch of containers together. Uncloud is an open source tool that gives you multi-machine deployments, rolling updates, an overlay network, service discovery and load balancing without the complexity of traditional container orchestrators such as Kubernetes.

Pasha:

You put your service definition, your configuration, in the familiar Docker Compose format. Well, if you're familiar with it, but it's much, much simpler than Kubernetes manifests if you want to compare it to Kubernetes. So you write a compose.yml file, then run one deploy command locally, and it builds and rolls out your application to your servers. So your servers basically self-host your application.

Pasha:

You set up your own machines; they could be cloud instances, they could be on premise, and you can mix and match them.
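To make that concrete, a minimal compose.yml for such a deployment might look like the sketch below. It uses only standard Compose-spec attributes; Uncloud implements a subset of the spec, so treat the exact attribute support, and the `uc` CLI name in the note after it, as assumptions to verify against the project docs.

```yaml
# Hypothetical minimal compose.yml for a small web service.
services:
  web:
    image: ghcr.io/example/web:latest   # placeholder image name
    ports:
      - "8080:8080"                     # standard Compose port publishing
    deploy:
      replicas: 2                       # two containers, spread across machines
```

Deploying would then be a single command from your laptop, something like `uc deploy`.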

Dominic:

So what about the machine itself? Let's say I installed the tool on my development machine, and now I have three servers. You seem to be using the three-servers example a lot. Mhmm. What happens when I add a machine?

Dominic:

My main question is about load balancing. I'm very, very curious about load balancing. Who decides which of the machines is going to have the HTTPS termination, or the entry point to my service, for example?

Pasha:

Yeah. So let's try to go through the setup steps. You run the machine init command or the machine add command to initialize the first machine or add another machine to the cluster. This command installs a daemon on the machine, which establishes WireGuard tunnels and connections to the other machines.

Pasha:

So every new machine that is added to the cluster is added to the overlay WireGuard mesh between these machines. And we also install Caddy as a reverse proxy. For now this is just an opinionated decision: Caddy is the only option. Some other reverse proxy could probably be supported in the future, but for simplicity it's only Caddy.

Pasha:

And Caddy is installed on every machine, though you can configure which specific machines you want to run it on. By default it just runs on every machine, so every machine can be an entry point to the cluster and its services. You can create a DNS record for your public domain and point it to one of the machines with a public IP, and Caddy will issue a certificate automatically and reverse proxy the requests to the services.
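The setup steps described above might look roughly like this on the command line. The `uc` CLI name and the command shapes are taken loosely from the project's docs and should be treated as assumptions; the addresses are placeholders:

```shell
# Hypothetical cluster bring-up; verify command names against the docs.
uc machine init root@203.0.113.10    # first machine: installs the daemon,
                                     # starts a new WireGuard mesh, runs Caddy
uc machine add root@203.0.113.11     # add a second machine to the cluster
uc machine add root@192.168.1.20     # a home machine behind NAT also works
```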

Dominic:

Okay. So basically, it's the public DNS that kind of decides which is the elected machine that will receive the first request. And from there, the load balancing is done with Caddy.

Pasha:

So Caddy is the ingress, or reverse proxy, part. You can front it with a load balancer, a cloud load balancer for example, and that is configured independently of the Uncloud setup. So you can create a load balancer, create two or three targets if you have three machines, and then the traffic will be load balanced by this load balancer across the multiple Caddy instances. And then each Caddy instance will reverse proxy to a specific service container. So you can point DNS at it, or you can point a load balancer at it.

Pasha:

It's up to you. There is no requirement like that; it's not opinionated in this part.

Morten:

So you would have something like, I don't know, Cloudflare or a load balancer in front if you want to load balance between each target. Is that correctly understood?

Pasha:

Yeah. Yep.

Morten:

Yep. Cool. I really like your design, by the way, with the Uncloud daemon. I'm doing a little bit of a similar project, and I chose a similar path. It's really cool to see that you also went with the Uncloud daemon.

Morten:

I was very curious whether you were actually load balancing on a per-daemon basis, since you had the private network, but cool. That sounds really cool.

Pasha:

Yeah. The daemon is mainly needed for, well, we can probably dig deeper into this, but it's needed for coordinating this network and stuff, sharing information about running containers for service discovery and things like that. And also for connecting your client, like your CLI commands, to the cluster. This is the main purpose of the daemon. But for user traffic, it doesn't take part in serving any requests.

Pasha:

The requests are served by Caddy and the application containers.

Morten:

That's cool. And do you do any different kinds of deployment methods, or is it mainly rolling updates? And do you handle that through the Caddy API, the admin API? How do you tackle that?

Pasha:

So, yeah, at the moment we've implemented rolling updates. Mhmm. It's basically just rolling one container at a time: waiting for a new container to start, pass the health checks, and register in Caddy, and then when it's healthy we can remove the old container, and repeat this process with the next container until all containers are rolled out. This basic rolling deployment is similar to deployments in Kubernetes. I've thought about blue-green and immediate deployments, things like that.

Pasha:

But for now it's just one strategy for simplicity. We'll see what's needed in the future.
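The rolling strategy Pasha describes can be sketched in Go. All the names here are hypothetical, not Uncloud's actual code; the sketch just shows the replace-one-at-a-time loop: start a replacement, require it to be healthy, register it with the proxy, then remove the old container.

```go
package main

import (
	"errors"
	"fmt"
)

// Container is a minimal stand-in for a running service container.
type Container struct {
	ID      string
	Healthy bool
}

// rollingUpdate replaces old containers one at a time: start a new
// container, require it to pass health checks, register it with the
// reverse proxy, and only then remove the old one.
func rollingUpdate(
	old []Container,
	start func(i int) (Container, error), // start replacement #i
	register func(c Container) error,     // e.g. add upstream in Caddy
	remove func(c Container) error,       // stop and delete old container
) error {
	for i, oldC := range old {
		newC, err := start(i)
		if err != nil {
			return fmt.Errorf("start replacement %d: %w", i, err)
		}
		if !newC.Healthy {
			// Abort mid-rollout; containers replaced so far stay replaced.
			return errors.New("new container failed health checks")
		}
		if err := register(newC); err != nil {
			return err
		}
		if err := remove(oldC); err != nil {
			return err
		}
	}
	return nil
}

// demoRollout simulates a two-container service and records the order
// of operations the loop performs.
func demoRollout() ([]string, error) {
	old := []Container{{ID: "web-1", Healthy: true}, {ID: "web-2", Healthy: true}}
	var events []string
	err := rollingUpdate(old,
		func(i int) (Container, error) {
			c := Container{ID: fmt.Sprintf("web-new-%d", i+1), Healthy: true}
			events = append(events, "start "+c.ID)
			return c, nil
		},
		func(c Container) error { events = append(events, "register "+c.ID); return nil },
		func(c Container) error { events = append(events, "remove "+c.ID); return nil },
	)
	return events, err
}

func main() {
	events, err := demoRollout()
	fmt.Println(err, events)
}
```

Note that a new container is registered before the old one is removed, so capacity never drops below the replica count during the rollout.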

Morten:

That's nice. Yeah. I have immediate and blue-green in my product, and I don't have rolling, but I had planned it. So it's funny that you mention it. I basically did it with the admin API for Caddy, which is very cool, and I just used the load balancing feature.

Morten:

So it's actually fairly simple to implement. Yeah. Caddy is a really cool project.

Pasha:

Yeah. Which load balancer feature do you use? I'm not sure...

Morten:

Oh, sorry. Yeah. They expose an admin API, so you can actually load balance inside of Caddy to different targets. You can manage that through the admin API and change the weights on each target that you want Caddy to load balance to. Load balance is not really the right word, right? But it can distribute incoming traffic to different targets on the server, and you can manage those weights with the admin API, and then you don't need to do a Caddy reload or anything to have them applied.

Pasha:

Yeah. Right. So the load balancer you're talking about, is it part of the reverse proxy stanza, or is it not the reverse proxy? I remember we used load balancer passive health checks, if I recall correctly, in the reverse proxy. But you're probably talking about something different, or not?

Morten:

Maybe I'm explaining it badly. Are you familiar with the admin API that Caddy exposes? Yeah.

Pasha:

Yeah. Yes. Of course. Yeah.

Morten:

And they

Pasha:

have to rely on the context.

Morten:

Yeah. Yeah. Of course. Okay. So they just have a, what is it called, weighted lb policy underneath.

Morten:

So you can set weights on each target after the reverse proxy happens, like how the container... sorry, how Caddy then sends out traffic to each target on the server.
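For reference, the weighted policy being described looks roughly like this in a Caddyfile. Caddy ships a `weighted_round_robin` load-balancing policy (since roughly v2.6); the domain and addresses below are placeholders:

```caddyfile
example.com {
    reverse_proxy 10.0.0.1:8080 10.0.0.2:8080 {
        # roughly 3 of every 4 requests go to the first upstream
        lb_policy weighted_round_robin 3 1
    }
}
```

The equivalent JSON config can then be changed at runtime through the admin API (on localhost:2019 by default) without a reload, which is presumably what the admin-API approach above relies on.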

Dominic:

Yeah. It's similar to NGINX; NGINX also does that. It's like local load balancing. That's why I was asking that question at first.

Dominic:

I was curious to see whether one of the nodes was the primary node or something like that, using the reverse proxy to do the load balancing.

Pasha:

Yeah, I know what you're talking about. Just to answer the question of whether there is some primary node or not: it doesn't matter, every node with a running Caddy can work as an ingress.

Pasha:

It's up to you to decide which nodes you want to use for receiving traffic from the internet or from your private network. All Caddy instances are configured equally.

Dominic:

Yeah. But you still need an external load balancer to spread out your requests.

Pasha:

Yeah. Of course. You have to front this with, probably, DNS. We mentioned DNS. For example, Cloudflare DNS has a proxy feature.

Pasha:

You can enable just DNS, or the proxy. And the proxy feature is what I call a poor man's load balancer. If you create multiple A records for your domain and enable the proxy for each of them, it will... actually, it won't do round robin. It will connect to the closest one, depending on the user.

Pasha:

So, whichever edge node the user connects to, it will route to the closest one from that Cloudflare edge server. But if that server is down, it will automatically fall back to the other one. It will retry on the Cloudflare side, so the user won't even get a 500 or anything. It will automatically retry.

Pasha:

And if it's successful, then the user will receive a response from another server.

Dominic:

Are they marking the server as unhealthy for a couple of minutes, or will the next request still try it again, fail, and fall back?

Pasha:

To be honest, I'm not sure. This isn't clearly described anywhere, so I just tested it. It works like I described, and they do say in the docs that it will fall back on 5xx errors. So, yeah, this is pretty good.

Pasha:

And it's free. Well, I'm not sure up to what amount of traffic it's free. But yeah, for all my stuff, it just works well.

Dominic:

Yeah. Based on what I've heard of Cloudflare, it's free until it costs you $10. Mhmm. But still

Pasha:

That's fair. That's fair.

Dominic:

Still a very, very generous free tier. I'm just curious to return a little bit to the network part; I'm very interested in that. So let's say two of my three servers are not at all in the same region, or maybe I have one server at my home, for example, and one server at a cloud provider or whatever. Are they just communicating via SSH?

Dominic:

I'm not sure about the WireGuard mesh. Is it like a VPN, similar to Tailscale or something like that? Or is it just able to communicate with the other servers via SSH?

Pasha:

It's not SSH. WireGuard is UDP, so it's just basic UDP traffic. Tailscale uses WireGuard under the hood; in Uncloud we use vanilla WireGuard. How do the servers connect?

Pasha:

It's basically like a VPN. We create a virtual private network between these machines, which can be located in different networks, probably in different regions, behind NAT. The current requirement for all machines to communicate successfully is that, for each pair of machines, at least one of the machines in the pair has to be able to reach the other. Maybe it's a bit hard to comprehend, but basically you can have one public machine in the cloud and several machines in your private home network behind NAT, and this setup will work perfectly.

Pasha:

Each pair of machines within your private network can communicate directly with each other. And when we take the one cloud machine and any other machine in your private network, they can also communicate, because your home machine can reach the public machine by its IP. Basically, we just establish a full mesh: each machine establishes a WireGuard tunnel to every other machine. This is a full mesh topology, and it forms an overlay network so machines and containers can talk to each other over this network.
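On disk, one machine's half of that mesh might look like an ordinary WireGuard config. The sketch below is illustrative (placeholder keys and addresses), not what Uncloud actually generates; note that a full mesh of n machines means n(n-1)/2 tunnels overall, with one [Peer] entry per other machine on each host:

```ini
[Interface]
PrivateKey = <this-machine-private-key>   ; placeholder
Address    = 10.210.0.1/24                ; this machine's overlay address
ListenPort = 51820                        ; WireGuard's conventional UDP port

[Peer]                                    ; cloud machine with a public IP
PublicKey  = <peer-1-public-key>
Endpoint   = 203.0.113.10:51820
AllowedIPs = 10.210.0.2/32

[Peer]                                    ; home machine behind NAT: no Endpoint,
PublicKey  = <peer-2-public-key>          ; it dials out to reachable peers instead
AllowedIPs = 10.210.0.3/32
```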

Dominic:

Very nice. So what about UDP, is it always open? Or is that socket closed by firewalls and whatnot? Do I need to open it?

Pasha:

Yeah. Yeah. You need to open, like, port...

Dominic:

It needs to be open?

Pasha:

Port 51820, UDP. So, yeah, this is WireGuard. Yeah.

Dominic:

Oh, nice. Oh, very interesting. Okay. Okay. Okay.

Dominic:

So, okay, it's starting to make sense. So when I issue a... I think it's the run command, to deploy, right? What exactly does it do?

Dominic:

Is it creating a Docker container on my machine and sending that to the agent, or is it sending a command to... I don't know. How does it transfer my service to all my servers?

Pasha:

Okay. So the deploy command can build container images, per the Compose specification. Maybe I should add a bit more context about Compose and why Compose. Docker Compose is just an implementation, the reference implementation, of the Compose Spec. There is a Compose Spec website that basically defines all the parameters and attributes of the Compose file, and you can essentially write your own implementation of the specification.

Pasha:

There is Kompose, for example, Compose with a K, for Kubernetes: basically supporting the Compose format to deploy to Kubernetes. Uncloud doesn't use Docker Compose directly, it just uses the Compose spec. At the moment it implements only a subset of the features and attributes, not all of them. Over time it just adds what makes sense.

Pasha:

Compose supports a build stanza, a build block in the definition where you can say: use this Dockerfile, use this context. It basically configures the Docker build process, like what platforms to build for. So if you want to build your application code, you can specify this build block, and then specify some deployment configuration: what memory you want to use, what port to publish. Then, if the deploy command finds the build block, it builds a container image and uploads that image directly to your remote servers where you want to run the service. This part, I think, is quite different from most deployment tools and orchestrators.
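A build block like the one just described might look as follows. All values are placeholders, and these are standard Compose-spec attributes that Uncloud may support only in part:

```yaml
services:
  api:
    build:
      context: .               # where the sources and Dockerfile live
      dockerfile: Dockerfile
      platforms:
        - linux/amd64          # platform(s) to build for
    ports:
      - "8080:8080"            # port to publish
    deploy:
      resources:
        limits:
          memory: 512M         # deployment configuration, e.g. memory
```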

Pasha:

Usually the next step is to upload to a container registry, but we intentionally eliminated that step to reduce the friction. In my experience, and I'm not saying the registry is not needed, it has its place in many setups, but for my personal use it often just gets in the way. We built another project called Unregistry, I'm not sure if you've heard of it, so we can talk about this. Basically, what it allows you to do is expose your Docker or containerd daemon as a registry API. So each Uncloud machine is essentially a container registry.

Pasha:

You can think of it like this: every Docker daemon has its own image storage, since it needs to pull images in order to run them, so it has an internal cache, internal storage, for these images. So why can't we just use this storage when we want to deploy? Just directly push to this storage, instead of pushing to a third-party registry and then pulling from the registry back to the server. We built the Unregistry component to be able to push directly to the servers. We push the image to the machines, and then the deploy command starts the rolling deployment process for each service, according to the dependencies and the order specified in the Compose file.
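As a sketch of that flow: the standalone Unregistry tooling advertises a `docker pussh` command (push over SSH), and Uncloud's deploy does the equivalent internally. Treat the command names and the address here as assumptions to check against the Unregistry README:

```shell
docker build -t myapp:latest .
docker pussh myapp:latest user@203.0.113.10   # pushes straight into the remote
                                              # Docker daemon's image storage;
                                              # only missing layers are sent
```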

Pasha:

You get your service deployed. I should probably make a note about what it prints. I took some inspiration from Terraform tooling: when you want to make a change to infrastructure, Terraform shows you a plan so you can see what will actually happen, and you can review and approve it. Uncloud's deploy also prints a deployment plan, in the exact order: what containers will be replaced, what new containers will be started, what will be removed, what volumes will be created, and so on.

Pasha:

You confirm it, and then the plan is executed. And you can of course auto-approve it, to run it on CI, for example.

Dominic:

Sounds very, very cool. I was not aware of the Unregistry part. Does that mean that the artifact we are sending is kind of big? I guess if it's all built locally and sent, it's like 500 megabytes that we are sending each time we deploy?

Pasha:

Yeah, that's correct, but that's actually why we built the registry part: to not send the full artifact every time, the same as with images. When you're deploying for the first time, yes, you're transferring the whole image, or maybe everything except the base image. But then, if you're just rebuilding the last couple of layers with just your application, it's usually not big, and this allows you to upload only the changed layers.

Dominic:

Wow, that is so interesting. Okay. Without entering into too much detail, because I'm kind of very intrigued now: so there are some parts that will already exist there, I guess. And that might show how little I know about this, because I'm not sure what exactly it means to build a Docker image in terms of the binary that you're receiving. But let's say you have the base system.

Dominic:

So it means that on the Uncloud machines, I would somehow have the base system already built there, and now I'm only sending what changed in my service, and somehow it's able to merge that together. I'm not super clear about it, because from my point of view, I thought that when we do a Docker build, it just builds everything from scratch. I know there is some caching and whatnot; I'm just unclear about what actually gets cached.

Dominic:

Do you know that?

Pasha:

Yes, there's probably a lot to unpack, but to be concise: when you build images locally with just docker build, it has a built-in cache system. It checks: if you didn't change your Dockerfile and the files on the file system haven't changed, then some of the commands in the Dockerfile just come from the cache. Basically, each command in the Dockerfile that does some build step, copies files, or things like that, creates a separate layer in the resulting image, the resulting artifact. So the image consists of multiple layers, and each layer is usually created by one command in the Dockerfile.

Pasha:

When you build locally and you didn't change much, maybe half of your Dockerfile is actually taken from your local build cache on your local machine. If you're building on CI, then again you can use some cache on the CI agents, or you can use the GitHub Actions cache or other tools, even cloud services for building images, to utilize the cache heavily. When you upload this image to the machines, if you're just starting with empty machines, there is no cache on them, so the first upload probably takes the longest. But then, if you're just changing the few last layers in your images, you will upload only those when deploying with Uncloud. It's very similar to how you would push them to a container registry.

Pasha:

When you push to a registry, you're also uploading only the missing layers, only the changed layers.
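A concrete illustration of that layering, using a hypothetical Go service: each instruction below produces a layer, so ordering the rarely-changing steps first means only the last layers get rebuilt and re-uploaded when just the application code changes.

```dockerfile
FROM golang:1.24 AS build
WORKDIR /src
COPY go.mod go.sum ./            # changes rarely, so this layer stays cached
RUN go mod download              # cached unless go.mod/go.sum change
COPY . .                         # changes on every commit
RUN CGO_ENABLED=0 go build -o /app .

FROM gcr.io/distroless/static
COPY --from=build /app /app      # final image: small app layer on a stable base
ENTRYPOINT ["/app"]
```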

Dominic:

Yeah. I'm just unclear about what kind of binary data we are talking about. But, yeah, maybe that's way too much detail at this moment. That sounds great.

Dominic:

So, any challenges that you've faced so far with the development? Anything that you thought might be easier but got complicated, things like that? Where are you at the moment with all this?

Pasha:

Probably the main challenge is not technical: it's keeping things rolling. Working out a sustainable business model is the primary concern and the primary challenge. I haven't worked that part out yet, but there are already some users; there are people who have adopted Uncloud for their side projects or home labs. The goal is to make it more usable for serious businesses, for solopreneurs, for small teams. I'm saying small teams first because not many capabilities are available yet for multi-user support and things like that.

Pasha:

I don't want to overpromise or oversell here. But making it possible for other businesses to depend on Uncloud and be confident that it will thrive and that they'll get the support they need is probably the biggest challenge now.

Dominic:

Yeah. Yeah. Totally. It's very hard to do. Yeah.

Dominic:

It's akin to marketing in software as a service. It might actually be worse for open source projects sometimes.

Pasha:

Yeah. And especially, you know, we're in the self-hosted space, so the whole purpose is to run it on your own servers, on your own infra, right? And it's called Uncloud, so you're like, what would I propose? An Uncloud cloud? It's silly.

Dominic:

What do you mean? So GCP is not going to fund you, is that what you're saying?

Morten:

Oh, yeah. I mean, I can see in your docs that you also mention Coolify and Dokploy. Have you played around with their pricing models, or at least their approach to funding, to the business side of things?

Pasha:

Yeah, of course. I keep an eye on them, but I try not to pay too much attention, to not start copying their approaches but stay distant from their decisions. My current plan, or idea, is to build a control center, a web UI, both self-hosted and managed, for managing clusters and applications, probably with some PaaS features on top of the current Uncloud clusters. I guess this is similar to what they do, but their whole platform, their application, is this thing: it's what drives the deployments, what manages everything.

Pasha:

I wanted to make it a little more composable, so this control center I'm considering would be off the critical path: the clusters can still work, machines in the clusters can talk to each other, run applications and so on, and you can manage them with the CLI. Basically, the CLI is one kind of client to your cluster, and this web component is just another kind of client. But it can add features that are impractical to implement in this decentralized cluster model, things like watching a repo and automatically building code on pushes. It's not very clear how to implement that in the current Uncloud model. But if we have a separate, standalone component with a web interface, that's the place where this could be implemented.

Pasha:

Still, my main focus will likely be on day-two operations, helping with day-two operations: providing observability, monitoring, alerts, backups, recovery. This critical and time-consuming stuff that everyone wants to just work but is so tedious to set up. So, yeah, the current hypothesis is that this is what could be provided out of the box as a managed service.

Morten:

It makes a lot of sense. Yeah. I mean, a lot of developers are waking up to the necessity of telemetry data, but then you start to play with it, and you either unload your bank account into Datadog's bank account or you fiddle with Grafana for a long time. So that sounds like a very feasible path forward for you guys.

Morten:

Like, a complete setup like that on your own infrastructure would be nice.

Pasha:

Yeah. We'll see; that's something we need to test and validate. This is probably the biggest part of the work that needs to be done, and not the most pleasant or exciting work to do, I would say. We're all developers; we want to develop, not do marketing or user research. But to build a business, you need to do this.

Morten:

Yeah.

Dominic:

Yeah. Totally. Returning to having some machines on the cloud and some in a home lab, for example.

Dominic:

So is that something you are doing? I'm trying to figure out why exactly I would do that. Let me give you a concrete example. I'm building this app at the moment, and I know I will need a self-hosted Plausible service. Instead of buying a machine from some cloud provider, I could host that locally and use Uncloud, because you were saying there's some kind of DNS in there.

Dominic:

So I could have some kind of DNS entry on my cloud server that calls my own machine, which would host the self-hosted Plausible, since it's very low impact if it's down. So I'm curious to hear about some use cases you've seen so far, or how you are using Uncloud yourself to bridge a cloud machine with maybe your own network. What are you doing with that?

Pasha:

Yeah. I personally combine some machines in my home network with cloud machines, to run the monitoring stack on the machines in my home network, because Prometheus and Grafana use a lot of memory. Recently I tested VictoriaMetrics, which is a very cool project that provides a drop-in replacement for Prometheus and consumes like ten times less system resources, which is amazing. Probably a monitoring stack using VictoriaMetrics and VictoriaLogs won't consume that much, but it's still usually several gigabytes of RAM and some CPU cores.

Pasha:

And just for side projects, you can run it on your own machines at home, if you have some home lab setup, and just not spend money on cloud instances. More critical workloads that need to be available you run on small VMs from cloud providers: Hetzner, DigitalOcean, or whatever you prefer. This is called a hybrid setup, where you mix cloud with on-premise. Typically this is just for cost saving. But in some cases, and I hear this is becoming more popular in Europe, data sovereignty is a subject many people are starting to talk about, along with exiting the AWS cloud and running on-premise or on some European cloud provider.

Pasha:

So, if data sovereignty requirements are something you have, then again you can combine your on-premise infrastructure with your cloud or hosting provider infrastructure and run your apps in the hybrid model.

Dominic:

And as soon as I'm initializing the machine, you were saying it installs an agent on the server itself. So if I were to SSH onto this server and ping another machine, it would work? Is the DNS also usable outside of Uncloud's systems, if I can say that? Does the OS know about the internal DNS of the network?

Pasha:

So you will be able to ping the machine if the WireGuard VPN is established successfully. But the DNS part, I'm not sure. Can you clarify which DNS specifically? Because the Uncloud cluster also has its own internal DNS for the internal network, which resolves service names to container IP addresses so services can talk to each other. This is similar to Kubernetes.

Pasha:

If you're familiar with it, Kubernetes also has an internal CoreDNS component. But I guess—

Dominic:

So it's really inside the network. If I'm SSHing in, I cannot use that DNS. Let's say I have, for example, server A and server B as my DNS names. Server A could be my cloud machine, server B my own machine. Can I SSH into server A and do ping server B?

Pasha:

So is server B a public DNS record, or what is server B?

Dominic:

It would be using the internal Uncloud DNS. I'm trying to determine if the DNS that is used internally is also available outside. I don't know how to explain it. If I SSH into the machine with the Uncloud agent, do I automatically get all the DNS names of my other machines resolved for me? Okay, let me ask that another way.

Dominic:

So I have a service. Let's say I have a web service and, just for the sake of discussion, it's hosted on the cloud, and I want the database to be hosted on my own machine, for example.

Pasha:

Cool.

Dominic:

So I could have my web instance's DNS name be web and my database just db. If I SSH into the web server, can I do anything with the db Uncloud DNS name at all?

Pasha:

Yeah. Yeah.

Dominic:

Got it. Will it resolve to my home IP? Mhmm.

Pasha:

Yeah, got it. So on the machine, when you just SSH in, it won't automatically resolve, because we run the internal DNS on basically every agent. Each agent runs a very tiny embedded DNS server that listens on the localhost port 53. We don't automatically update your /etc/resolv.conf to point to this DNS server.

Pasha:

But you can manually update it, and then from the machine you will be able to just ping db and it will ping the container with your DB on another machine. But when you're running another service, an Uncloud service, within that service you don't need to configure DNS; it automatically points to this embedded DNS, which resolves all your services running in the cluster by just their name. So in your application, you can directly specify the hostname for your database as just db. It uses internal names, like db.internal, or you can omit the suffix and use just db, and it will resolve to your DB container.
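As a rough sketch in Go, an app deployed as an Uncloud service could address its database by service name alone. This is illustrative, not Uncloud's API: the service name db, the database, and the credentials are all placeholders; the only assumption from the conversation is that the cluster's embedded DNS resolves the name to the container's IP.

```go
package main

import "fmt"

// buildDSN builds a Postgres connection string addressed by a
// cluster-internal service name rather than an IP. Inside an
// Uncloud-managed container, the embedded DNS resolves "db" (or
// "db.internal") to the database container's address, so the app
// needs no hardcoded host. Names and credentials are placeholders.
func buildDSN(service string) string {
	return fmt.Sprintf("postgres://app:secret@%s:5432/appdb", service)
}

func main() {
	fmt.Println(buildDSN("db"))          // short service name
	fmt.Println(buildDSN("db.internal")) // fully qualified internal name
}
```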

Dominic:

Okay. Nice. Is there anything with that agent that would allow me to open a shell or something that would automatically use this DNS resolver?

Pasha:

You can open a shell in any of your running containers, and in that shell you will be able to use the DNS. So, for example, you can deploy your web application, and if it's built from an Alpine or Ubuntu image that contains a shell, then you can just exec into your application by its service name and it will open a shell inside the application container. And there you'll be able to use any of the DNS records.

Dominic:

Okay, that's interesting. And to be clear, there's no need for me to have Docker on the server first?

Pasha:

No. Actually, you can install it yourself, but when you run this machine init or machine add command, part of the install script installs Docker on your machine. So it will be installed for you if it's not installed yet, but you can preinstall it yourself, configured the way you prefer.

Dominic:

Interesting. Morten, do you have any closing thoughts? Something to ask?

Morten:

Yeah. I mean, I was kinda wondering if you had any deployment mechanisms for updates to the agents, but I don't know if you already answered that, to be honest. So is there any way of updating with the daemon running, or is that feature complete, I guess, is my question? Are you happy and you don't need any updates?

Pasha:

So you mean when I release a new version of the agents, the daemons, and the CLI, how do they roll out? There's actually an open pull request that contributes part of it: a new command to do a rolling update of the agents on the machines. It's not implemented at the moment; in every release notes, I just include a snippet you copy-paste and run on every machine to download a pre-built binary of the new version from the latest release on GitHub, replace the currently running one, and then restart the systemd service. That's basically it. Nice.
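The manual flow Pasha describes (download the new binary, swap it in, restart the systemd unit) could be sketched in shell roughly as follows. Everything here is an assumption for illustration: the binary name uncloudd, the paths, and the unit name are made up, and the download and restart are simulated with local files so the sketch is safe to run anywhere.

```shell
set -eu
tmp=$(mktemp -d)

# Stand-ins for the currently installed agent binary and the freshly
# downloaded release (a real snippet would fetch the GitHub release asset).
printf 'old-agent\n' > "$tmp/uncloudd"
printf 'new-agent\n' > "$tmp/uncloudd.new"

# Replace the binary atomically so a crash can't leave a half-written file.
# On a real machine, a restart of the agent's systemd unit would follow.
install -m 0755 "$tmp/uncloudd.new" "$tmp/uncloudd"

cat "$tmp/uncloudd"
```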

Pasha:

Cool. But it would definitely be great to automate it, so that when a new version is available, you just run one CLI command and roll out the version. But this is just a nice-to-have; it's probably not so critical.

Dominic:

Very cool. Very cool project, Pasha. Anything you want to say to listeners? Are you looking for any contributions? Any feedback?

Dominic:

I would guess that's what you're looking for at the moment: people using the tool and whatnot. So how can people help?

Pasha:

I would love feedback from users, so check it out, please. You can find my GitHub profile, psviderski, basically P and my surname. There are a couple of projects pinned there: one is Uncloud, another Unregistry. Check it out. Unregistry is the one you can also use without Uncloud.

Pasha:

It comes with a standalone Docker plugin, a separate command called pussh, push with a double s, for SSH. Basically, you can run docker pussh and then your image name and your SSH target, and it will upload an image to your remote server via SSH. It just does a little bit of manipulation, running an unregistry container just in time and then cleaning it up. Basically, it will upload only the missing layers to your remote machine, which will then be available for running containers on that machine. Check it out, and please join our Discord.
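As a usage sketch, that flow looks like the following. The image name and SSH target are placeholders, and the command is only echoed so the sketch runs anywhere; on a machine with the docker-pussh plugin installed, you would run the printed command directly.

```shell
# Placeholders: substitute your own image tag and SSH destination.
IMAGE="myapp:latest"
TARGET="deploy@203.0.113.10"

# pussh ("push" with a double s, for SSH) uploads only the layers the
# remote machine is missing, via a temporary unregistry container on
# the other end. Echoed rather than executed in this sketch.
echo docker pussh "$IMAGE" "$TARGET"
```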

Pasha:

It's around 350 people at the moment. Not a lot, but it's still something. Feedback is probably what I most want to hear from people: how they use it. Because when it's not SaaS, just an open source project, you have no idea how people use it; you only know when they come to you with their issues. And then you realize, oh, this person does that. But probably many others just faced some problems and gave up, and you don't know where people dropped off.

Pasha:

So please let me know if you found something that doesn't work, or you have ideas on how to do things differently, or you faced some bug.

Dominic:

Very cool project. Thank you for joining. If you want to return in a couple of months, three, four months, whatever, when you reach some other milestone, it would be nice to get some news from the project. It seems very interesting.

Pasha:

Yeah. That would be great. Yeah. Thanks for having me.

Morten:

Yeah. Thanks for coming.

Pasha:

Thank you, guys.

Creators and Guests

Dominic St-Pierre (Host)
Go system builder and entrepreneur. I've been writing software systems since 2001. I love SaaS, building since 2008.

Morten Vistisen (Host)
Contract Full-Stack Developer at mbv labs.

Pasha Sviderski (Guest)
Senior Tinkerer, building Uncloud. Ex: @Canva, @JetBrains, Wargaming