What is it?

Fly.io offers a partially managed Postgres cluster on their infrastructure (now called Legacy Postgres because they also provide a fully managed service). With their CLI and web dashboard you can provision a cluster quickly, with automatic snapshots, metrics, and replication set up for you.

We had trouble finding a good database provider. AWS RDS was too expensive for us: if you want any replication, the EC2 costs add up fast. Neon is fun to use and cheaper, but we ran into scalability issues (we don’t store that much data; it was probably a mix of their S3-backed architecture and our unoptimized queries). Fly.io feels like a solid middle ground: a battle-tested template, reasonably priced VMs, and really nice infrastructure (e.g., a WireGuard tunnel that connects the new Postgres cluster to our AWS cloud without exposing anything to the public internet).

How do I connect to the cluster though?

Fly’s own docs say:

  • 5432 → always connects to the primary instance
  • 5433 → connects directly to the Postgres server

So I set up my application like this:

  • The app accepts two DB URLs, DB_RW_URL and DB_RO_URL. The RW URL uses 5432; the RO URL uses 5433.
  • Both URLs point to a pod running in the AWS-side Kubernetes cluster.
  • The pod uses socat over the WireGuard tunnel to reach the database via Fly’s internal domain name, e.g. socat TCP-LISTEN:5433,fork TCP:db.internal:5433
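
On the application side, the split between the two URLs could look like the minimal Python sketch below. The helper name pick_db_url, the host name, and the credentials are made up for illustration; only the two variable names and the 5432/5433 port split come from the setup above.

```python
import os
from urllib.parse import urlsplit


def pick_db_url(read_only, env=None):
    """Return DB_RO_URL for read-only work and DB_RW_URL for everything else."""
    env = os.environ if env is None else env
    return env["DB_RO_URL" if read_only else "DB_RW_URL"]


# Hypothetical values: both URLs point at the same forwarding pod and differ
# only in the port.
EXAMPLE_ENV = {
    "DB_RW_URL": "postgres://app:secret@pg-forwarder.example:5432/app",
    "DB_RO_URL": "postgres://app:secret@pg-forwarder.example:5433/app",
}

if __name__ == "__main__":
    for read_only in (False, True):
        url = pick_db_url(read_only, EXAMPLE_ENV)
        label = "read-only" if read_only else "read-write"
        print(label, "-> port", urlsplit(url).port)
```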

Looks fine, what’s the problem?

If it worked fine, we obviously would not be writing a blog post today. I only recently noticed that I had NOT been using the read replicas at all: every connection had been going to the primary instance. I spent some time digging into this, and this community post finally clued me in on what might be happening. The domain name does resolve to the IPs of all instances, but they are always sorted the same way, with the primary instance first. As a result, we always connected to the primary instance.
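
The effect is easy to reproduce in isolation. The sketch below is a simulation, not Fly's resolver: the addresses are made up, and it assumes the client simply dials the first address it gets back, which matches the behavior described above.

```python
import socket
from collections import Counter


def resolve_all(host, port):
    """Return the addresses for host in the order getaddrinfo reports them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def first_address_counts(resolve, attempts):
    """Count which address a 'use the first result' client would dial."""
    return Counter(resolve()[0] for _ in range(attempts))


def stable_resolver():
    # Hypothetical answer: primary first, replicas after, same order each time.
    return ["fdaa::a", "fdaa::b", "fdaa::c"]


if __name__ == "__main__":
    # Every attempt lands on the first address; the replicas are never chosen.
    print(first_address_counts(stable_resolver, 100))
```

Run from inside the private network, resolve_all("db.internal", 5433) should show the order a real client sees, with db.internal standing in for your own cluster's internal name.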

What’s the solution?

The solution I settled on takes inspiration from how port 5432 works. Fly.io ships an HAProxy config in the postgres-flex repo that resolves the same domain name, finds the primary instance, and proxies the connection to it. We can do exactly the same thing, except that we look for instances with the replica role instead.
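
To make the idea concrete, here is a rough Python approximation of the role check, not HAProxy's actual implementation. The endpoint path /flycheck/role and check port 5500 are taken from the HAProxy config in this post; the function names and the fake addresses are assumptions for illustration.

```python
from urllib.request import urlopen


def fetch_role(address, port=5500, timeout=5.0):
    """Ask one instance for its role via the /flycheck/role health endpoint."""
    host = f"[{address}]" if ":" in address else address  # bracket IPv6 literals
    with urlopen(f"http://{host}:{port}/flycheck/role", timeout=timeout) as resp:
        return resp.read().decode().strip()


def select_replicas(addresses, fetch=fetch_role):
    """Keep the addresses whose role response contains the string 'replica'."""
    replicas = []
    for address in addresses:
        try:
            role = fetch(address)
        except OSError:
            continue  # unreachable or failing instance: treat it as down
        if "replica" in role:
            replicas.append(address)
    return replicas


if __name__ == "__main__":
    # Hypothetical roles keyed by made-up addresses, so no network is needed.
    fake_roles = {"fdaa::a": "primary", "fdaa::b": "replica", "fdaa::c": "replica"}
    print(select_replicas(fake_roles, fetch=fake_roles.__getitem__))
```

The fetch parameter is injectable so the selection logic can be tried without access to the private network.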

I updated haproxy.cfg to the following:

global
  maxconn 1000
  stats socket /run/haproxy/haproxy.sock mode 660 level admin
  stats timeout 2m # Wait up to 2 minutes for input
 
defaults
  log     global
  mode    tcp
  retries 2
  timeout client 30m
  timeout connect 4s
  timeout server 30m
  timeout check 5s
 
resolvers flydns
  nameserver dns1 [fdaa::3]:53
  accepted_payload_size 8192 # allow larger DNS payloads
 
frontend ft_postgresql
  mode tcp
  bind *:5432
  bind :::5432
  default_backend bk_db
 
frontend stats
  mode http
  bind :::8404
  stats enable
  stats uri /stats
  stats refresh 10s
 
backend bk_db
  balance roundrobin
  option httpchk GET /flycheck/role
  http-check expect string replica # can remove this line if it is acceptable to hit primary for read-only requests
  http-check disable-on-404
  server-template pg 10 $FLY_REGION.$DB_APP_NAME.internal:5433 check port 5500 resolvers flydns resolve-prefer ipv6 init-addr none on-marked-down shutdown-sessions

and cut the Dockerfile down to run just HAProxy:

FROM ubuntu:24.04
 
ARG HAPROXY_VERSION=2.8
 
RUN set -eux; \
	if [ -f /etc/dpkg/dpkg.cfg.d/docker ]; then \
# if this file exists, we're likely in "debian:xxx-slim", and locales are thus being excluded so we need to remove that exclusion (since we need locales)
		grep -q '/usr/share/locale' /etc/dpkg/dpkg.cfg.d/docker; \
		sed -ri '/\/usr\/share\/locale/d' /etc/dpkg/dpkg.cfg.d/docker; \
		! grep -q '/usr/share/locale' /etc/dpkg/dpkg.cfg.d/docker; \
	fi; \
	apt-get update; apt-get install -y --no-install-recommends locales; rm -rf /var/lib/apt/lists/*; \
	echo 'en_US.UTF-8 UTF-8' >> /etc/locale.gen; \
	locale-gen; \
	locale -a | grep 'en_US.utf8'
ENV LANG=en_US.utf8
 
RUN apt-get update && apt-get install --no-install-recommends -y \
	ca-certificates iproute2 curl bash dnsutils vim socat procps ssh gnupg \
	&& apt autoremove -y && apt clean && \
	rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 
# Haproxy
RUN apt-get update && apt-get install --no-install-recommends -y \
	haproxy=$HAPROXY_VERSION.\* \
	&& apt autoremove -y && apt clean
 
COPY haproxy.cfg /fly/haproxy.cfg
RUN mkdir -p /run/haproxy/
 
EXPOSE 5432
 
CMD ["/usr/sbin/haproxy", "-W", "-db", "-f", "/fly/haproxy.cfg"]

I run the proxy as a new app on a shared-cpu-2x machine with 2 GB of memory (it would probably fit on a smaller one). And voilà, we can now point the RO URL at this HAProxy, and the read replicas are getting some traffic!
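
One way to confirm that traffic is spread across replicas is to read the CSV export of the HAProxy stats page and look at the session counters per server. The sketch below parses such an export; the sample data is made up and trimmed to four of the many columns HAProxy emits, and the function name is an assumption.

```python
import csv
import io


def replica_sessions(stats_csv, backend="bk_db"):
    """Map each server in the backend to its (status, total sessions)."""
    text = stats_csv.lstrip("# ")  # the header line starts with "# pxname,..."
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["pxname"] != backend or row["svname"] in ("FRONTEND", "BACKEND"):
            continue  # skip other proxies and the aggregate rows
        result[row["svname"]] = (row["status"], int(row["stot"] or 0))
    return result


# Made-up sample: pg1 stands for the primary, which fails the replica check.
SAMPLE = """# pxname,svname,stot,status
ft_postgresql,FRONTEND,42,OPEN
bk_db,pg1,0,DOWN
bk_db,pg2,20,UP
bk_db,pg3,22,UP
bk_db,BACKEND,42,UP
"""

if __name__ == "__main__":
    for name, (status, sessions) in replica_sessions(SAMPLE).items():
        print(name, status, sessions)
```

With the stats frontend from the config above, the export should be reachable by appending ;csv to the stats URI on port 8404.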