What is it?
Fly.io offers a partially managed Postgres cluster on their infrastructure (now called Legacy Postgres because they also provide a fully managed service). With their CLI and web dashboard you can provision a cluster quickly, with automatic snapshots, metrics, and replication set up for you.
We had trouble finding a good database provider. AWS RDS was too expensive for us: if you want any replication, the EC2 costs add up fast. Neon is fun to use and cheaper, but we ran into scalability issues (we don’t store that much data; it was probably a mix of their S3-backed architecture and our unoptimized queries). Fly.io feels like a solid middle ground: a battle-tested template, reasonably priced VMs, and really nice infrastructure (e.g., a WireGuard tunnel that connects the new Postgres cluster to our AWS cloud without exposing anything to the public internet).
How do I connect to the cluster though?
Fly’s own docs say:
- 5432 → always connects to the primary instance
- 5433 → connects directly to the Postgres server
So I set up my application like this:
- The app accepts two DB URLs, DB_RW_URL and DB_RO_URL. The RW URL uses 5432; the RO URL uses 5433.
- Both URLs point to a pod running in the AWS-side Kubernetes cluster.
- The pod uses socat over the WireGuard tunnel to reach the database via Fly’s internal domain name, e.g. socat TCP-LISTEN:5433,fork TCP:db.internal:5433
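For illustration, here is a minimal Python sketch of the app side of this two-URL setup. The variable names DB_RW_URL and DB_RO_URL come from the list above; the helper names, the example host pg-proxy.example, and the fallback to the RW URL when no RO URL is set are assumptions made for this sketch.

```python
import os
from urllib.parse import urlsplit


def load_db_urls(env=os.environ):
    """Return (rw_url, ro_url) read from the environment.

    Assumption for this sketch: if DB_RO_URL is missing, reads simply
    fall back to the read-write URL.
    """
    rw_url = env["DB_RW_URL"]
    ro_url = env.get("DB_RO_URL", rw_url)
    return rw_url, ro_url


def port_of(url):
    """Extract the TCP port from a postgres:// style URL."""
    return urlsplit(url).port
```

With DB_RW_URL set to something like postgres://app:secret@pg-proxy.example:5432/app and DB_RO_URL set to the same host on 5433, port_of returns 5432 and 5433 respectively, which is an easy sanity check to run at startup.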
Looks fine, what’s the problem?
If it worked fine, we obviously would not be writing a blog post today. I only recently noticed that I had NOT been using the read replicas at all: every connection was going to the primary instance. I spent some time digging into this, and this community post finally clued me in on what might be happening. The domain name does resolve to the IPs of all instances, but they are always sorted the same way, with the primary instance first, so we always end up connecting to the primary.
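To make the failure mode concrete, here is a small Python sketch. resolve_all lists every address the system resolver returns for a name, and first_address mimics a client that just connects to the first entry. Both function names and the fdaa addresses in the note below are made up for illustration, and the sketch assumes the client really does take the first address, which would explain why everything landed on the primary.

```python
import socket


def resolve_all(host, port):
    """Return every address the resolver hands back for host, in the order received."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return [info[4][0] for info in infos]


def first_address(addresses):
    """Mimic a client that always connects to the first address in the list."""
    return addresses[0]
```

If the list always comes back as, say, ["fdaa:0:1::2", "fdaa:0:1::3", "fdaa:0:1::4"] with the primary first, first_address picks "fdaa:0:1::2" every single time, no matter how many replicas follow it.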
What’s the solution?
The solution I settled on takes inspiration from how port 5432 works. Fly.io ships an HAProxy config in the postgres-flex repo that resolves the same domain name, finds the primary instance, and proxies the connection to it. We can do exactly the same thing, except that we look for instances with the replica role.
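The idea can be sketched in Python before turning it into HAProxy config: ask each instance for its role and keep only the replicas. The /flycheck/role endpoint on port 5500 is the same health check the HAProxy config in this post relies on; the function names, and the assumption that the response body is plain text such as "primary" or "replica", are mine and only for illustration.

```python
from urllib.request import urlopen


def fetch_role(address, timeout=5.0):
    """Ask one Postgres instance for its role via the health-check endpoint."""
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{address}]" if ":" in address else address
    with urlopen(f"http://{host}:5500/flycheck/role", timeout=timeout) as resp:
        return resp.read().decode().strip()


def replicas_only(roles):
    """Given {address: role}, keep the addresses whose role mentions 'replica'."""
    return [addr for addr, role in roles.items() if "replica" in role]
```

HAProxy does the equivalent of this continuously through its health checks, so instances that report another role are marked down and never receive read traffic.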
I updated haproxy.cfg to be the following:
global
    maxconn 1000
    stats socket /run/haproxy/haproxy.sock mode 660 level admin
    stats timeout 2m # Wait up to 2 minutes for input

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

resolvers flydns
    nameserver dns1 [fdaa::3]:53
    accepted_payload_size 8192 # allow larger DNS payloads

frontend ft_postgresql
    mode tcp
    bind *:5432
    bind :::5432
    default_backend bk_db

frontend stats
    mode http
    bind :::8404
    stats enable
    stats uri /stats
    stats refresh 10s

backend bk_db
    balance roundrobin
    option httpchk GET /flycheck/role
    http-check expect string replica # can remove this line if it is acceptable to hit primary for read-only requests
    http-check disable-on-404
    server-template pg 10 $FLY_REGION.$DB_APP_NAME.internal:5433 check port 5500 resolvers flydns resolve-prefer ipv6 init-addr none on-marked-down shutdown-sessions
I also cut down the Dockerfile to run just HAProxy:
FROM ubuntu:24.04

ARG HAPROXY_VERSION=2.8

RUN set -eux; \
    if [ -f /etc/dpkg/dpkg.cfg.d/docker ]; then \
# if this file exists, we're likely in "debian:xxx-slim", and locales are thus being excluded so we need to remove that exclusion (since we need locales)
        grep -q '/usr/share/locale' /etc/dpkg/dpkg.cfg.d/docker; \
        sed -ri '/\/usr\/share\/locale/d' /etc/dpkg/dpkg.cfg.d/docker; \
        ! grep -q '/usr/share/locale' /etc/dpkg/dpkg.cfg.d/docker; \
    fi; \
    apt-get update; apt-get install -y --no-install-recommends locales; rm -rf /var/lib/apt/lists/*; \
    echo 'en_US.UTF-8 UTF-8' >> /etc/locale.gen; \
    locale-gen; \
    locale -a | grep 'en_US.utf8'

ENV LANG=en_US.utf8

RUN apt-get update && apt-get install --no-install-recommends -y \
    ca-certificates iproute2 curl bash dnsutils vim socat procps ssh gnupg \
    && apt autoremove -y && apt clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Haproxy
RUN apt-get update && apt-get install --no-install-recommends -y \
    haproxy=$HAPROXY_VERSION.\* \
    && apt autoremove -y && apt clean

ADD haproxy.cfg /fly/haproxy.cfg
RUN mkdir -p /run/haproxy/

EXPOSE 5432

CMD ["/usr/sbin/haproxy", "-W", "-db", "-f", "/fly/haproxy.cfg"]
I run the proxy as a new app on a shared-cpu-2x 2GB machine (it would probably fit on a smaller one), and voilà, we can now point the RO URLs at this HAProxy and the read replicas are finally getting some traffic!
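A quick way to double-check where a connection actually lands is Postgres's pg_is_in_recovery(), which returns true on a standby. Below is a minimal sketch that works with any DB-API style connection object (for example one from psycopg); the is_replica helper is a name made up for this post.

```python
def is_replica(conn):
    """Return True if the server behind this DB-API connection is a standby."""
    cur = conn.cursor()
    try:
        # pg_is_in_recovery() is true on a read replica and false on the primary.
        cur.execute("SELECT pg_is_in_recovery()")
        (in_recovery,) = cur.fetchone()
        return bool(in_recovery)
    finally:
        cur.close()
```

Assuming psycopg is installed, something like is_replica(psycopg.connect(os.environ["DB_RO_URL"])) should come back True through the new proxy and False through the RW URL on port 5432.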