You can scale your AI application built on Postgres with pgvector the same way you would any Postgres app: vertically, by adding CPU, RAM, and storage, or horizontally, with read replicas.

In Neon, scaling vertically is a matter of selecting the desired compute size. Neon supports compute sizes ranging from 0.25 CU (1 GB RAM) up to 56 CU (224 GB RAM). Autoscaling is supported up to 16 CU; larger compute sizes are fixed (no autoscaling). The maintenance_work_mem values shown below are approximate.

| Compute Units (CU) | RAM | maintenance_work_mem |
| :--- | :--- | :--- |
| 0.25 | 1 GB | 64 MB |
| 0.50 | 2 GB | 64 MB |
| 1 | 4 GB | 67 MB |
| 2 | 8 GB | 134 MB |
| 3 | 12 GB | 201 MB |
| 4 | 16 GB | 268 MB |
| 5 | 20 GB | 335 MB |
| 6 | 24 GB | 402 MB |
| 7 | 28 GB | 470 MB |
| 8 | 32 GB | 537 MB |
| 9 | 36 GB | 604 MB |
| 10 | 40 GB | 671 MB |
| 11 | 44 GB | 738 MB |
| 12 | 48 GB | 805 MB |
| 13 | 52 GB | 872 MB |
| 14 | 56 GB | 939 MB |
| 15 | 60 GB | 1007 MB |
| 16 | 64 GB | 1074 MB |
| 18 | 72 GB | 1208 MB |
| 20 | 80 GB | 1342 MB |
| 22 | 88 GB | 1476 MB |
| 24 | 96 GB | 1610 MB |
| 26 | 104 GB | 1744 MB |
| 28 | 112 GB | 1878 MB |
| 30 | 120 GB | 2012 MB |
| 32 | 128 GB | 2146 MB |
| 34 | 136 GB | 2280 MB |
| 36 | 144 GB | 2414 MB |
| 38 | 152 GB | 2548 MB |
| 40 | 160 GB | 2682 MB |
| 42 | 168 GB | 2816 MB |
| 44 | 176 GB | 2950 MB |
| 46 | 184 GB | 3084 MB |
| 48 | 192 GB | 3218 MB |
| 50 | 200 GB | 3352 MB |
| 52 | 208 GB | 3486 MB |
| 54 | 216 GB | 3620 MB |
| 56 | 224 GB | 3754 MB |

See Edit a compute to configure your compute size. Available compute sizes differ according to your Neon plan.
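
To confirm the default that applies to your current compute, you can check the setting directly in a SQL session:

```sql
SHOW maintenance_work_mem;
```

On a 7 CU compute, for example, this returns a value of roughly 470 MB, matching the table above.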

To optimize pgvector index build time, you can increase the maintenance_work_mem setting for the current session beyond the preconfigured default shown in the table above with a command similar to this:

```sql
SET maintenance_work_mem='10 GB';
```

The recommended maintenance_work_mem setting is your working set size (the size of the tuples being indexed). However, your maintenance_work_mem setting should not exceed 50 to 60 percent of your compute's available RAM (see the table above). For example, the maintenance_work_mem='10 GB' setting shown above has been successfully tested on a 7 CU compute, which has 28 GB of RAM; 10 GB is well under 50% of the RAM available for that compute size.
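
Putting this together, an index build session typically raises the limit and then creates the index in the same session. The sketch below assumes a hypothetical items table with an embedding column of type vector; the hnsw index type and vector_cosine_ops operator class come from pgvector:

```sql
-- Assumes a hypothetical table: items(id bigint, embedding vector(1536))
SET maintenance_work_mem = '10 GB';        -- session-only; under 50% of RAM on a 7 CU compute
SET max_parallel_maintenance_workers = 7;  -- optionally parallelize the build
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
```

Settings applied with SET last only for the current session, so the compute's default maintenance_work_mem is restored for normal query traffic afterward.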

## Autoscaling

You can also enable Neon's autoscaling feature, which automatically scales compute resources up on demand in response to application workload and down to zero on inactivity.

For example, if your AI application experiences heavy load during certain hours of the day or at different times throughout the week, month, or calendar year, Neon automatically scales compute resources without manual intervention according to the compute size boundaries that you configure. This enables you to handle peak demand while avoiding consuming compute resources during periods of low activity.

Enabling autoscaling is also recommended for initial data loads and memory-intensive index builds to ensure sufficient compute resources for this phase of your AI application setup.

To learn more about Neon's autoscaling feature and how to enable it, refer to our Autoscaling guide.

## Storage

On the Free plan, you get 0.5 GB of storage plus 0.5 GB of storage per branch. Storage on paid plans is usage based. See Neon plans for details.

## Read replicas

Neon supports read replicas, which are independent read-only computes that perform read operations on the same data as your primary read-write compute. Read replicas do not replicate data across database instances; instead, read requests are directed to the same data source. This architecture allows read replicas to be created instantly and lets you scale out CPU and RAM, and because data is read from a single source, there are no additional storage costs.

Since vector similarity search is a read-only workload, you can leverage read replicas to offload reads from your primary read-write compute to a dedicated compute when deploying AI applications. After you create a read replica, simply swap out your current Neon connection string for the read replica connection string — deploying a read replica for your AI application is that simple.
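
Once connected via the replica's connection string, a similarity search runs unchanged, and you can verify that the compute is read-only. The items table and three-dimensional vector below are hypothetical; the <-> operator is pgvector's L2 distance:

```sql
-- Connected with the read replica's connection string
SHOW transaction_read_only;  -- returns 'on' on a read-only compute

-- Hypothetical query: 5 nearest neighbors by L2 distance
SELECT id FROM items ORDER BY embedding <-> '[0.1, 0.2, 0.3]' LIMIT 5;
```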

Neon's read replicas support the same compute sizes outlined above. Read replicas also support autoscaling.

To learn more about Neon read replicas, see Read replicas and our Working with Neon read replicas guide.