Consider pooling for embedding requests #2

Open
opened 2026-04-04 18:03:43 +00:00 by taf · 0 comments
Owner

Up to 200 concurrent requests are allowed; any beyond that just get a 429. Realistically this is unlikely to come up any time soon, so 429s are currently handled with backed-off retries.

A simple pool for these calls would, in theory, raise the maximum throughput (and might be a fun exercise). Whenever the pool is fully utilized, individual input texts could be coalesced into batches: more texts in flight, but no more requests in flight. (Batching could, of course, start before the pool is fully utilized, but that would add user-facing latency, and the current pricing structure gives no incentive to do so.)
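
A rough sketch of what such a pool might look like, assuming an asyncio-based client. Everything named here is hypothetical: `send_batch` stands in for whatever coroutine actually issues one embedding request for a list of texts, and `max_batch` is a made-up cap on texts per request. While a slot is free, each text goes out immediately as a batch of one (no added latency); once all slots are busy, texts queue up and are coalesced into a single request when a slot frees. The existing 429 retry handling is not shown and would live inside `send_batch`.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Set, Tuple

Embedding = List[float]
# Hypothetical signature: one request carrying several texts, one embedding per text.
SendBatch = Callable[[Sequence[str]], Awaitable[List[Embedding]]]


class EmbeddingPool:
    def __init__(self, send_batch: SendBatch, max_in_flight: int = 200,
                 max_batch: int = 64) -> None:
        self._send_batch = send_batch
        self._max_in_flight = max_in_flight  # concurrent request limit
        self._max_batch = max_batch          # assumed cap on texts per request
        self._in_flight = 0
        self._pending: List[Tuple[str, asyncio.Future]] = []
        self._tasks: Set[asyncio.Task] = set()  # keep references to running tasks

    async def embed(self, text: str) -> Embedding:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((text, fut))
        self._dispatch()
        return await fut

    def _dispatch(self) -> None:
        # Start requests while a slot is free and texts are waiting.
        # With a free slot this sends a batch of one right away; when the
        # pool was saturated, everything queued meanwhile is coalesced.
        while self._pending and self._in_flight < self._max_in_flight:
            batch = self._pending[:self._max_batch]
            del self._pending[:self._max_batch]
            self._in_flight += 1
            task = asyncio.create_task(self._run(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _run(self, batch: List[Tuple[str, asyncio.Future]]) -> None:
        try:
            results = await self._send_batch([text for text, _ in batch])
            if len(results) != len(batch):
                raise ValueError("expected one embedding per input text")
            for (_, fut), embedding in zip(batch, results):
                if not fut.done():
                    fut.set_result(embedding)
        except Exception as exc:
            # Fail every caller whose text was in this request.
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
        finally:
            self._in_flight -= 1
            self._dispatch()  # a slot just freed: drain whatever queued up
```

Callers would just `await pool.embed(text)`; whether a given text travels alone or in a batch is invisible to them. Dispatching eagerly keeps latency unchanged below the limit, which matches the point above that there is currently no incentive to batch early.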

Reference
taf/podcast_search#2