Consider pooling for embedding requests #2
Up to 200 concurrent requests are allowed; any beyond that are rejected with HTTP 429. Realistically, this is unlikely to come up any time soon, so 429s are currently just handled with backed-off retries.
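For reference, the retry behavior described above could look roughly like the following. This is only a sketch, not the project's actual code: `RateLimited`, `with_backoff`, and all the parameter defaults are hypothetical names and values chosen for illustration, and it assumes the client surfaces a 429 as an exception.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error: stands in for whatever the client raises on HTTP 429."""


def with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponential backoff and full jitter.

    All defaults here are illustrative assumptions, not values from the real service.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the 429
            # The delay cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount in [0, cap] so retries spread out.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable only so the behavior can be exercised without actually waiting.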
A simple pool to handle these calls would theoretically increase the maximum throughput (and might be a fun exercise). Whenever the pool is fully utilized, individual input texts could be coalesced into batches: more texts in flight, but no more requests in flight. (Batching could, of course, start before the pool is fully utilized, but that would add user-facing latency, and the current pricing structure gives no incentive to do so.)
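A rough sketch of that idea, using `asyncio`, might look like this. Everything here is an assumption made for illustration: `EmbeddingPool`, `send_batch`, `max_batch`, and the demo helper are hypothetical names, and the sketch assumes the API accepts a list of texts per request and returns one vector per text in the same order.

```python
import asyncio
from collections import deque


class EmbeddingPool:
    """Caps in-flight requests; texts that arrive while full are coalesced into batches."""

    def __init__(self, send_batch, max_in_flight=200, max_batch=64):
        self._send_batch = send_batch      # hypothetical: async (list[str]) -> list of vectors
        self._max_in_flight = max_in_flight
        self._max_batch = max_batch        # assumed per-request batch limit
        self._in_flight = 0
        self._pending = deque()            # (text, future) pairs waiting for a free slot
        self._tasks = set()                # keep references so tasks are not collected

    async def embed(self, text):
        fut = asyncio.get_running_loop().create_future()
        if self._in_flight < self._max_in_flight:
            # A slot is free: send immediately, alone, to avoid adding latency.
            self._start([(text, fut)])
        else:
            # Pool is full: wait in the queue to be batched with other texts.
            self._pending.append((text, fut))
        return await fut

    def _start(self, items):
        self._in_flight += 1
        task = asyncio.create_task(self._run(items))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _run(self, items):
        try:
            vectors = await self._send_batch([text for text, _ in items])
            for (_, fut), vector in zip(items, vectors):
                if not fut.done():
                    fut.set_result(vector)
        except Exception as exc:
            for _, fut in items:
                if not fut.done():
                    fut.set_exception(exc)
        finally:
            self._in_flight -= 1
            if self._pending:
                # Reuse the freed slot for one request carrying many queued texts.
                count = min(self._max_batch, len(self._pending))
                self._start([self._pending.popleft() for _ in range(count)])


async def demo():
    """Tiny demonstration with a fake backend and a pool of only two slots."""
    sizes = []

    async def fake_send_batch(texts):
        sizes.append(len(texts))
        await asyncio.sleep(0.01)
        return [[float(len(t))] for t in texts]

    pool = EmbeddingPool(fake_send_batch, max_in_flight=2)
    vectors = await asyncio.gather(*(pool.embed("x" * i) for i in range(6)))
    return sizes, vectors
```

In the demo, the first two texts take the two free slots as single-text requests, and the remaining four wait and go out together once a slot frees up. Retries on 429 are deliberately left out of this sketch; they would presumably wrap the `send_batch` call.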