This is a naive first switch from sync to async. It lets the backend keep answering incoming requests while streaming LLM results to the user. There is certainly room for cleanup and further improvement, but this already provides a nice gain out of the box.
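The core idea can be sketched with plain `asyncio`: each request handler awaits chunks from an async generator, and while one handler is waiting, the event loop is free to serve other requests. This is a minimal illustration, not the actual backend code; `fake_llm_stream` and `handle_request` are hypothetical stand-ins for the real streaming client and route handler.

```python
import asyncio

async def fake_llm_stream(prompt: str):
    # Hypothetical stand-in for a streaming LLM client.
    for token in ["Hello", " ", "world"]:
        # Awaiting here yields control back to the event loop,
        # so other requests can make progress in the meantime.
        await asyncio.sleep(0)
        yield token

async def handle_request(prompt: str) -> str:
    # In a real backend these chunks would be streamed to the client
    # as they arrive; here we just collect them.
    parts = []
    async for chunk in fake_llm_stream(prompt):
        parts.append(chunk)
    return "".join(parts)

async def main():
    # Two concurrent "requests" served on a single event loop.
    return await asyncio.gather(
        handle_request("first"),
        handle_request("second"),
    )

if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a sync implementation, the second request would block until the first LLM stream finished; with the async version both interleave on the same loop.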