Use Ollama as a server

Posted by Xavier Bouclet on March 22, 2024 · 8 mins read

1. Purpose of this blog post

If you read my previous blog posts about Install Ollama and Specific Models on Ollama, you should already be able to address a lot of use cases. In this post, we will see how to work with the Ollama server thanks to Spring AI.

2. Ollama Server with Spring AI

Once Ollama is installed, you should first check that the server is running:

curl http://localhost:11434
Ollama is running

To figure out all the available endpoints, you can check the following links:

The second link does not seem to be official but can be quite useful.
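
For instance, the official API exposes a POST /api/generate endpoint that you can call directly; here is a quick sketch, assuming the mistral model has already been pulled:

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Tell me a Chuck Norris fact",
  "stream": false
}'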

Let’s try to do some things with Spring AI. Ollama is supported by Spring AI through:

dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama'
}

Or with the Spring Boot starter:

dependencies {
    implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
}

In the following example, we will use the spring-ai-ollama dependency.

3. Create the project

To create the project, you can use my last blog posts on Spring AI:

Or clone the code from the following repository:

git clone git@github.com:mikrethor/spring-ai.git

Let’s add the dependency to the build.gradle.kts file:

...
dependencies {
	implementation("org.springframework.boot:spring-boot-starter-web")
	implementation("com.fasterxml.jackson.module:jackson-module-kotlin")
	implementation("org.jetbrains.kotlin:kotlin-reflect")
	implementation("org.springframework.ai:spring-ai-openai-spring-boot-starter")
	implementation("org.springframework.ai:spring-ai-ollama)
	testImplementation("org.springframework.boot:spring-boot-starter-test")
}
...
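
Note that, at the time of writing, Spring AI artifacts are published to the Spring milestone repository rather than Maven Central, so your build may also need the following repositories block (a sketch, in case your project does not already declare it):

repositories {
    mavenCentral()
    // Spring AI milestones live in the Spring repository
    maven { url = uri("https://repo.spring.io/milestone") }
}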

Let’s modify our RouterConfiguration to add the Ollama endpoints to our API:

package com.xavierbouclet.springai

...

@Configuration(proxyBeanMethods = false)
class RouterConfiguration {

   ...

    @Bean
    fun aiRouter(chatClient: OpenAiChatClient,
                 imageClient: OpenAiImageClient,
                 ollamaChatClient: OllamaChatClient) = router {
        GET("/api/ollama/generate") { request ->
            ServerResponse
                .ok()
                .body(
                    ollamaChatClient.call(
                        request
                            .param("message")
                            .orElse("Tell me a Chuck Norris fact")
                    )
                )
        }
        GET("/api/ollama/generateStream") { request ->
            ServerResponse
                .ok()
                .body(ollamaChatClient.stream(
                    Prompt(
                        UserMessage(
                            request
                                .param("message")
                                .orElse("Tell me a Chuck Norris fact")
                        )
                    )
                ).mapNotNull { chatResp -> chatResp?.result?.output?.content }
                    .toStream()
                    .toList()
                )
        }
        ...
    }
}

So to interact with the Ollama server, we will use the OllamaChatClient and expose the following endpoints (an example call follows the list):

  • /api/ollama/generate

  • /api/ollama/generateStream
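
Both endpoints accept an optional message query parameter; when it is missing, the code falls back to asking for a Chuck Norris fact. For example, with a prompt of your own (the question below is just an illustration):

curl "http://localhost:8080/api/ollama/generate?message=Why%20is%20the%20sky%20blue%3F"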

To configure the OllamaChatClient, we will use the following properties:

spring:
  ai:
    ...
    ollama:
      base-url: http://localhost:11434
      chat:
        model: mistral
        options:
          temperature: 0.7

As you can see, we use the mistral model with a temperature of 0.7, which keeps the model fairly conservative while leaving it a touch of creativity.
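
Note that these properties are picked up by the Spring Boot starter’s auto-configuration. If you stick to the plain spring-ai-ollama module, no client bean is auto-configured, so you would wire it yourself; here is a minimal sketch, assuming the Spring AI 0.8.x API (OllamaApi, OllamaOptions and the with* methods are taken from that version):

import org.springframework.ai.ollama.OllamaChatClient
import org.springframework.ai.ollama.api.OllamaApi
import org.springframework.ai.ollama.api.OllamaOptions
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration(proxyBeanMethods = false)
class OllamaClientConfiguration {

    // Low-level API client pointing at the local Ollama server
    @Bean
    fun ollamaApi() = OllamaApi("http://localhost:11434")

    // Chat client with the same model and temperature as the properties above
    @Bean
    fun ollamaChatClient(ollamaApi: OllamaApi) =
        OllamaChatClient(ollamaApi)
            .withDefaultOptions(
                OllamaOptions.create()
                    .withModel("mistral")
                    .withTemperature(0.7f)
            )
}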

Let’s run the application and call the /api/ollama/generate endpoint:

curl http://localhost:8080/api/ollama/generate

Chuck Norris does not sleep. He stays up all nights, preventing us from having bad dreams and ensuring that sunrise comes every morning. This is just one of the many humorous legends surrounding the martial artist and actor. In reality, Chuck Norris is a highly skilled martial artist who has won numerous championships and starred in many action movies. He holds a 9th-degree black belt in South Korean Tang Soo Do Moo Sool Kwan Haeng Il as well as a 2nd-degree black belt in Brazilian Jiu-Jitsu. He also served in the United States Air Force as an air policeman.

We can also call the /api/ollama/generateStream endpoint:

curl http://localhost:8080/api/ollama/generateStream
[" Chuck"," Nor","ris"," does"," not"," sleep","."," He"," stays"," up"," all"," nights",","," preventing"," evil"," do","ers"," from"," causing"," chaos"," around"," the"," world","."," His"," lack"," of"," sleep"," is"," what"," gives"," him"," super","human"," strength"," and"," abilities",".","\n","\n","Or"," how"," about"," this"," one",":"," Chuck"," Nor","ris"," can"," divide"," by"," zero","."," He"," doesn","'","t"," need"," to"," follow"," the"," rules"," that"," the"," rest"," of"," us"," mort","als"," are"," bound"," by",".",""]%

By returning a reactive type, we could produce a ChatGPT-like streaming response instead of a collected list; that could be a future improvement.
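
One possible direction, sketched under the assumption that the router uses the servlet-based WebMvc.fn DSL from Spring Framework 6.1 (which provides ServerResponse.sse), is to push each token over Server-Sent Events as it arrives:

GET("/api/ollama/generateStream") { request ->
    ServerResponse.sse { sse ->
        ollamaChatClient.stream(
            Prompt(
                UserMessage(
                    request.param("message").orElse("Tell me a Chuck Norris fact")
                )
            )
        )
            .mapNotNull { chatResp -> chatResp?.result?.output?.content }
            .subscribe(
                { token -> sse.send(token) },  // push each token to the client
                { error -> sse.error(error) }, // surface errors on the stream
                { sse.complete() }             // close the connection when done
            )
    }
}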

Et voilà, you can now use the Ollama server with Spring AI. We will look at some LLaVA use cases in my next blog post.

4. Conclusion

Ollama and its server are a nice way to try out some Spring AI code and explore use cases. They can also serve as a non-production solution to test models without any cost.
