Using Ollama as a ChatGPT Replacement

Categories: Ollama, ChatGPT, LLM, Writing, Llama, Mistral

Author: Vincent Grégoire, PhD, CFA

Published: March 8, 2024

Local Large Language Models (LLMs), run with tools like Ollama, offer a powerful alternative to cloud-based solutions like ChatGPT. In this post, I explore the benefits of using Ollama as a ChatGPT replacement for empirical finance research, focusing on privacy, customization, and computational flexibility.

ChatGPT has become a daily driver for many researchers and professionals, including myself, offering a powerful tool for generating text, code, and insights across a wide range of domains. However, relying on closed, cloud-based solutions to leverage Large Language Models (LLMs) like ChatGPT comes with inherent privacy, data security, and replicability concerns, especially in fields like empirical finance research.

Enter Ollama, a tool that allows researchers and professionals to manage and run open-source LLMs such as Meta’s Llama 2 and Mistral AI’s Mistral, bypassing the need for cloud-based solutions. This post, the first in a series of three, aims to demystify the process of installing and using Ollama as a ChatGPT replacement. We’ll delve into the advantages of running LLMs on your own machine, offering you full control over your data while catering to the specific needs of finance-related research. Subsequent posts will explore Ollama’s prowess as a replacement for GitHub Copilot, enhancing coding efficiency, and its capabilities as a Python API for advanced text processing, opening new doors for empirical finance analysis and beyond.

Video tutorial

This post is also available as a video tutorial on YouTube.

Ollama

The open-source community was set ablaze with excitement following Meta’s release of LLaMA, the first “big” open-source Large Language Model (LLM). This pivotal moment marked a shift towards democratizing access to powerful AI tools, sparking a fervent race to discover the most efficient ways to harness such models on consumer hardware. Amidst this bustling innovation, one solution emerged as a beacon for those seeking to leverage the immense potential of LLMs within their own computing environments: Ollama.

What is Ollama?

Ollama is a cutting-edge software designed to simplify the process of downloading, managing, and running open-source LLMs directly on your computer. Recognizing the complexities and technical challenges involved in setting up and deploying LLMs, Ollama offers a streamlined solution that makes these powerful tools more accessible to a wider audience. Whether you’re a researcher, developer, or enthusiast, Ollama provides the necessary “back-end” infrastructure to run these models smoothly on your local machine.

The Power of Local Processing

At its core, Ollama harnesses the power of local processing, offering users complete control over their data and the AI models they interact with. This approach not only enhances privacy and security but also allows for greater flexibility and customization to meet specific needs or research goals. With Ollama, the complexities of running LLMs are abstracted away, leaving users free to focus on their work without worrying about the underlying technicalities.

Limitations of Local LLMs

While the benefits of local LLMs are clear, it’s important to acknowledge that running these models on consumer hardware comes with its own set of limitations. The computational resources required to run LLMs are substantial, and while Ollama provides a robust solution for managing these resources, users should be mindful of the hardware and memory constraints of their local machines. For best results, it’s recommended to run Ollama on a machine with a powerful GPU and ample memory to ensure smooth and efficient operation. Apple’s M-series chips, because they share memory between the CPU and GPU, are particularly well-suited for running Ollama and LLMs.

Front-End Applications for Enhanced Interaction

While Ollama serves as the robust engine running the models, most users seek a more intuitive way to interact with their LLMs beyond the command line. To address this, Ollama supports the integration of various “front-end” applications, each offering a unique interface and set of features tailored to different user preferences and use cases. In this post, we will explore two such front-end applications that serve as excellent ChatGPT replacements:

  1. Ollama-UI: A Chrome extension that enables users to access and interact with their LLMs directly from their web browser, offering convenience and flexibility for those who prefer web-based tools. Easy to install but limited in features.
  2. Open WebUI: A comprehensive ChatGPT replacement that offers a full-featured web interface, catering to users looking for a robust and feature-rich platform to leverage the capabilities of their local LLMs.

As we delve into the specifics of these front-end applications, it’s clear that Ollama is not just a tool but a gateway to unlocking the full potential of open-source LLMs. By bridging the gap between powerful AI models and everyday users, Ollama is setting the stage for a new era of innovation and accessibility in the world of artificial intelligence.

Other Use Cases

While this post focuses on using Ollama as a ChatGPT replacement, it’s important to note that Ollama’s capabilities extend far beyond text generation. In subsequent posts, we will explore two additional use cases for Ollama:

  • GitHub Copilot Replacement: Some models, like CodeLlama and DeepSeek Coder, are designed to assist with code generation and programming tasks, making them ideal replacements for GitHub Copilot. Combined with Visual Studio Code extensions, Ollama offers a powerful alternative for developers seeking to enhance their coding efficiency and productivity.
  • OpenAI API Replacement: Ollama can serve as a Python API for advanced text processing, enabling users to leverage the capabilities of open-source LLMs for a wide range of natural language processing tasks (a minimal sketch of the underlying local API follows this list). From sentiment analysis to language translation, Ollama opens new doors for researchers and professionals seeking to harness the power of AI in their work. For researchers in empirical finance, this means being able to use open-source LLMs to analyze and interpret financial documents, news articles, and other textual data with greater control, replicability, and privacy.
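As a preview of that third post, here is a minimal sketch of what querying Ollama’s local REST API looks like once the service is running. It assumes the default port (11434) and that the llama2 model has already been pulled; the prompt is only an illustration:

# Query the local Ollama API (default port 11434) with a single prompt
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Summarize the tone of this earnings call excerpt: ...", "stream": false}'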

Setting Up Ollama

Ollama is a command-line tool that is available for macOS, Linux, and (experimental) Windows, making it accessible to a wide range of users. In this section, we’ll walk through the process of downloading and installing Ollama, setting up the necessary LLM models, and running Ollama to interact with these models. We’ll also explore the front-end applications that can be used to enhance the user experience when working with Ollama.

Download and Installation

The first step in setting up Ollama is to download and install the tool on your local machine. The installation process is straightforward: Ollama’s download page provides installers for macOS and Windows, as well as installation instructions for Linux. Once you’ve downloaded the installer, follow the installation instructions to set up Ollama on your machine.

If you’re using a Mac, you can install Ollama using Homebrew by running the following command in your terminal:

brew install ollama

The benefit of using Homebrew is that it simplifies the installation process and also sets up Ollama as a service, allowing it to run in the background and manage the LLM models you download.
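If the Ollama service is not already running in the background, you can start it (and have it launch automatically at login) with Homebrew’s service manager:

# Start Ollama as a background service managed by Homebrew
brew services start ollama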

Downloading and Running LLM Models

The list of available LLM models that can be run using Ollama is constantly expanding, with new models being added regularly. To see the current list, you can check the Ollama Library page. Once you’ve identified the models you’d like to use, you can download them using the ollama pull command followed by the model name. For example, to download the llama2 model, you would run:

ollama pull llama2
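To see which models you have downloaded locally and how much disk space they take up, you can list them:

# List the models currently available on your machine
ollama list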

In order to run the models, you will need to start the Ollama service. If you installed using Homebrew or enabled the service during installation, there is nothing more to do. If not, you can start the service with the ollama serve command, which initializes the Ollama service and allows you to interact with the models you’ve downloaded.
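If you do need to start the service manually, run the following and leave it running in its own terminal window:

# Start the Ollama service manually (keep this terminal open)
ollama serve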

After downloading the model, you can run it using the ollama run command followed by the model name. For example, to start the llama2 model, you would run:

ollama run llama2

This will let you interact with the model directly from the command line, allowing you to generate text, code, or other outputs based on your input. However, for a more user-friendly experience, you will want to explore the front-end applications that Ollama supports instead of running it directly from the command line.

The most popular models available for use with Ollama are llama2, mistral, and mixtral. Each of these models offers unique capabilities and performance characteristics, catering to different use cases and hardware configurations, but according to benchmarks available at the time of writing, mixtral is the most capable of the three, offering the best performance across a wide range of tasks.

Model Variants and Sizes

Models like Llama 2 and Mistral come in different sizes and variants, each offering a unique balance of computational power and specificity.

For example, the llama2 model is available in multiple variants along three dimensions, which you select by appending a tag to the model name (see the example after this list):

  • Number of parameters: 7b, 13b, and 70b, each representing the number of parameters in the model in billions. The larger the model, the more powerful and capable it is, but it also requires more computational resources to run.
  • Quantization: q***, where *** represents the number of bits used to represent the model’s parameters and the quantization method used. LLM model weights are typically stored as 32-bit floating-point numbers, but quantization allows for the use of lower-precision representations, such as 16-bit, 8-bit, and as low as 2-bit. Lower quantization levels result in smaller model sizes and faster inference times, but may come at the cost of reduced model performance.
  • Tuning: text or chat, indicating whether the model has been fine-tuned for text generation or chat.
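As an illustration, here is how you would pull two specific llama2 variants using tags; the exact list of available tags for each model is listed on its page in the Ollama Library:

# Pull the 13-billion-parameter chat variant of Llama 2
ollama pull llama2:13b-chat
# Pull the 7-billion-parameter chat variant with 4-bit quantization
ollama pull llama2:7b-chat-q4_0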

The mistral and mixtral models also come in different sizes and variants, each tailored to specific use cases and hardware configurations. By understanding the available model variants and sizes, you can choose the one that best suits your needs and computing environment, ensuring optimal performance and efficiency when running Ollama. This will require some experimentation to find the right balance between model size and performance for your specific use case and hardware configuration.

Other models like CodeLlama and DeepSeek Coder are designed specifically for code generation and programming tasks, offering a powerful alternative to GitHub Copilot. These models are optimized for understanding and generating code, making them ideal for developers and researchers working on programming-related projects.

Models I Use

Here are the commands to install the models I use on my MacBook Pro M3 Max with 64GB of RAM:

ollama pull llama2
ollama pull mistral
ollama pull mixtral
ollama pull llama2-uncensored
ollama pull codellama
ollama pull deepseek-coder

Mixtral is the most powerful, but it requires a lot of memory and a powerful GPU to run. I use it for generating long-form content and for more complex tasks. Llama2 is a good all-rounder, and I use it for most of my text-generation tasks. Mistral is a smaller model that is well-suited for tasks that require less computational power, and CodeLlama is my go-to model for code generation and programming tasks.

These are also some of the most popular models available for use with Ollama, but I have yet to explore the other models available in the Ollama Library.

Setting Up Front-End Applications

I use two front-end applications to interact with Ollama: Ollama-UI and Open WebUI. These applications provide a more user-friendly interface for interacting with the LLM models, offering a range of features and capabilities that enhance the overall user experience. In this section, we’ll explore the installation and setup process for each of these front-end applications, highlighting their unique features and use cases.

Ollama-UI

Ollama-UI is a Chrome extension that allows users to access and interact with their LLM models directly from their web browser. This convenient front-end application offers a simple and intuitive interface for generating text, code, and other outputs based on user input. To install Ollama-UI, you need to get it from the Chrome Web Store. Once installed, you can access it by clicking on the Ollama icon in your browser’s toolbar.

For it to work, you need to have the Ollama service running on your local machine. The interface is simple and easy to use, allowing you to input text and receive outputs from the LLM models you’ve downloaded.
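Ollama listens on port 11434 by default; if the extension cannot connect, a quick way to confirm that the service is up is to query it from the terminal (assuming the default port):

# Check that the Ollama service is reachable; it should reply "Ollama is running"
curl http://localhost:11434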

Open WebUI

Open WebUI is a full-featured web interface that offers a comprehensive platform for interacting with your local LLM models. This powerful front-end application provides a range of features and capabilities, including the ability to manage and run multiple models simultaneously, customize model settings, and access advanced options for generating text, code, and other outputs. Installing it is a bit more involved than Ollama-UI, but it offers a more robust and feature-rich platform for leveraging the capabilities of your local LLM models.

To install Open WebUI, you will need to first install Docker if you don’t already have it on your machine. Docker is a platform for running applications using containerization, i.e., in self-contained environments. You can download and install Docker from the official website.

If you’re using a Mac, you can install Docker and Docker Desktop using Homebrew by running the following commands in your terminal:

brew install docker
brew install --cask docker

Once you have Docker installed, you can run the following command in your terminal to start Open WebUI:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

This command will start the Open WebUI service and allow you to access the web interface by navigating to http://localhost:3000 in your web browser. From there, you can interact with your local LLM models, customize settings, and generate text, code, and other outputs based on your input.
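If the web interface does not come up at that address, you can check that the container is running and inspect its logs with standard Docker commands:

# Confirm the Open WebUI container is running
docker ps --filter name=open-webui
# Inspect the container logs if something looks off
docker logs open-webui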

Once you have Open WebUI up and running, you can explore its various features and capabilities. Note that the first time you start a new chat, Ollama will need to load the model, which can take a few seconds or longer depending on the model size and your hardware configuration. Once the model is loaded, you can start generating text and interacting with the model in real-time.

In the chat interface, you can select the model you want to use, input text, and receive outputs from the model. You can even select multiple models to run simultaneously, allowing you to compare the outputs and performance of different models. Aside from basic chat, here are the main features of Open WebUI that are available in the sidebar:

  • Modelfiles: These are WebUI’s equivalent of ChatGPT’s “GPTs”. They are pre-defined combinations of prompts and settings that can be used to generate specific types of outputs. For example, you might have a modelfile for generating code, another for summarizing text, and another for answering questions. You can start by looking at the modelfiles created by the community, available at https://openwebui.com (a minimal sketch of a modelfile is shown after this list).
  • Prompts: A place to save and manage prompt templates that you frequently use.
  • Documents: A place to save and manage documents that you want your models to be able to refer to. Note that the documents are not accessible to the model as a whole; instead, they are made available through a RAG (retrieval-augmented generation) mechanism. In practice, this means that the model can search the documents and retrieve the most relevant “chunks” of information to generate a response, but it does not have access to the full text. This makes the feature useful for accessing reference material, but not for summarizing or generating text based on the full content of the documents.
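These modelfiles are closely related to Ollama’s own modelfile format, which you can also use directly from the command line. As a rough sketch (the base model, parameter value, and system prompt below are only illustrations), a modelfile pairs a base model with custom settings:

# Modelfile: a base model plus a custom system prompt and a sampling setting
FROM llama2
PARAMETER temperature 0.3
SYSTEM """You are a research assistant that summarizes empirical finance papers in plain language."""

You can then register it with Ollama under a custom name by running ollama create finance-assistant -f Modelfile from the directory containing the file.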

Open WebUI offers a range of advanced options and settings that allow you to customize the behavior of your local LLM models, making it a powerful platform for leveraging the capabilities of Ollama. Other features that go beyond the scope of this post include the ability to have multiple registered users, to generate images, and to access text-to-speech and speech-to-text capabilities.

Conclusion

The rise of open-source LLMs has ushered in a new era of innovation and accessibility in the world of artificial intelligence. There is so much potential for these models to transform the way we work and interact with AI, and Ollama is at the forefront of this revolution. By providing a streamlined solution for running LLMs on consumer hardware, Ollama is empowering researchers, developers, and enthusiasts to harness the full potential of these powerful models without relying on cloud-based solutions. It’s now up to us to explore the possibilities and push the boundaries of what’s possible with Ollama and open-source LLMs.