How to Set Up a Private GPT: Step by Step

By:  Husam Yaghi

A local GPT is a GPT-style AI model (Large Language Model) installed and running directly on your own personal computer (Mac or Windows) or on a local server. As stated in a previous post, installing a local GPT offers numerous benefits, including enhanced privacy, independence from the internet, cost efficiency, and customization options.

Here we provide you with all the steps we took to have a functioning GPT installed locally to operate on local files in a secure private environment.

Install Python:

Download and install the latest version of Python from the official website

Make sure to select the option to add Python to your system’s PATH during the installation.
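A quick way to confirm the installation worked is to ask Python itself. The following is a minimal sketch and not part of the original setup; the helper name describe_python is made up for illustration. It reports the running interpreter's version and whether a python executable can be found on PATH.

```python
import shutil
import sys


def describe_python():
    """Return the running interpreter's version and the `python` found on PATH."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    # shutil.which returns None when no matching executable is on PATH.
    on_path = shutil.which("python") or shutil.which("python3")
    return {"version": version, "executable": sys.executable, "on_path": on_path}


if __name__ == "__main__":
    info = describe_python()
    print(f"Python version : {info['version']}")
    print(f"Running from   : {info['executable']}")
    print(f"Found on PATH  : {info['on_path'] or 'not found'}")
```

If the last line reports "not found", rerun the installer and select the PATH option.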

Install Git:

Download and install Git for Windows

Install Anaconda:

Because we will be running a lot of Python scripts, the Anaconda Prompt is a convenient replacement for the default Windows Command Prompt.

Prepare for the model:

Do a quick internet search to help you decide on the GPT model you want to use based on the available hardware resources you have.  Also, you may want to visit to check out the long list of available models. Here is a sample:

  • GPT-2 (this is a small and pretrained model by OpenAI)
  • GPT-Neo (this is a larger pretrained model than GPT-2 by EleutherAI)
  • Llama (this is a much larger pretrained model by Meta AI)

For this post, we will use GPT-Neo, as it is neither too small nor too large and can run comfortably on a single GPU. (Some models can run on PCs without a GPU, but keep in mind that the more compute resources you have, the better the performance.)
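To judge whether a model fits your hardware, a rough rule of thumb is that the weights alone need about (number of parameters) x (bytes per parameter) of memory. The sketch below applies that arithmetic to nominal parameter counts; the function name is an assumption for illustration, and real usage is higher once activations and, for training, optimizer state are added.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter=4):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Nominal parameter counts; 4 bytes per parameter is 32-bit floats, 2 is 16-bit.
models = {
    "GPT-2 (small)": 124e6,
    "GPT-Neo 1.3B": 1.3e9,
    "GPT-Neo 2.7B": 2.7e9,
}

for name, params in models.items():
    fp32 = weight_memory_gb(params, 4)
    fp16 = weight_memory_gb(params, 2)
    print(f"{name}: ~{fp32:.1f} GB in fp32, ~{fp16:.1f} GB in fp16")
```

By this estimate, GPT-Neo 2.7B needs roughly 10.8 GB just for 32-bit weights, which is why a GPU with ample memory helps.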


Let’s start the installation process of our chosen model (GPT-Neo):

  • Click on the Windows icon and run Anaconda as an Administrator
  • A command prompt screen will be opened
  • Create the directory where the local GPT will be installed:

          mkdir c:\yaghiGPT

          cd c:\yaghiGPT

Install the Prerequisites:

          c:\yaghiGPT > pip install transformers torch sentencepiece

          c:\yaghiGPT > pip install fastapi

          c:\yaghiGPT > pip install uvicorn

          c:\yaghiGPT > pip install torch

          c:\yaghiGPT > pip install tensorflow

          c:\yaghiGPT > pip install tensorboard

          c:\yaghiGPT > pip uninstall huggingface_hub

          c:\yaghiGPT > pip install transformers datasets
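The scripts later in this post import several libraries that the commands above do not install (for example PyPDF2, python-docx, python-pptx, watchdog, sentence-transformers, faiss, and SpeechRecognition). The following sketch is an addition, not part of the original setup: it lists which of the imported modules are still missing so you can pip install them before running anything. The mapping uses the usual PyPI package names; faiss-cpu is an assumption, chosen over a GPU build.

```python
import importlib.util

# Import name used in the scripts -> package name to pass to pip.
REQUIRED = {
    "numpy": "numpy",
    "torch": "torch",
    "transformers": "transformers",
    "sentence_transformers": "sentence-transformers",
    "faiss": "faiss-cpu",  # assumption: CPU build of FAISS
    "PyPDF2": "PyPDF2",
    "docx": "python-docx",
    "pptx": "python-pptx",
    "watchdog": "watchdog",
    "speech_recognition": "SpeechRecognition",
    "pyttsx3": "pyttsx3",
}


def missing_packages(required):
    """Return pip package names whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install with:")
    print("  pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```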

Clone the GPT-Neo repository:

          c:\yaghiGPT > git clone

Create a virtual environment and activate it:

          c:\yaghiGPT > python  -m  venv  gpt-neo-env

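          c:\yaghiGPT > gpt-neo-env\Scripts\activate

Packages installed before activation live outside the environment, so rerun the pip commands above once it is active.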
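To confirm that the environment is really active before installing anything into it, you can ask the interpreter. This is a small sketch with a made-up function name; it recognises environments created with python -m venv, and is not meant as a check for conda environments.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Packages will be installed under:", sys.prefix)
```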

Create Dataset:

Gather all files which you plan to train the model on and use later.  Put all those files in a folder; let’s call it: gptDataSet

          c:\yaghiGPT >  mkdir  gptDataSet

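The dataset consists of PDF, Word (.docx), and PowerPoint (.pptx) files. Note that the scripts below read from d:/yaghiDataSet and d:/gptDataSet, so edit those paths to match wherever your dataset folder actually lives.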
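Before processing, it can help to see what is actually in the dataset folder. The sketch below is an illustrative addition (the function names are assumptions): it counts files by extension and flags anything other than .pdf, .docx, and .pptx. Extensions are compared case-sensitively, mirroring the endswith checks used by the scripts in this post.

```python
from collections import Counter
from pathlib import Path

SUPPORTED = {".pdf", ".docx", ".pptx"}


def inventory(folder):
    """Count files directly inside `folder` by extension (case-sensitive)."""
    return Counter(p.suffix for p in Path(folder).iterdir() if p.is_file())


def report(folder):
    """Print one line per extension, marking types the scripts will skip."""
    for ext, count in sorted(inventory(folder).items()):
        note = "" if ext in SUPPORTED else "  (will be skipped)"
        print(f"{ext or '<no extension>'}: {count}{note}")


if __name__ == "__main__":
    dataset = Path("gptDataSet")  # folder created above; adjust to your path
    if dataset.is_dir():
        report(dataset)
    else:
        print(f"{dataset} not found")
```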

Process the dataset:

The text content has to be extracted from those files before they can be used for fine-tuning. Let’s create the following script in a Python file. The script automatically processes any file that gets added to the dataset folder:


import os

import time

import PyPDF2

import docx

from pptx import Presentation

from watchdog.observers import Observer

from watchdog.events import FileSystemEventHandler

# Directory containing the files

directory = "d:/yaghiDataSet"

# Function to process PDF files

def process_pdf(file_path):

    with open(file_path, 'rb') as file:

        reader = PyPDF2.PdfReader(file)

        num_pages = len(reader.pages)

        text = []

        for page_num in range(num_pages):

            page = reader.pages[page_num]

            text.append(page.extract_text() or "")

        return "\n".join(text)

# Function to process Word files

def process_word(file_path):

    doc = docx.Document(file_path)

    text = []

    for paragraph in doc.paragraphs:

        text.append(paragraph.text)

    return "\n".join(text)

# Function to process PowerPoint files

def process_pptx(file_path):

    prs = Presentation(file_path)

    text = []

    for slide in prs.slides:

        for shape in slide.shapes:

            if hasattr(shape, "text"):

                text.append(shape.text)

    return "\n".join(text)

# Function to process a single file

def process_file(file_path):

    filename = os.path.basename(file_path)

    if filename.endswith(".pdf"):

        print(f"Processing PDF: {filename}")

        pdf_text = process_pdf(file_path)

        print(f"Extracted text length: {len(pdf_text)} characters")

    elif filename.endswith(".docx"):

        print(f"Processing Word Document: {filename}")

        word_text = process_word(file_path)

        print(f"Extracted text length: {len(word_text)} characters")

    elif filename.endswith(".pptx"):

        print(f"Processing PowerPoint Presentation: {filename}")

        pptx_text = process_pptx(file_path)

        print(f"Extracted text length: {len(pptx_text)} characters")

    else:

        print(f"Unsupported file type: {filename}")

# Custom event handler

class FileHandler(FileSystemEventHandler):

    def on_created(self, event):

        if event.is_directory:

            return

        print(f"New file detected: {event.src_path}")

        process_file(event.src_path)

# Process existing files

def process_existing_files():

    print(f"Checking existing files in {directory}")

    for filename in os.listdir(directory):

        file_path = os.path.join(directory, filename)

        if os.path.isfile(file_path):

            process_file(file_path)

# Set up the observer

event_handler = FileHandler()

observer = Observer()

observer.schedule(event_handler, directory, recursive=False)

print("Starting file processing script...")

print(f"Monitoring directory: {directory}")

process_existing_files()

observer.start()

try:

    print("Waiting for new files...")

    while True:

        time.sleep(1)

except KeyboardInterrupt:

    print("Script interrupted by user.")

    observer.stop()

observer.join()

Run the script; it may take minutes or hours, depending on the number of files your dataset contains:

          c:\yaghiGPT >  python
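As written, the processing script only prints the length of the extracted text and then discards it. If you want to reuse the text later, for example for fine-tuning, one option is to write it out as plain .txt files. The helper below is a hypothetical addition (the name save_text and the output folder are assumptions), meant to be called with the string returned by process_pdf, process_word, or process_pptx.

```python
from pathlib import Path


def save_text(text, source_path, out_dir="extracted_text"):
    """Write `text` to <out_dir>/<source file name>.txt and return that path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original extension in the name so report.pdf and report.docx
    # do not overwrite each other.
    target = out_dir / (Path(source_path).name + ".txt")
    target.write_text(text, encoding="utf-8")
    return target
```

For example, inside process_file you could call save_text(pdf_text, file_path) right after pdf_text is computed.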

Train the model:

Now we need to train the model on the local dataset, fine-tune it to improve its quality, then continuously monitor the performance. Create a Python file containing the following script:


import os

import torch

import numpy as np

import pickle

from PyPDF2 import PdfReader

from sentence_transformers import SentenceTransformer

from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

import tkinter as tk

from tkinter import ttk, filedialog, scrolledtext

import threading

import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

class QASystem:

    def __init__(self, directory, index_file='index.pkl', embeddings_file='embeddings.pkl'):

        # ... (existing QASystem code) ...

    def finetune_model(self, train_dataset, eval_dataset, output_dir, num_epochs=3, batch_size=4, learning_rate=5e-5):

        # GPT-Neo is not a GPT-2 checkpoint, so load it through the Auto classes

        model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")

        tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

        # Use the GPU when one is available, otherwise fall back to the CPU

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        model.to(device)

        optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

        for epoch in range(num_epochs):

            model.train()

            train_loss = 0

            for batch in train_dataset:

                optimizer.zero_grad()

                input_ids = batch['input_ids'].to(device)

                attention_mask = batch['attention_mask'].to(device)

                labels = batch['input_ids'].to(device)

                outputs = model(input_ids, attention_mask=attention_mask, labels=labels)

                loss = outputs.loss

                loss.backward()

                optimizer.step()

                train_loss += loss.item()

            eval_loss = 0

            model.eval()

            for batch in eval_dataset:

                input_ids = batch['input_ids'].to(device)

                attention_mask = batch['attention_mask'].to(device)

                labels = batch['input_ids'].to(device)

                with torch.no_grad():

                    outputs = model(input_ids, attention_mask=attention_mask, labels=labels)

                    eval_loss += outputs.loss.item()

            eval_loss /= len(eval_dataset)

            logging.info(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss/len(train_dataset)}, Eval Loss: {eval_loss}")

        # Save the fine-tuned weights and the tokenizer side by side

        os.makedirs(output_dir, exist_ok=True)

        model.save_pretrained(output_dir)

        tokenizer.save_pretrained(output_dir)

class QAApp:

    def __init__(self, root, qa_system):

        # ... (existing QAApp code) ...

    def monitor_performance(self):

        # Add code to monitor the performance of the QA system

        # This could include metrics like accuracy, F1 score, etc.

        # and display them in the GUI or log them to a file

        pass


if __name__ == "__main__":

    root = tk.Tk()

    qa_system = QASystem("data")

    app = QAApp(root, qa_system)

    # Finetune the model

    train_dataset, eval_dataset = load_datasets()

    qa_system.finetune_model(train_dataset, eval_dataset, "output")

    # Monitor the performance

    app.monitor_performance()



Execute the script:

          c:\yaghiGPT >  python
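The script calls load_datasets() but never defines it, so you will have to supply it yourself. One ingredient it needs is a split of your extracted texts into a training part and a held-out evaluation part. The sketch below shows only that split, using the standard library; the function name, the 10% evaluation share, and the fixed seed are assumptions. Tokenizing and batching the texts into the input_ids / attention_mask batches that finetune_model expects is left out.

```python
import random


def split_texts(texts, eval_fraction=0.1, seed=42):
    """Shuffle `texts` reproducibly and split them into (train, eval) lists."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one item for evaluation when there is anything to split.
    n_eval = max(1, int(len(shuffled) * eval_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_eval:], shuffled[:n_eval]


train_texts, eval_texts = split_texts([f"document {i}" for i in range(20)])
print(len(train_texts), "training texts,", len(eval_texts), "evaluation texts")
```

Fixing the seed keeps the split identical between runs, so evaluation losses from different training runs stay comparable.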

Load & Run:

Now we’re ready to start interacting with the model. Paste the following script into a Python file:

# Add a follow_up method in the QAApp class to handle follow-up questions.

# Store the context of the previous answer so that it can be used to generate more detailed responses for follow-up questions.

# Add a "Follow-Up" button in the GUI and connect it to the follow_up method.

import os

import numpy as np

import torch

import re

import pickle

from PyPDF2 import PdfReader

from docx import Document

from pptx import Presentation

from sentence_transformers import SentenceTransformer

from transformers import AutoTokenizer, pipeline

from faiss import IndexFlatL2

import tkinter as tk

from tkinter import ttk, filedialog, scrolledtext

import speech_recognition as sr

import pyttsx3

class QASystem:

    def __init__(self, directory, index_file='index.pkl', embeddings_file='embeddings.pkl'):

        self.directory = directory

        self.index_file = index_file

        self.embeddings_file = embeddings_file

        if os.path.exists(self.index_file) and os.path.exists(self.embeddings_file):

            self.index, self.embedding_model, self.document_embeddings = self.load_index_and_embeddings()

        else:

            self.documents = self.load_documents()

            self.index, self.embedding_model, self.document_embeddings = self.create_index()

            self.save_index_and_embeddings()


        self.qa_pipeline = pipeline("question-answering", model="EleutherAI/gpt-neo-2.7B", tokenizer="EleutherAI/gpt-neo-2.7B")

        self.summarizer_pipeline = pipeline("summarization", model="EleutherAI/gpt-neo-2.7B", tokenizer="EleutherAI/gpt-neo-2.7B")

    def save_index_and_embeddings(self):

        with open(self.index_file, 'wb') as f:

            pickle.dump((self.index, self.embedding_model, self.document_embeddings), f)

        with open(self.embeddings_file, 'wb') as f:

            pickle.dump(self.document_embeddings, f)

    def load_index_and_embeddings(self):

        try:

            with open(self.index_file, 'rb') as f:

                index, embedding_model, document_embeddings = pickle.load(f)

            with open(self.embeddings_file, 'rb') as f:

                document_embeddings = pickle.load(f)

            return index, embedding_model, document_embeddings

        except AttributeError as e:

            print(f"Error loading index and embeddings: {e}")

            print("Recreating index and embeddings...")

            self.documents = self.load_documents()

            self.index, self.embedding_model, self.document_embeddings = self.create_index()


            return self.index, self.embedding_model, self.document_embeddings

    def load_documents(self):

        documents = []

        files = os.listdir(self.directory)

        for filename in files:

            if filename.endswith('.pdf'):

                filepath = os.path.join(self.directory, filename)

                try:

                    with open(filepath, 'rb') as file:

                        pdf = PdfReader(file)

                        text = ''.join(page.extract_text() for page in pdf.pages)

                        documents.append((filename, text))

                except Exception as e:

                    print(f"Error reading {filename}: {str(e)}")

            elif filename.endswith('.docx'):

                filepath = os.path.join(self.directory, filename)

                try:

                    doc = Document(filepath)

                    text = ' '.join(para.text for para in doc.paragraphs)

                    documents.append((filename, text))

                except Exception as e:

                    print(f"Error reading {filename}: {str(e)}")

            elif filename.endswith('.pptx'):

                filepath = os.path.join(self.directory, filename)

                try:

                    ppt = Presentation(filepath)

                    text = ' '.join(shape.text for slide in ppt.slides for shape in slide.shapes if hasattr(shape, 'text'))

                    documents.append((filename, text))

                except Exception as e:

                    print(f"Error reading {filename}: {str(e)}")

        return documents

    def create_index(self):

        embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

        document_embeddings = []

        for doc_id, text in self.documents:

            chunks = self.chunk_text(text)

            for chunk in chunks:

                embedding = embedding_model.encode(chunk)

                document_embeddings.append((embedding, doc_id, chunk))

        embeddings = np.array([embed[0] for embed in document_embeddings])

        index = IndexFlatL2(embeddings.shape[1])

        index.add(embeddings)


        return index, embedding_model, document_embeddings

    def chunk_text(self, text, max_length=512):

        words = text.split()

        return [" ".join(words[i:i + max_length]) for i in range(0, len(words), max_length)]

    def extract_relevant_text(self, context, query, max_length=1000):

        words = context.split()

        if (len(words) <= max_length):

            return context

        query_words = set(query.lower().split())

        scores = []

        for i in range(len(words) - max_length + 1):

            chunk = ' '.join(words[i:i + max_length])

            score = sum(1 for word in query_words if word in chunk.lower())

            scores.append((score, i))

        best_start = max(scores, key=lambda x: x[0])[1]

        return ' '.join(words[best_start:best_start + max_length])

    def answer_question(self, question, contexts, max_length=2048, max_new_tokens=50):

        combined_context = " ".join(contexts)

        # Truncate the combined context to the maximum allowed length

        combined_context = combined_context[:max_length]

        result = self.qa_pipeline(question=question, context=combined_context, max_new_tokens=max_new_tokens)

        answer = result['answer']

        return answer

    def summarize_text(self, text, max_length=2048, max_new_tokens=50):

        text = text[:max_length]  # Ensure the text does not exceed the maximum length

        summary = self.summarizer_pipeline(text, max_new_tokens=max_new_tokens)

        return summary[0]['summary_text']

    def post_process_answer(self, answer):

        answer = answer.strip('.,;: \t\n').capitalize()

        if not answer.endswith(('.', '!', '?')):

            answer += '.'

        return answer

    def clean_answer(self, answer):

        # Add your cleaning logic here

        # For now, let's just return the answer as is

        return answer

    def get_answer(self, question, num_sources=3):

        question_embedding = self.embedding_model.encode(question)

        distances, indices = self.index.search(np.array([question_embedding]), num_sources)

        best_chunks = [self.document_embeddings[idx][2] for idx in indices[0]]

        relevant_docs = [self.document_embeddings[idx][1] for idx in indices[0]]

        relevant_texts = [self.extract_relevant_text(chunk, question) for chunk in best_chunks]

        answer = self.answer_question(question, relevant_texts)

        answer = self.clean_answer(answer)

        answer = self.post_process_answer(answer)

        if len(answer.split()) < 20:

            combined_text = " ".join(relevant_texts)

            summary = self.summarize_text(combined_text)

            if len(summary.split()) > len(answer.split()):

                answer = summary

        return answer, relevant_docs, relevant_texts  # Return relevant_texts for follow-up

class QAApp:

    def __init__(self, root, qa_system):

        self.qa_system = qa_system

        self.history = []  # Initialize history

        self.previous_context = []  # Store the context of the previous answer

        self.root = root

        self.root.title("Yaghi's QA System")

        self.root.geometry("800x600")

        self.style = ttk.Style()

        self.style.theme_use('clam')

        self.frame = ttk.Frame(root, padding="10 10 10 10")

        self.frame.grid(row=0, column=0, sticky=(tk.W, tk.E, tk.N, tk.S))

        self.input_label = ttk.Label(self.frame, text="Welcome to Yaghi's GPT. Please enter your question:", background="", foreground="#00695c", font=("Helvetica", 12, "bold"))

        self.input_label.grid(row=0, column=0, sticky=tk.W, pady=5)

        self.input_text = ttk.Entry(self.frame, width=70, font=("Helvetica", 10))

        self.input_text.grid(row=1, column=0, pady=5, padx=5)

        self.submit_button = ttk.Button(self.frame, text="Submit", command=self.get_response, style="TButton")

        self.submit_button.grid(row=1, column=1, padx=5)

        self.output_label = ttk.Label(self.frame, text="Response:", background="", foreground="#00695c", font=("Helvetica", 12, "bold"))

        self.output_label.grid(row=2, column=0, sticky=tk.W, pady=5)

        self.output_text = scrolledtext.ScrolledText(self.frame, wrap=tk.WORD, width=70, height=20, font=("Helvetica", 10), background="#ffffff")

        self.output_text.grid(row=3, column=0, pady=5, columnspan=2)

        # Follow-Up button

        self.follow_up_button = ttk.Button(self.frame, text="Follow-Up", command=self.follow_up, style="TButton")

        self.follow_up_button.grid(row=2, column=1, pady=5)

        for child in self.frame.winfo_children():

            child.grid_configure(padx=5, pady=5)

        self.root.columnconfigure(0, weight=1)

        self.root.rowconfigure(0, weight=1)

        # Add styles

        self.style.configure("TButton", foreground="#ffffff", background="#00796b", font=("Helvetica", 10, "bold"))

        self.style.map("TButton", background=[("active", "#004d40")])

        # Export button

        self.export_button = ttk.Button(self.frame, text="Export History", command=self.export_history, style="TButton")

        self.export_button.grid(row=4, column=0, pady=5)

        # Voice input button

        self.voice_button = ttk.Button(self.frame, text="Voice Input", command=self.voice_input, style="TButton")

        self.voice_button.grid(row=4, column=1, pady=5)

    def get_response(self):

        question = self.input_text.get()

        answer, sources, contexts = self.qa_system.get_answer(question)  # Get contexts for follow-up

# Store the contexts for follow-up questions

        self.previous_context = contexts

        # Clear the current text

        self.output_text.delete(1.0, tk.END)

        # Insert the question in a different color

        self.output_text.insert(tk.END, "Question: ", "question")

        self.output_text.insert(tk.END, question + "\n", "question_text")

       # Insert the answer in a different color

        self.output_text.insert(tk.END, "Answer: ", "answer")

        self.output_text.insert(tk.END, answer + "\n", "answer_text")

       # Insert the sources

        self.output_text.insert(tk.END, "Sources: " + ", ".join(sources) + "\n\n")

        # Define tags for custom colors

        self.output_text.tag_config("question", foreground="#1e88e5", font=("Helvetica", 10, "bold"))

        self.output_text.tag_config("question_text", foreground="#0d47a1")

        self.output_text.tag_config("answer", foreground="#43a047", font=("Helvetica", 10, "bold"))

        self.output_text.tag_config("answer_text", foreground="#1b5e20")

        # Store the interaction in history

        self.history.append({

            "question": question,

            "answer": answer,

            "sources": sources

        })


    def follow_up(self):

        follow_up_question = self.input_text.get()

        if not self.previous_context:

            self.output_text.insert(tk.END, "No previous context available for follow-up.\n", "error")

            return


        # Debug: Log follow-up question and previous context

        print(f"Follow-up question: {follow_up_question}")

        print(f"Previous context: {self.previous_context}")

        # Use the previous context to generate more relevant context for the follow-up question

        combined_context = " ".join(self.previous_context)

        relevant_text = self.qa_system.extract_relevant_text(combined_context, follow_up_question)

        print(f"Relevant text for follow-up: {relevant_text}")

        # Get the answer using the relevant context

        answer = self.qa_system.answer_question(follow_up_question, [relevant_text])

        print(f"Answer for follow-up: {answer}")

        # Clear the current text

        self.output_text.delete(1.0, tk.END)

        # Insert the follow-up question in a different color

        self.output_text.insert(tk.END, "Follow-Up Question: ", "question")

        self.output_text.insert(tk.END, follow_up_question + "\n", "question_text")

       # Insert the answer in a different color

        self.output_text.insert(tk.END, "Answer: ", "answer")

        self.output_text.insert(tk.END, answer + "\n", "answer_text")

       # Insert the sources

        self.output_text.insert(tk.END, "Sources: " + ", ".join(self.qa_system.get_answer(follow_up_question)[1]) + "\n\n")

        # Define tags for custom colors

        self.output_text.tag_config("question", foreground="#1e88e5", font=("Helvetica", 10, "bold"))

        self.output_text.tag_config("question_text", foreground="#0d47a1")

        self.output_text.tag_config("answer", foreground="#43a047", font=("Helvetica", 10, "bold"))

        self.output_text.tag_config("answer_text", foreground="#1b5e20")

        # Store the interaction in history

        self.history.append({

            "question": follow_up_question,

            "answer": answer,

            "sources": self.qa_system.get_answer(follow_up_question)[1]

        })


    def export_history(self):

        file_path = filedialog.asksaveasfilename(defaultextension=".txt", filetypes=[("Text files", "*.txt")])

        if file_path:

            with open(file_path, 'w') as file:

                for entry in self.history:

                    file.write(f"Question: {entry['question']}\n")

                    file.write(f"Answer: {entry['answer']}\n")

                    file.write(f"Sources: {', '.join(entry['sources'])}\n\n")

    def voice_input(self):

        recognizer = sr.Recognizer()

        with sr.Microphone() as source:

            self.output_text.insert(tk.END, "Listening...\n")

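            audio = recognizer.listen(source)

            # recognize_google sends the recording to Google's Web Speech API, so this step needs internet access

            try: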

                question = recognizer.recognize_google(audio)

                self.input_text.delete(0, tk.END)

                self.input_text.insert(0, question)


            except sr.UnknownValueError:

                self.output_text.insert(tk.END, "Could not understand audio\n")

            except sr.RequestError as e:

                self.output_text.insert(tk.END, f"Could not request results; {e}\n")

if __name__ == "__main__":

    root = tk.Tk()

    qa_system = QASystem(directory="d:/gptDataSet")

    app = QAApp(root, qa_system)

    # Start the Tk event loop so the window stays open

    root.mainloop()


Execute the code:

            c:\yaghiGPT >  python
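The retrieval step in get_answer works by embedding the question and asking the FAISS index for the closest chunk embeddings; IndexFlatL2 does this by exact, brute-force comparison using squared Euclidean (L2) distance. To make that concrete, here is a pure-Python sketch of the same idea on toy 2-dimensional vectors. It is an illustration only: the function name and the toy data are made up, and real embeddings from all-MiniLM-L6-v2 have 384 dimensions.

```python
def nearest_chunks(query, embeddings, k=3):
    """Return indices of the k embeddings closest to `query` (squared L2)."""
    def squared_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ranked = sorted(range(len(embeddings)),
                    key=lambda i: squared_l2(query, embeddings[i]))
    return ranked[:k]


# Toy "chunk embeddings": each chunk is represented by a 2-D point.
chunks = ["pricing policy", "vacation rules", "pricing table", "office map"]
embeddings = [(1.0, 0.9), (5.0, 5.0), (1.2, 1.0), (9.0, 0.0)]

query = (1.0, 1.0)  # stands in for the embedded question
for i in nearest_chunks(query, embeddings, k=2):
    print(chunks[i])
```

The two pricing chunks sit closest to the query point, so they are the ones that would be handed to the model as context.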



Hopefully the model now runs without any missing libraries. It should open a window for interacting with the model. Ask the model questions and it will answer from the dataset files. Keep tuning the parameters to improve the training and retrieval processes.

Also, I made good use of my subscription. It gave me access to Claude-3.5-Sonnet and GPT-4o (in addition to many other models, of course). I used these two amazing models to get help with error messages and to double-check my scripts. I can’t say which one was better, because their responses and performance were inconsistent, so I switched between them. Further, to avoid getting undesired replies, I would start each prompt with a background phrase: “I am installing a local private GPT on my Windows PC. Please …….”. These models can rewrite or fix a script, explain what a script is doing, and guide you as well.


