AI Chat Bot with Plugins (RAG VectorDB - ChromaDB, DuckDuckGo Search, Home Assistant, Vehicle Lookup)

Usage

Running Server

# Locally
minyma server run

# Docker Quick Start
make docker_build_local
docker run \
    -p 5000:5000 \
    -e OPENAI_API_KEY=`cat openai_key` \
    -e DATA_PATH=/data \
    -v ./data:/data \
    minyma:latest

The server will now be accessible at http://localhost:5000.
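Once the server is up, you can talk to it programmatically. A minimal client sketch using only the standard library is below; note that the `/api/chat` route and the `message` payload field are assumptions for illustration — check the minyma source for the actual endpoint and request shape.

```python
import json
import urllib.request


def build_chat_request(base_url: str, message: str) -> urllib.request.Request:
    """Build a POST request for a hypothetical minyma chat endpoint."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",  # assumed route -- verify against the server code
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("http://localhost:5000", "What is RAG?")
    # Requires the server from the previous step to be running.
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```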

Normalizing & Loading Data

Minyma is designed to be extensible. You can add normalizers and vector DBs by implementing the interfaces defined in ./minyma/normalizer.py and ./minyma/vdb.py. At the moment, the only supported database is chroma and the only supported normalizer is the pubmed normalizer.
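To give a feel for what a normalizer does, here is a standalone sketch in the spirit of the pubmed normalizer: it walks a JSONL file and yields one cleaned record per line. The class name, method name, and the `text`/`id` fields are assumptions — the real interface lives in ./minyma/normalizer.py.

```python
import json
from typing import Iterator


class JSONLNormalizer:
    """Illustrative normalizer: one JSONL line in, one normalized record out."""

    def __init__(self, filename: str):
        self.filename = filename

    def walk(self) -> Iterator[dict]:
        """Yield records with a stable string ID and the raw text field."""
        with open(self.filename) as f:
            for i, line in enumerate(f):
                record = json.loads(line)
                yield {"id": str(i), "text": record.get("text", "")}
```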

To normalize data, you can use Minyma's normalize CLI command:

minyma normalize --filename ./pubmed_manuscripts.jsonl --normalizer pubmed --database chroma --datapath ./chroma

The above example does the following:

  • Uses the pubmed normalizer
  • Normalizes the ./pubmed_manuscripts.jsonl raw dataset [0]
  • Loads the output into a chroma database and persists the data to the ./chroma directory
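Before documents land in a vector DB, long texts are typically split into overlapping chunks so each embedding covers a bounded span. Whether minyma chunks internally is not shown here; the helper below just illustrates the general technique with fixed-size, overlapping windows.

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # slide the window, keeping some shared context
    return chunks
```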

NOTE: The above dataset took about an hour to normalize on an M2 Max MacBook Pro.

[0] https://huggingface.co/datasets/TaylorAI/pubmed_author_manuscripts/tree/main

Development

# Initiate
python3 -m venv venv
. ./venv/bin/activate

# Local Development
pip install -e .

# Creds
export OPENAI_API_KEY=`cat openai_key`

Datasets

https://huggingface.co/datasets/TaylorAI/pubmed_author_manuscripts/tree/main

Notes

TODO: