Compare commits

...

10 Commits

Author SHA1 Message Date
ebfea97af7 [add] youtube plugin, [improve] initial prompt (JSON)
All checks were successful
continuous-integration/drone/push Build is passing
2023-11-10 09:19:24 -05:00
ca8c306534 [add] better error handling
2023-11-08 20:52:29 -05:00
3168bfffd1 Merge pull request 'Add Plugins' (#1) from function_plugins into master
Reviewed-on: #1
2023-11-09 00:31:51 +00:00
7f0d74458d [add] migrate chromadb to plugin
2023-11-08 18:35:56 -05:00
cf8e023b82 [add] home assistant plugin
2023-11-07 20:34:18 -05:00
b82e086cbb [add] basic plugin support
2023-11-05 21:01:43 -05:00
5afd2bb498 [add] Drone CI/CD
2023-10-24 08:24:22 -04:00
5efffd5e96 [fix] overflow issue 2023-10-19 19:29:13 -04:00
40daf46c03 [fix] lower on query, [add] metadata response, [add] context distance & reference links 2023-10-19 18:56:48 -04:00
05c5546c10 [add] pub docker image, [add] improve mobile css 2023-10-15 23:13:33 -04:00
19 changed files with 818 additions and 200 deletions

17
.drone.yml Normal file

@ -0,0 +1,17 @@
kind: pipeline
type: kubernetes
name: default
steps:
# Publish Dev Docker Image
- name: publish_docker
image: plugins/docker
settings:
repo: gitea.va.reichard.io/evan/minyma
registry: gitea.va.reichard.io
tags:
- dev
username:
from_secret: docker_username
password:
from_secret: docker_password

3
.gitignore vendored

@ -2,6 +2,9 @@ __pycache__
.DS_Store
.direnv
data
datasets
venv
openai_key
ha_key
minyma.egg-info/
NOTES.md

126
README.md

@ -13,23 +13,98 @@
---
AI Chat Bot with Vector / Embedding DB Context
AI Chat Bot with Plugins (RAG VectorDB - ChromaDB, DuckDuckGo Search, Home Assistant, Vehicle Lookup, YouTube)
[![Build Status](https://drone.va.reichard.io/api/badges/evan/minyma/status.svg)](https://drone.va.reichard.io/evan/minyma)
## Plugins
### ChromaDB Embeddings / Vectors
This utilizes a local embeddings DB, allowing you to ask the assistant about
local information. It uses [Retrieval-Augmented Generation (RAG)](https://arxiv.org/abs/2005.11401).
```
User: What are some common symptoms of COVID-19?
Assistant: Some common symptoms of COVID-19 mentioned in the context are
fatigue, headache, dyspnea (shortness of breath), anosmia (loss of
sense of smell), lower respiratory symptoms, cardiac symptoms,
concentration or memory issues, tinnitus and earache, and peripheral
neuropathy symptoms.
```
**NOTE:** Instructions on how to load this with your own information are in the
"Normalizing & Loading Data" section. We include a PubMed data normalizer as an
example.
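Conceptually, the retrieval step ranks stored documents against the question and joins the top matches into the prompt context. A toy sketch of that flow (a bag-of-words ranker stands in for ChromaDB's embedding search; all names and documents here are illustrative):

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared words, normalized by vector magnitudes
    dot = sum(a[w] * b[w] for w in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def get_related(question: str, docs: list[str], n_results: int = 2) -> list[str]:
    """Return the n_results documents most similar to the question."""
    q = Counter(question.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return ranked[:n_results]

docs = [
    "Common COVID-19 symptoms include fatigue, headache, and anosmia.",
    "The 2016 Mazda CX-5 is a sport utility vehicle.",
]
top = get_related("What are some common symptoms of COVID-19?", docs)
context = "\n".join(top)  # joined matches become the context block of the prompt
```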
### YouTube
This utilizes `yt-dlp` to download a video's subtitles. Ask questions about YouTube videos!
```
User: Tell me about this youtube video: https://www.youtube.com/watch?v=ZWgr7qP6yhY
Assistant: The YouTube video you provided is a review of the new MacBook Pro by
Apple. The host discusses the laptop's features, including its new
color and chip. They mention that the laptop still retains its ports,
HDMI, and high-quality display, but also notes some shortcomings like
the notch and lack of face unlock. The host shares their impressions
of the new black color [...]
```
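Under the hood, the plugin flattens the downloaded TTML subtitle file into plain text with `xml.etree.ElementTree`. A minimal sketch on an inline TTML fragment (the fragment itself is made up):

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the TTML subtitle file that yt-dlp writes
TTML = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body><div>
    <p begin="00:00:01" end="00:00:03">The new MacBook Pro</p>
    <p begin="00:00:03" end="00:00:05">keeps its ports and HDMI.</p>
  </div></body>
</tt>"""

def ttml_to_plain_text(ttml: str) -> str:
    """Flatten every text node in the TTML document into one string."""
    root = ET.fromstring(ttml)
    return " ".join(el.text.strip() for el in root.iter() if el.text and el.text.strip())

transcript = ttml_to_plain_text(TTML)
```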
### DuckDuckGo
This utilizes DuckDuckGo Search by scraping the top 5 results.
```
User: Tell me about Evan Reichard
Assistant: Evan Reichard is a Principal Detection and Response Engineer based
in the Washington DC-Baltimore Area. He has been in this role since
August 2022. Evan has created a browser extension that helps SOC
analysts and saves them over 300 hours per month. Additionally,
there are three professionals named Evan Reichard on LinkedIn and
there are also profiles of people named Evan Reichard on Facebook.
```
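A rough sketch of the scraping step: the real plugin fetches `https://html.duckduckgo.com/html/?q=...` with `requests` and selects `.result__title > a` and `.result__snippet` via BeautifulSoup. Here a stdlib `HTMLParser` over an inline HTML stand-in illustrates the same extraction:

```python
from html.parser import HTMLParser

# Inline stand-in for the HTML returned by html.duckduckgo.com
HTML = """
<div class="result"><div>
  <h2 class="result__title"><a href="#">Evan Reichard - LinkedIn</a></h2>
  <a class="result__snippet">Principal Detection and Response Engineer.</a>
</div></div>
"""

class ResultParser(HTMLParser):
    """Collect result titles and snippets, mirroring the plugin's CSS selectors."""
    def __init__(self):
        super().__init__()
        self.titles, self.snippets = [], []
        self._target = None

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        if "result__title" in classes:
            self._target = self.titles
        elif "result__snippet" in classes:
            self._target = self.snippets

    def handle_data(self, data):
        # Capture the first non-empty text node after a matching start tag
        if self._target is not None and data.strip():
            self._target.append(data.strip())
            self._target = None

parser = ResultParser()
parser.feed(HTML)
results = list(zip(parser.titles, parser.snippets))
```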
### Vehicle Lookup API
This utilizes Carvana's undocumented API to look up details on a vehicle.
```
User: What vehicle is NY plate HELLO?
Assistant: The vehicle corresponding to NY plate HELLO is a 2016 MAZDA CX-5
Grand Touring Sport Utility 4D with VIN JM3KE4DY6G0672552.
```
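The plugin then distills the API's JSON into a small summary. A sketch using a response shape inferred from the plugin's parsing code (the sample values come from the example above):

```python
import json

# Response shape inferred from the plugin's parsing code; values from the example above
api_response = {
    "status": "Succeeded",
    "content": {
        "vin": "JM3KE4DY6G0672552",
        "vehicles": [{"year": 2016, "make": "MAZDA", "model": "CX-5", "trim": "Grand Touring"}],
    },
}

def summarize(resp: dict) -> dict:
    """Distill the lookup response into the plugin's content/error format."""
    if resp.get("status") != "Succeeded":
        return {"content": None, "error": "API Error: %s" % resp.get("status")}
    info = resp["content"]
    vehicle = info["vehicles"][0]
    return {"content": json.dumps({"vin": info["vin"], **vehicle}), "error": None}

result = summarize(api_response)
```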
### Home Assistant API
This utilizes Home Assistant's [Conversation API](https://developers.home-assistant.io/docs/intent_conversation_api/).
```
User: Turn off the living room lights
Assistant: The living room lights have been turned off. Is there anything else I can assist you with?
User: Turn on the living room lights
Assistant: The living room lights have been turned on successfully.
```
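The plugin POSTs the natural-language command to Home Assistant's `/api/conversation/process` endpoint with a bearer token. A sketch that builds (but does not send) that request using only the stdlib; the URL and key are placeholders:

```python
import json
import urllib.parse
import urllib.request

def build_conversation_request(base_url: str, api_key: str, command: str) -> urllib.request.Request:
    """Build (but do not send) the Conversation API call the plugin makes."""
    url = urllib.parse.urljoin(base_url, "/api/conversation/process")
    payload = json.dumps({"text": command, "language": "en"}).encode()
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": "Bearer %s" % api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_conversation_request(
    "https://some-url.com", "example-key", "Turn off the living room lights"
)
```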
## Running Server
```bash
# Locally (See "Development" Section)
export OPENAI_API_KEY=`cat openai_key`
export CHROMA_DATA_PATH=/data
export HOME_ASSISTANT_API_KEY=`cat ha_key`
export HOME_ASSISTANT_URL=https://some-url.com
minyma server run
# Docker Quick Start
make docker_build_local
docker run \
-p 5000:5000 \
-e OPENAI_API_KEY=`cat openai_key` \
-e DATA_PATH=/data \
-e CHROMA_DATA_PATH=/data \
-v ./data:/data \
minyma:latest
gitea.va.reichard.io/evan/minyma:latest
```
The server will now be accessible at `http://localhost:5000`
@ -44,7 +119,11 @@ and the only supported normalizer is the `pubmed` normalizer.
To normalize data, you can use Minyma's `normalize` CLI command:
```bash
minyma normalize --filename ./pubmed_manuscripts.jsonl --normalizer pubmed --database chroma --datapath ./chroma
minyma normalize \
--normalizer pubmed \
--database chroma \
--datapath ./data \
--filename ./datasets/pubmed_manuscripts.jsonl
```
The above example does the following:
@ -59,10 +138,12 @@ The above example does the following:
## Configuration
| Environment Variable | Default Value | Description |
| -------------------- | ------------- | ---------------------------------------------------------------------------------- |
| OPENAI_API_KEY | NONE | Required OpenAI API Key for ChatGPT access. |
| DATA_PATH | ./data | The path to the data directory. Chroma will store its data in the `chroma` subdir. |
| Environment Variable | Default Value | Description |
| ---------------------- | ------------- | ----------------------------------- |
| OPENAI_API_KEY | NONE | Required OpenAI API Key for ChatGPT |
| CHROMA_DATA_PATH | NONE | ChromaDB Persistent Data Directory |
| HOME_ASSISTANT_API_KEY | NONE | Home Assistant API Key |
| HOME_ASSISTANT_URL | NONE | Home Assistant Instance URL |
# Development
@ -74,28 +155,9 @@ python3 -m venv venv
# Local Development
pip install -e .
# Creds
# Creds & Other Environment Variables
export OPENAI_API_KEY=`cat openai_key`
# Docker
make docker_build_local
```
# Notes
This is the first time I'm doing anything LLM-related, so it was an adventure.
Initially I was entertaining OpenAI's Embedding API with plans to load embeddings
into Pinecone; however, initial calculations with `tiktoken` showed that generating
embeddings would cost roughly $250 USD.
Fortunately I found [Chroma](https://www.trychroma.com/), which solved both issues:
it let me load in the normalized data locally and automatically generated the
embeddings for me.
In order to fit into OpenAI ChatGPT's token limit, I limited each document to roughly
1000 words. I wanted to make sure I could add the top two matches as context while
still having enough headroom for the actual question from the user.
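The cap described above boils down to a simple word-based truncation (the repo applies the same `split()[:cap]` trick when loading documents in `vdb.py`):

```python
def cap_words(doc: str, cap: int = 1000) -> str:
    """Truncate a document to its first `cap` words before it enters the prompt."""
    return " ".join(doc.split()[:cap])

capped = cap_words("word " * 1500)  # a 1500-word document gets cut to 1000 words
```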
A few notes:
- Context is not carried over from previous messages
- I "stole" the prompt that is used in LangChain (See `oai.py`). I tried some variations without much (subjective) improvement.
- The normalizer format is generalized, which should make it fairly easy to use completely different data: just add a new normalizer that implements the super class.
- Basic web front end with TailwindCSS


@ -1,8 +1,9 @@
from os import path
import click
import signal
import sys
from importlib.metadata import version
from minyma.config import Config
from minyma.plugin import PluginLoader
from minyma.oai import OpenAIConnector
from minyma.vdb import ChromaDB
from flask import Flask
@ -15,14 +16,15 @@ def signal_handler(sig, frame):
def create_app():
global oai, cdb
global oai, plugins
from minyma.config import Config
import minyma.api.common as api_common
import minyma.api.v1 as api_v1
app = Flask(__name__)
cdb = ChromaDB(Config.DATA_PATH)
oai = OpenAIConnector(Config.OPENAI_API_KEY, cdb)
oai = OpenAIConnector(Config.OPENAI_API_KEY)
plugins = PluginLoader(Config)
app.register_blueprint(api_common.bp)
app.register_blueprint(api_v1.bp)
@ -67,7 +69,7 @@ def normalize(filename, normalizer, database, datapath):
return print("INVALID NORMALIZER:", normalizer)
# Process Data
vdb.load_documents(norm)
vdb.load_documents(norm.name, norm)
signal.signal(signal.SIGINT, signal_handler)


@ -17,22 +17,19 @@ def get_response():
if message == "":
return {"error": "Empty Message"}
oai_response = minyma.oai.query(message)
return oai_response
resp = minyma.oai.query(message)
# Return Data
return resp
"""
Return the raw vector db related response
TODO - Embeds and loads data into the local ChromaDB.
{
"input": "string",
"normalizer": "string",
}
"""
@bp.route("/related", methods=["POST"])
def get_related():
data = request.get_json()
if not data:
return {"error": "Missing Message"}
message = str(data.get("message"))
if message == "":
return {"error": "Empty Message"}
related_documents = minyma.cdb.get_related(message)
return related_documents
@bp.route("/embed", methods=["POST"])
def post_embeddings():
pass


@ -1,11 +1,12 @@
import os
def get_env(key, default=None, required=False) -> str:
def get_env(key, default=None, required=False) -> str | None:
"""Wrapper for gathering env vars."""
if required:
assert key in os.environ, "Missing Environment Variable: %s" % key
return str(os.environ.get(key, default))
env = os.environ.get(key, default)
return str(env) if env is not None else None
class Config:
@ -19,5 +20,7 @@ class Config:
OpenAI API Key - Required
"""
DATA_PATH: str = get_env("DATA_PATH", default="./data")
OPENAI_API_KEY: str = get_env("OPENAI_API_KEY", required=True)
CHROMA_DATA_PATH: str | None = get_env("CHROMA_DATA_PATH", required=False)
HOME_ASSISTANT_API_KEY: str | None = get_env("HOME_ASSISTANT_API_KEY", required=False)
HOME_ASSISTANT_URL: str | None = get_env("HOME_ASSISTANT_URL", required=False)
OPENAI_API_KEY: str | None = get_env("OPENAI_API_KEY", required=True)


@ -1,12 +1,16 @@
from io import TextIOWrapper
import json
class DataNormalizer:
class DataNormalizer():
def __init__(self, file: TextIOWrapper):
pass
self.file = file
def __len__(self) -> int:
return 0
def __iter__(self):
pass
yield None
class PubMedNormalizer(DataNormalizer):
"""
@ -14,7 +18,15 @@ class PubMedNormalizer(DataNormalizer):
normalized inside the iterator.
"""
def __init__(self, file: TextIOWrapper):
self.file = file
self.name = "pubmed"
self.file = file
self.length = 0
def __len__(self):
last_pos = self.file.tell()
self.length = sum(1 for _ in self.file)
self.file.seek(last_pos)
return self.length
def __iter__(self):
count = 0
@ -42,4 +54,10 @@ class PubMedNormalizer(DataNormalizer):
count += 1
# ID = Line Number
yield { "doc": norm_text, "id": str(count - 1) }
yield {
"id": str(count - 1),
"doc": norm_text,
"metadata": {
"file": l.get("file")
},
}


@ -1,44 +1,135 @@
from typing import Any
from dataclasses import dataclass
from textwrap import indent
from typing import Any, List
import json
import minyma
import openai
from minyma.vdb import VectorDB
INITIAL_PROMPT_TEMPLATE = """
You are connected to various functions that can be used to answer the user's questions. Your options are only "functions". Functions should be an array of strings containing the desired function calls (e.g. "function_name()").
# Stolen LangChain Prompt
PROMPT_TEMPLATE = """
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to
make up an answer.
Available Functions:
{context}
{functions}
Question: {question}
Helpful Answer:
You must respond in JSON only with no other fluff or bad things will happen. The JSON keys must ONLY be "functions". Be sure to call the functions with the right arguments.
User Message: {message}
"""
FOLLOW_UP_PROMPT_TEMPLATE = """
You are a helpful assistant. This is a follow up message to provide you with more context on a previous user request. Only respond to the user using the following information:
{response}
User Message: {message}
"""
@dataclass
class ChatCompletion:
id: str
object: str
created: int
model: str
choices: List[dict]
usage: dict
class OpenAIConnector:
def __init__(self, api_key: str, vdb: VectorDB):
self.vdb = vdb
def __init__(self, api_key: str):
self.model = "gpt-3.5-turbo"
self.word_cap = 1000
openai.api_key = api_key
def query(self, question: str) -> Any:
# Get related documents from vector db
related = self.vdb.get_related(question)
# Validate results
all_docs = related.get("docs", [])
if len(all_docs) == 0:
return { "error": "No Context Found" }
def query(self, message: str) -> Any:
# Track Usage
prompt_tokens = 0
completion_tokens = 0
total_tokens = 0
# Join on new line, generate main prompt
context = '\n'.join(all_docs)
prompt = PROMPT_TEMPLATE.format(context = context, question = question)
# Get Available Functions
functions = "\n".join(list(map(lambda x: "- %s" % x["def"], minyma.plugins.plugin_defs().values())))
# Query OpenAI ChatCompletion
response = openai.ChatCompletion.create(
# Create Initial Prompt
prompt = INITIAL_PROMPT_TEMPLATE.format(message = message, functions = indent(functions, ' ' * 2))
messages = [{"role": "user", "content": prompt}]
print("[OpenAIConnector] Running Initial OAI Query")
# Run Initial
response: ChatCompletion = openai.ChatCompletion.create( # type: ignore
model=self.model,
messages=[{"role": "user", "content": prompt}]
messages=messages
)
if len(response.choices) == 0:
print("[OpenAIConnector] No Results -> TODO", response)
content = response.choices[0]["message"]["content"]
all_funcs = json.loads(content).get("functions")
# Update Usage
prompt_tokens += response.usage.get("prompt_tokens", 0)
completion_tokens += response.usage.get("completion_tokens", 0)
total_tokens += response.usage.get("total_tokens", 0)
print("[OpenAIConnector] Completed Initial OAI Query:\n", indent(json.dumps({ "usage": response.usage, "function_calls": all_funcs }, indent=2), ' ' * 2))
# Build Response Text & Metadata
func_metadata = {}
func_response = []
for func in all_funcs:
# Execute Requested Function
resp = minyma.plugins.execute(func)
# Unknown Response
if resp is None:
print("[OpenAIConnector] Invalid Function Response: %s" % func)
continue
# Get Response
content = resp.get("content")
metadata = resp.get("metadata")
error = resp.get("error")
# Append Responses & Metadata
indented_val = indent(content or error or "Unknown Error", ' ' * 2)
func_response.append("- %s\n%s" % (func, indented_val))
func_metadata[func] = { "metadata": metadata, "error": error }
func_response = "\n".join(func_response)
# Create Follow Up Prompt
prompt = FOLLOW_UP_PROMPT_TEMPLATE.format(message = message, response = func_response)
messages = [{"role": "user", "content": prompt}]
print("[OpenAIConnector] Running Follow Up OAI Query")
# Run Follow Up
response: ChatCompletion = openai.ChatCompletion.create( # type: ignore
model=self.model,
messages=messages
)
# Update Usage
prompt_tokens += response.usage.get("prompt_tokens", 0)
completion_tokens += response.usage.get("completion_tokens", 0)
total_tokens += response.usage.get("total_tokens", 0)
print("[OpenAIConnector] Completed Follow Up OAI Query:\n", indent(json.dumps({ "usage": response.usage }, indent=2), ' ' * 2))
# Get Content
content = response.choices[0]["message"]["content"]
# Return Response
return response
return {
"response": content,
"functions": func_metadata,
"usage": {
"prompt_tokens": prompt_tokens,
"completion_tokens": completion_tokens,
"total_tokens": total_tokens
}
}
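The initial prompt asks the model for JSON with a single "functions" key; the loader then pulls each function name out with the same `([a-z_]+)\(` pattern used in `plugin.py`. A minimal sketch of that parsing step (the reply string is a made-up example):

```python
import json
import re

# Example of the JSON-only reply the initial prompt asks the model to produce
reply = '{"functions": ["search_duck_duck_go(\'Evan Reichard\')"]}'

calls = json.loads(reply).get("functions", [])
pattern = r"([a-z_]+)\("  # same pattern the plugin loader uses to extract the name

names = [m.group(1) for m in (re.search(pattern, c) for c in calls) if m]
```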

98
minyma/plugin.py Normal file

@ -0,0 +1,98 @@
import re
import inspect
import os
import importlib.util
class MinymaPlugin:
pass
class PluginLoader:
def __init__(self, config):
self.config = config
self.plugins = self.get_plugins()
self.definitions = self.plugin_defs()
def execute(self, func_cmd):
print("[PluginLoader] Execute Function:", func_cmd)
pattern = r'([a-z_]+)\('
func_name_search = re.search(pattern, func_cmd)
if not func_name_search:
return
func_name = func_name_search.group(1)
# Not Safe
if func_name in self.definitions:
args = re.sub(pattern, '(', func_cmd)
func = self.definitions[func_name]["func"]
return eval("func%s" % args)
def plugin_defs(self):
defs = {}
for plugin in self.plugins:
plugin_name = plugin.name
for func_obj in plugin.functions:
func_name = func_obj.__name__
signature = inspect.signature(func_obj)
params = list(
map(
lambda x: "%s: %s" % (x.name, x.annotation.__name__),
signature.parameters.values()
)
)
if func_name in defs:
print("[PluginLoader] Error: Duplicate Function: (%s) %s" % (plugin_name, func_name))
continue
func_def = "%s(%s)" % (func_name, ", ".join(params))
defs[func_name] = { "func": func_obj, "def": func_def }
return defs
def get_plugins(self):
"""Dynamically load plugins"""
# Derive Plugin Folder
loader_dir = os.path.dirname(os.path.abspath(__file__))
plugin_folder = os.path.join(loader_dir, "plugins")
# Find Minyma Plugins
plugin_classes = []
for filename in os.listdir(plugin_folder):
# Exclude Files
if not filename.endswith(".py") or filename == "__init__.py":
continue
# Derive Module Path
module_name = os.path.splitext(filename)[0]
module_path = os.path.join(plugin_folder, filename)
# Load Module Dynamically
spec = importlib.util.spec_from_file_location(module_name, module_path)
if spec is None or spec.loader is None:
raise ImportError("Unable to dynamically load plugin - %s" % filename)
# Load & Exec Module
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Only Process MinymaPlugin SubClasses
for _, member in inspect.getmembers(module):
if inspect.isclass(member) and issubclass(member, MinymaPlugin) and member != MinymaPlugin:
plugin_classes.append(member)
# Instantiate Plugins
plugins = []
for cls in plugin_classes:
instance = cls(self.config)
print("[PluginLoader] %s - Loaded: %d Feature(s)" % (cls.__name__, len(instance.functions)))
plugins.append(instance)
return plugins

3
minyma/plugins/README.md Normal file

@ -0,0 +1,3 @@
# Plugins
These are plugins that provide OpenAI with functions. Each plugin can define multiple functions. The plugin loader will automatically derive the function definition. Each function will have the plugin name prepended.
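Deriving a function definition boils down to `inspect.signature`, as in the loader. A sketch (the example function is hypothetical, mirroring the vehicle plugin's signature):

```python
import inspect

# Hypothetical plugin function mirroring the vehicle plugin's signature
def lookup_vehicle_by_state_plate(state_abbreviation: str, licence_plate: str):
    pass

def derive_def(func) -> str:
    """Build the "name(arg: type, ...)" string that gets handed to the model."""
    signature = inspect.signature(func)
    params = [
        "%s: %s" % (p.name, p.annotation.__name__)
        for p in signature.parameters.values()
    ]
    return "%s(%s)" % (func.__name__, ", ".join(params))

definition = derive_def(lookup_vehicle_by_state_plate)
```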


@ -0,0 +1,53 @@
from textwrap import indent
from minyma.plugin import MinymaPlugin
from minyma.vdb import ChromaDB
class ChromaDBPlugin(MinymaPlugin):
"""Perform Local VectorDB Lookup
ChromaDB can access multiple "collections". You can add additional functions
here that just access a different collection (i.e. different data)
"""
def __init__(self, config):
self.name = "chroma_db"
self.config = config
self.word_cap = 1000
if config.CHROMA_DATA_PATH is None:
self.functions = []
else:
self.vdb = ChromaDB(config.CHROMA_DATA_PATH)
self.functions = [self.lookup_pubmed_data]
def __lookup_data(self, collection_name: str, query: str):
# Get Related
related = self.vdb.get_related(collection_name, query)
# Get Metadata
metadata = [{
"id": related.get("ids")[i],
"distance": related.get("distances")[i],
"metadata": related.get("metadatas")[i],
} for i, _ in enumerate(related.get("docs", []))]
# Normalize Data
return list(
map(
lambda x: " ".join(x.split()[:self.word_cap]),
related.get("docs", [])
)
), metadata
def lookup_pubmed_data(self, query: str):
COLLECTION_NAME = "pubmed"
documents, metadata = self.__lookup_data(COLLECTION_NAME, query)
context = '\n'.join(documents)
return {
"content": context,
"metadata": metadata,
"error": None
}


@ -0,0 +1,49 @@
import json
import requests
from bs4 import BeautifulSoup
from minyma.plugin import MinymaPlugin
HEADERS = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:105.0)"
" Gecko/20100101 Firefox/105.0",
}
class DuckDuckGoPlugin(MinymaPlugin):
"""Search DuckDuckGo"""
def __init__(self, config):
self.config = config
self.name = "duck_duck_go"
self.functions = [self.search_duck_duck_go]
def search_duck_duck_go(self, query: str):
"""Search DuckDuckGo"""
resp = requests.get("https://html.duckduckgo.com/html/?q=%s" % query, headers=HEADERS)
soup = BeautifulSoup(resp.text, features="html.parser")
# Get Results
results = []
for item in soup.select(".result > div"):
title_el = item.select_one(".result__title > a")
title = title_el.text.strip() if title_el and title_el.text is not None else ""
description_el = item.select_one(".result__snippet")
description = description_el.text.strip() if description_el and description_el.text is not None else ""
results.append({"title": title, "description": description})
# Derive Metadata (Title)
metadata = {
"titles": list(
map(
lambda x: x.get("title"),
results[:5]
)
)
}
return {
"content": json.dumps(results[:5]),
"metadata": metadata,
"error": None
}


@ -0,0 +1,47 @@
import json
import urllib.parse
import requests
from minyma.plugin import MinymaPlugin
class HomeAssistantPlugin(MinymaPlugin):
"""Perform Home Assistant Command"""
def __init__(self, config):
self.config = config
self.name = "home_assistant"
self.functions = []
if config.HOME_ASSISTANT_API_KEY and config.HOME_ASSISTANT_URL:
self.functions = [self.home_automation_command]
if not config.HOME_ASSISTANT_API_KEY:
print("[HomeAssistantPlugin] Missing HOME_ASSISTANT_API_KEY")
if not config.HOME_ASSISTANT_URL:
print("[HomeAssistantPlugin] Missing HOME_ASSISTANT_URL")
def home_automation_command(self, natural_language_command: str):
url = urllib.parse.urljoin(self.config.HOME_ASSISTANT_URL, "/api/conversation/process")
headers = {
"Authorization": "Bearer %s" % self.config.HOME_ASSISTANT_API_KEY,
"Content-Type": "application/json",
}
data = {"text": natural_language_command, "language": "en"}
resp = requests.post(url, json=data, headers=headers)
# Parse JSON
try:
r = resp.json()
text = r["response"]["speech"]["plain"]["speech"]
return {
"content": text,
"metadata": r,
"error": None
}
except requests.JSONDecodeError:
return {
"content": None,
"metadata": None,
"error": "Command Failed"
}


@ -0,0 +1,98 @@
import json
import requests
from bs4 import BeautifulSoup
from minyma.plugin import MinymaPlugin
HEADERS = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:105.0)"
" Gecko/20100101 Firefox/105.0",
}
class VehicleLookupPlugin(MinymaPlugin):
"""Search Vehicle Information"""
def __init__(self, config):
self.config = config
self.name = "vehicle_state_plate"
self.functions = [self.lookup_vehicle_by_state_plate]
def __query_api(self, url, json=None, headers=None):
# Perform Request
if json is not None:
resp = requests.post(url, json=json, headers=headers)
else:
resp = requests.get(url, headers=headers)
# Parse Text
text = resp.text.strip()
# Parse JSON
try:
json = resp.json()
return json, text, None
except requests.JSONDecodeError:
error = None
if resp.status_code != 200:
error = "Invalid HTTP Response: %s" % resp.status_code
else:
error = "Invalid JSON"
return None, text, error
def lookup_vehicle_by_state_plate(self, state_abbreviation: str, licence_plate: str):
CARVANA_URL = (
"https://apim.carvana.io/trades/api/v5/vehicleconfiguration/plateLookup/%s/%s"
% (state_abbreviation, licence_plate)
)
# Query API
json_resp, text_resp, error = self.__query_api(CARVANA_URL, headers=HEADERS)
# Invalid JSON
if json_resp is None:
return{
"content": None,
"metadata": text_resp,
"error": error,
}
try:
# Check Result
status_resp = json_resp.get("status", "Unknown")
if status_resp != "Succeeded":
if status_resp == "MissingResource":
error = "No Results"
else:
error = "API Error: %s" % status_resp
return {
"content": None,
"metadata": json_resp,
"error": error,
}
# Parse Result
vehicle_info = json_resp.get("content")
vin = vehicle_info.get("vin")
year = vehicle_info.get("vehicles")[0].get("year")
make = vehicle_info.get("vehicles")[0].get("make")
model = vehicle_info.get("vehicles")[0].get("model")
trim = vehicle_info.get("vehicles")[0].get("trim")
except Exception as e:
return {
"content": None,
"metadata": text_resp,
"error": "Unknown Error: %s" % e,
}
return {
"content": json.dumps({
"vin": vin,
"year": year,
"make": make,
"model": model,
"trim": trim,
}),
"metadata": json_resp,
"error": None
}

53
minyma/plugins/youtube.py Normal file

@ -0,0 +1,53 @@
import os
from yt_dlp import YoutubeDL
import xml.etree.ElementTree as ET
from minyma.plugin import MinymaPlugin
class YouTubePlugin(MinymaPlugin):
"""Transcribe YouTube Video"""
def __init__(self, config):
self.config = config
self.name = "youtube"
self.functions = [self.transcribe_youtube]
def transcribe_youtube(self, youtube_video_id: str):
URLS = [youtube_video_id]
vid = YoutubeDL({
"skip_download": True,
"writesubtitles": True,
"writeautomaticsub": True,
"subtitleslangs": ["en"],
"subtitlesformat": "ttml",
"outtmpl": "transcript"
})
vid.download(URLS)
content = self.convert_ttml_to_plain_text("transcript.en.ttml")
os.remove("transcript.en.ttml")
return {
"content": content,
"metadata": URLS,
"error": "TTML Conversion Error" if content is None else None
}
def convert_ttml_to_plain_text(self, ttml_file_path):
try:
# Parse the TTML file
tree = ET.parse(ttml_file_path)
root = tree.getroot()
# Process Text
plain_text = ""
for elem in root.iter():
if elem.text:
plain_text += elem.text + " "
return plain_text.strip()
except ET.ParseError as e:
print("[YouTubePlugin] TTML Conversion Error:", e)
return None


@ -2,10 +2,14 @@
<html lang="en">
<head>
<meta charset="utf-8" />
<meta
name="viewport"
content="width=device-width, initial-scale=0.9, user-scalable=no, viewport-fit=cover"
/>
<title>Minyma - Chat</title>
<script src="https://cdn.tailwindcss.com"></script>
</head>
<body class="bg-slate-900 h-screen p-5 flex flex-col justify-between">
<body class="bg-slate-900 h-[100dvh] p-5 flex flex-col justify-between">
<header class="w-full">
<svg
preserveAspectRatio="xMidYMid meet"
@ -55,7 +59,7 @@
</svg>
</header>
<main
class="flex flex-col justify-between w-11/12 mx-auto bg-slate-700 text-gray-300 rounded p-2 gap-4 h-full"
class="flex flex-col justify-between w-11/12 mx-auto bg-slate-700 text-gray-300 rounded p-2 gap-4 h-full overflow-scroll"
>
<div
id="messages"
@ -68,41 +72,41 @@
</main>
<script>
const LOADING_SVG = `<svg
width="24"
height="24"
viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"
fill="currentColor"
>
<style>
.spinner_qM83 {
animation: spinner_8HQG 1.05s infinite;
}
.spinner_oXPr {
animation-delay: 0.1s;
}
.spinner_ZTLf {
animation-delay: 0.2s;
}
@keyframes spinner_8HQG {
0%,
57.14% {
animation-timing-function: cubic-bezier(0.33, 0.66, 0.66, 1);
transform: translate(0);
}
28.57% {
animation-timing-function: cubic-bezier(0.33, 0, 0.66, 0.33);
transform: translateY(-6px);
}
100% {
transform: translate(0);
}
}
</style>
<circle class="spinner_qM83" cx="4" cy="12" r="3"></circle>
<circle class="spinner_qM83 spinner_oXPr" cx="12" cy="12" r="3"></circle>
<circle class="spinner_qM83 spinner_ZTLf" cx="20" cy="12" r="3"></circle>
</svg>`;
width="24"
height="24"
viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"
fill="currentColor"
>
<style>
.spinner_qM83 {
animation: spinner_8HQG 1.05s infinite;
}
.spinner_oXPr {
animation-delay: 0.1s;
}
.spinner_ZTLf {
animation-delay: 0.2s;
}
@keyframes spinner_8HQG {
0%,
57.14% {
animation-timing-function: cubic-bezier(0.33, 0.66, 0.66, 1);
transform: translate(0);
}
28.57% {
animation-timing-function: cubic-bezier(0.33, 0, 0.66, 0.33);
transform: translateY(-6px);
}
100% {
transform: translate(0);
}
}
</style>
<circle class="spinner_qM83" cx="4" cy="12" r="3"></circle>
<circle class="spinner_qM83 spinner_oXPr" cx="12" cy="12" r="3"></circle>
<circle class="spinner_qM83 spinner_ZTLf" cx="20" cy="12" r="3"></circle>
</svg>`;
/**
* Wrapper API Call
@ -121,9 +125,9 @@
// Wrapping Element
let wrapEl = document.createElement("div");
wrapEl.innerHTML = `<div class="flex">
<span class="font-bold w-24 grow-0 shrink-0"></span>
<span class="whitespace-break-spaces w-full"></span>
</div>`;
<span class="font-bold w-24 grow-0 shrink-0"></span>
<span class="whitespace-break-spaces w-full"></span>
</div>`;
// Get Elements
let nameEl = wrapEl.querySelector("span");
@ -154,7 +158,66 @@
})
.then((data) => {
console.log("SUCCESS:", data);
content.innerText = data.choices[0].message.content;
// Create Response Element
let responseEl = document.createElement("p");
responseEl.setAttribute(
"class",
"whitespace-break-spaces"
// "whitespace-break-spaces border-b pb-3 mb-3"
);
responseEl.innerText = data.response;
// Create Context Element
let contextEl = document.createElement("div");
contextEl.innerHTML = `
<h1 class="font-bold">Context:</h1>
<ul class="list-disc ml-6"></ul>`;
let ulEl = contextEl.querySelector("ul");
/*
// Create Context Links
data.context
// Capture PubMed ID & Distance
.map((item) => [
item.metadata.file.match("\/(.*)\.txt$"),
item.distance,
])
// Filter Non-Matches
.filter(([match]) => match)
// Get Match Value & Round Distance (2)
.map(([match, distance]) => [
match[1],
Math.round(distance * 100) / 100,
])
// Create Links
.forEach(([pmid, distance]) => {
let newEl = document.createElement("li");
let linkEl = document.createElement("a");
linkEl.setAttribute("target", "_blank");
linkEl.setAttribute(
"class",
"text-blue-500 hover:text-blue-600"
);
linkEl.setAttribute(
"href",
"https://www.ncbi.nlm.nih.gov/pmc/articles/" + pmid
);
linkEl.textContent = "[" + distance + "] " + pmid;
newEl.append(linkEl);
ulEl.append(newEl);
});
*/
// Add to DOM
content.setAttribute("class", "w-full");
content.innerHTML = "";
content.append(responseEl);
// content.append(contextEl);
})
.catch((e) => {
console.log("ERROR:", e);


@ -1,7 +1,6 @@
from chromadb.api import API
from itertools import islice
from os import path
from tqdm.auto import tqdm
from tqdm import tqdm
from typing import Any, cast
import chromadb
@ -19,57 +18,60 @@ def chunk(iterable, chunk_size: int):
VectorDB Interface
"""
class VectorDB:
def load_documents(self, normalizer: DataNormalizer):
pass
def load_documents(self, name: str, normalizer: DataNormalizer, chunk_size: int = 10):
raise NotImplementedError("VectorDB must implement load_documents")
def get_related(self, question: str) -> Any:
pass
def get_related(self, name: str, question: str) -> Any:
raise NotImplementedError("VectorDB must implement get_related")
"""
ChromaDB VectorDB Type
"""
class ChromaDB(VectorDB):
def __init__(self, base_path: str):
chroma_path = path.join(base_path, "chroma")
self.client: API = chromadb.PersistentClient(path=chroma_path)
self.word_limit = 1000
self.collection_name: str = "vdb"
self.collection: chromadb.Collection = self.client.create_collection(name=self.collection_name, get_or_create=True)
def __init__(self, path: str):
self.client: API = chromadb.PersistentClient(path=path)
self.word_cap = 2500
def get_related(self, name: str, question: str) -> Any:
# Get or Create Collection
collection: chromadb.Collection = self.client.create_collection(name=name, get_or_create=True)
def get_related(self, question) -> Any:
"""Returns line separated related docs"""
results = self.collection.query(
query_texts=[question],
results = collection.query(
query_texts=[question.lower()],
n_results=2
)
all_docs: list = cast(list, results.get("documents", [[]]))[0]
all_metadata: list = cast(list, results.get("metadatas", [[]]))[0]
all_distances: list = cast(list, results.get("distances", [[]]))[0]
all_ids: list = cast(list, results.get("ids", [[]]))[0]
return {
"distances":all_distances,
"distances": all_distances,
"metadatas": all_metadata,
"docs": all_docs,
"ids": all_ids
}
def load_documents(self, normalizer: DataNormalizer):
# 10 Item Chunking
for items in tqdm(chunk(normalizer, 50)):
def load_documents(self, name: str, normalizer: DataNormalizer, chunk_size: int = 10):
# Get or Create Collection
collection = chromadb.Collection = self.client.create_collection(name=name, get_or_create=True)
# Load Items
length = len(normalizer) / chunk_size
for items in tqdm(chunk(normalizer, chunk_size), total=length):
ids = []
documents = []
metadatas = []
# Limit words per document to accommodate context token limits
for item in items:
doc = " ".join(item.get("doc").split()[:self.word_limit])
documents.append(doc)
documents.append(" ".join(item.get("doc").split()[:self.word_cap]))
ids.append(item.get("id"))
metadatas.append(item.get("metadata", {}))
# Ideally we parse out metadata from each document
# and pass to the metadata kwarg. However, each
# document appears to have a slightly different format,
# so it's difficult to parse out.
self.collection.add(
collection.add(
ids=ids,
documents=documents,
ids=ids
metadatas=metadatas,
)
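For reference, the batching and word-cap logic in `load_documents` can be sketched in isolation. The body of the `chunk` helper is not shown in this diff, so the `islice`-based version here is an assumption inferred from the import and the hunk header:

```python
from itertools import islice

def chunk(iterable, chunk_size: int):
    """Yield successive lists of up to chunk_size items from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        yield batch

# Batches of 3 over 7 items: two full batches, one partial
print(list(chunk(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]

# Per-document word cap, as applied before collection.add
word_cap = 5
doc = "one two three four five six seven"
capped = " ".join(doc.split()[:word_cap])
print(capped)  # one two three four five
```

Capping on whitespace-split words (rather than characters) keeps each document within the embedding context limit without cutting words in half.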


@@ -15,7 +15,9 @@ dependencies = [
     "tqdm",
     "chromadb",
     "sqlite-utils",
-    "click"
+    "click",
+    "beautifulsoup4",
+    "yt-dlp"
 ]
[project.scripts]


@@ -1,43 +0,0 @@
<svg
preserveAspectRatio="xMidYMid meet"
color-interpolation-filters="sRGB"
style="margin: auto"
height="80"
width="200"
viewBox="70 90 200 90"
>
<g fill="#ebb919" transform="translate(69.05000305175781,91.03400039672852)">
<g transform="translate(0,0)">
<g transform="scale(1)">
<g>
<path
d="M33.96-30.84L33.96-30.84Q36.48-30.84 38.37-29.88 40.26-28.92 41.46-27.24 42.66-25.56 43.26-23.34 43.86-21.12 43.86-18.54L43.86-18.54 43.86 0 36.66 0 36.66-18.54Q36.66-20.64 35.16-22.14L35.16-22.14Q33.72-23.64 31.56-23.64L31.56-23.64Q29.4-23.64 27.96-22.14L27.96-22.14Q26.46-20.64 26.46-18.54L26.46-18.54 26.46 0 19.26 0 19.26-18.54Q19.26-20.64 17.76-22.14L17.76-22.14Q17.04-22.92 16.11-23.28 15.18-23.64 14.16-23.64L14.16-23.64Q11.94-23.64 10.5-22.14L10.5-22.14Q9-20.64 9-18.54L9-18.54 9 0 1.8 0 1.8-30 9-30 9-27.36Q10.74-28.86 12.66-29.85 14.58-30.84 16.56-30.84L16.56-30.84Q19.26-30.84 21-29.76 22.74-28.68 24.12-26.76L24.12-26.76Q25.74-28.5 28.32-29.67 30.9-30.84 33.96-30.84ZM54.96 0L47.76 0 47.76-30 54.96-30 54.96 0ZM47.76-34.8L47.76-42 54.96-42 54.96-34.8 47.76-34.8ZM74.28-30.84L74.28-30.84Q77.22-30.84 79.62-29.73 82.02-28.62 83.73-26.67 85.44-24.72 86.37-22.14 87.3-19.56 87.3-16.62L87.3-16.62 87.3 0 80.1 0 80.1-16.62Q80.1-19.62 78-21.6L78-21.6Q75.96-23.64 73.08-23.64L73.08-23.64Q70.14-23.64 68.1-21.6L68.1-21.6Q66.06-19.56 66.06-16.62L66.06-16.62 66.06 0 58.86 0 58.86-30 66.06-30 66.06-27.72Q67.68-29.1 69.72-29.97 71.76-30.84 74.28-30.84ZM116.94-30L124.86-30 110.94 0 109.08 4.08Q107.4 7.74 104.04 9.9 100.68 12.06 96.6 12.06L96.6 12.06 93.42 12.06 95.22 4.86 96.96 4.86Q98.7 4.86 100.2 3.9 101.7 2.94 102.42 1.32L102.42 1.32 103.02 0 89.1-30 97.02-30 106.98-8.52 116.94-30ZM159.12-30.84L159.12-30.84Q161.64-30.84 163.53-29.88 165.42-28.92 166.62-27.24 167.82-25.56 168.42-23.34 169.02-21.12 169.02-18.54L169.02-18.54 169.02 0 161.82 0 161.82-18.54Q161.82-20.64 160.32-22.14L160.32-22.14Q158.88-23.64 156.72-23.64L156.72-23.64Q154.56-23.64 153.12-22.14L153.12-22.14Q151.62-20.64 151.62-18.54L151.62-18.54 151.62 0 144.42 0 144.42-18.54Q144.42-20.64 142.92-22.14L142.92-22.14Q142.2-22.92 141.27-23.28 140.34-23.64 139.32-23.64L139.32-23.64Q137.1-23.64 135.66-22.14L135.66-22.14Q134.16-20.64 134.16-18.54L134.16-18.54 134.16 0 126.96 0 126.96-30 134.16-30 134.16-27.36Q135.9-28.86 
137.82-29.85 139.74-30.84 141.72-30.84L141.72-30.84Q144.42-30.84 146.16-29.76 147.9-28.68 149.28-26.76L149.28-26.76Q150.9-28.5 153.48-29.67 156.06-30.84 159.12-30.84ZM196.5-30.06L203.7-30.06 203.7 0 196.5 0 196.5-15Q196.5-18.6 193.98-21.12L193.98-21.12Q191.46-23.64 187.86-23.64L187.86-23.64Q186.12-23.64 184.53-22.98 182.94-22.32 181.74-21.12L181.74-21.12Q179.22-18.6 179.22-15L179.22-15Q179.22-11.46 181.74-8.94L181.74-8.94Q182.94-7.68 184.53-7.05 186.12-6.42 187.86-6.42L187.86-6.42Q189.66-6.42 191.1-7.02L191.1-7.02 193.68-0.6Q190.92 0.78 187.26 0.78L187.26 0.78Q183.96 0.78 181.17-0.45 178.38-1.68 176.34-3.84 174.3-6 173.16-8.88 172.02-11.76 172.02-15L172.02-15Q172.02-18.3 173.16-21.18 174.3-24.06 176.34-26.22 178.38-28.38 181.17-29.61 183.96-30.84 187.26-30.84L187.26-30.84Q190.2-30.84 192.48-29.94 194.76-29.04 196.5-27.66L196.5-27.66 196.5-30.06Z"
transform="translate(-1.7999999523162842, 42)"
></path>
</g>
</g>
</g>
<g fill="#ebb919" transform="translate(5,60.060001373291016)">
<rect
x="0"
height="1"
y="3.434999942779541"
width="88.66999673843384"
></rect>
<rect
height="1"
y="3.434999942779541"
width="88.66999673843384"
x="103.22999715805054"
></rect>
<g transform="translate(91.66999673843384,0)">
<g transform="scale(1)">
<path
d="M4.43-3.20L2.06-3.20L2.44-4.40C2.58-4.84 2.72-5.28 2.84-5.72C2.97-6.15 3.10-6.60 3.22-7.06L3.26-7.06C3.39-6.60 3.52-6.15 3.65-5.72C3.78-5.28 3.91-4.84 4.06-4.40ZM4.68-2.40L5.42 0L6.49 0L3.83-7.87L2.70-7.87L0.04 0L1.06 0L1.81-2.40ZM7.61-7.87L7.61 0L8.60 0L8.60-7.87Z"
transform="translate(-0.036000000000000004, 7.872)"
></path>
</g>
</g>
</g>
</g>
</svg>
