# Usage

## Running Server
```bash
# Locally
minyma server run

# Docker Quick Start
make docker_build_local
docker run \
  -p 5000:5000 \
  -e OPENAI_API_KEY=`cat openai_key` \
  -e DATA_PATH=/data \
  -v ./data:/data \
  minyma:latest
```

The server will now be accessible at `http://localhost:5000`
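
Once the container (or local process) is up, you can sanity-check that it is reachable. Below is a minimal sketch using only the Python standard library; it assumes nothing about Minyma's routes beyond the port shown above:

```python
# Minimal reachability check for the Minyma server.
# Assumes only the port above; specific API routes are defined by Minyma itself.
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

try:
    with urlopen("http://localhost:5000", timeout=5) as resp:
        print(f"Server responded with HTTP {resp.status}")
except HTTPError as exc:
    # A 4xx/5xx here still means the server is up -- the root path may not be a route.
    print(f"Server is up (HTTP {exc.code})")
except URLError as exc:
    print(f"Server not reachable: {exc.reason}")
```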
## Normalizing & Loading Data

Minyma is designed to be extensible. You can add normalizers and vector DBs by
implementing the interfaces defined in `./minyma/normalizer.py` and
`./minyma/vdb.py`. At the moment, the only supported database is `chroma` and
the only supported normalizer is the `pubmed` normalizer.
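
As an illustration only, a custom normalizer might look roughly like the sketch below; the actual base class, method names, and record fields are defined in `./minyma/normalizer.py` and may differ:

```python
# Hypothetical sketch of a custom normalizer -- class name, method name, and
# output fields are assumptions; see ./minyma/normalizer.py for the real interface.
import json
from typing import Iterator


class MyDatasetNormalizer:
    """Converts a raw JSONL dataset into flat text records for the vector DB."""

    def __init__(self, filename: str):
        self.filename = filename

    def normalize(self) -> Iterator[dict]:
        # Yield one normalized record per input line; "id" and "text" are
        # assumed to be the fields the vector DB loader expects.
        with open(self.filename) as f:
            for i, line in enumerate(f):
                record = json.loads(line)
                yield {"id": str(i), "text": record.get("abstract", "")}
```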
To normalize data, you can use Minyma's `normalize` CLI command:

```bash
minyma normalize --filename ./pubmed_manuscripts.jsonl --normalizer pubmed --database chroma --datapath ./chroma
```

The above example does the following:

- Uses the `pubmed` normalizer
- Normalizes the `./pubmed_manuscripts.jsonl` raw dataset [0]
- Loads the output into a `chroma` database and persists the data to the `./chroma` directory

**NOTE:** The above dataset took about an hour to normalize on my M2 Max MacBook Pro.

[0] https://huggingface.co/datasets/TaylorAI/pubmed_author_manuscripts/tree/main
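
Once normalization finishes, the persisted store can be inspected directly with the `chromadb` client. This is a rough sketch; the collection name Minyma creates is an assumption here, so list the collections first:

```python
# Hedged sketch: peek at the Chroma store persisted to ./chroma by the
# `minyma normalize` command above. The collection name is an assumption.
import chromadb

client = chromadb.PersistentClient(path="./chroma")
print(client.list_collections())  # shows what Minyma actually created

collection = client.get_collection("pubmed")  # assumed name -- adjust as needed
results = collection.query(query_texts=["cystic fibrosis treatment"], n_results=3)
for doc in results["documents"][0]:
    print(doc[:200], "...")
```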
# Development
```bash
# Initialize a virtual environment
python3 -m venv venv
. ./venv/bin/activate

# Local Development
pip install -e .

# Creds
export OPENAI_API_KEY=`cat openai_key`
```