initial commit

This commit is contained in:
2026-04-16 12:08:04 -04:00
commit f49c944efe
9 changed files with 1540 additions and 0 deletions

docs/indexeddb-format.md Normal file

@@ -0,0 +1,601 @@
# Slack Desktop IndexedDB Data Format
This document describes the on-disk format and data structure of the Slack
desktop app's local IndexedDB cache. It covers the Mac App Store version
(`com.tinyspeck.slackmacgap`), though the Electron version uses the same
Chromium IndexedDB format at a different path.
> **Tooling**: All structures were analyzed using
> [`dfindexeddb`](https://pypi.org/project/dfindexeddb/) — a forensic Python
> library that parses Chromium IndexedDB / LevelDB files and Blink-serialized
> V8 values without native dependencies.
## Table of Contents
- [Filesystem Layout](#filesystem-layout)
- [LevelDB Layer](#leveldb-layer)
  - [Files](#files)
  - [Custom Comparator](#custom-comparator)
  - [Record Types](#record-types)
- [IndexedDB Layer](#indexeddb-layer)
  - [Databases](#databases)
  - [Object Stores](#object-stores)
  - [Blob Storage](#blob-storage)
- [Value Encoding](#value-encoding)
  - [Blink IDB Value Wrapper](#blink-idb-value-wrapper)
  - [V8 Serialization](#v8-serialization)
  - [Sentinel Types](#sentinel-types)
  - [JSArray Encoding](#jsarray-encoding)
- [Redux State Schema](#redux-state-schema)
  - [Overview](#overview)
  - [messages](#messages)
  - [channels](#channels)
  - [members](#members)
  - [reactions](#reactions)
  - [files](#files-1)
  - [bots](#bots)
  - [teams](#teams)
  - [userGroups](#usergroups)
  - [channelHistory](#channelhistory)
  - [allThreads](#allthreads)
  - [Other Stores](#other-stores)
- [Caveats & Limitations](#caveats--limitations)
---
## Filesystem Layout
### Mac App Store Version
```
~/Library/Containers/com.tinyspeck.slackmacgap/
  Data/Library/Application Support/Slack/IndexedDB/
    https_app.slack.com_0.indexeddb.leveldb/   # LevelDB database
      000042.log                               # Write-ahead log (active writes)
      000044.ldb                               # SSTable (compacted data)
      CURRENT                                  # Points to active MANIFEST
      LOCK                                     # Process lock file
      LOG                                      # LevelDB operational log
      LOG.old                                  # Previous operational log
      MANIFEST-000001                          # Database manifest (file versions, levels)
    https_app.slack.com_0.indexeddb.blob/      # External blob storage
      2/                                       # database_id=2
        1e/                                    # Sharded directory (blob_number >> 8)
          1e80                                 # Blob file (blob_number in hex)
```
### Electron Version (if installed)
```
~/Library/Application Support/Slack/IndexedDB/
  https_app.slack.com_0.indexeddb.leveldb/
  https_app.slack.com_0.indexeddb.blob/
```
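
The two macOS layouts above can be probed with a few lines of Python. This is a minimal sketch: the function name `find_slack_indexeddb` and its `home` parameter are illustrative and not part of any Slack or `dfindexeddb` API.

```python
from pathlib import Path

# Relative locations taken from the two macOS layouts described above.
CANDIDATES = [
    "Library/Containers/com.tinyspeck.slackmacgap/Data/Library/Application Support/Slack/IndexedDB",
    "Library/Application Support/Slack/IndexedDB",
]
LEVELDB_NAME = "https_app.slack.com_0.indexeddb.leveldb"
BLOB_NAME = "https_app.slack.com_0.indexeddb.blob"


def find_slack_indexeddb(home=None):
    """Return (leveldb_dir, blob_dir) for the first layout found, else None."""
    home = Path(home) if home is not None else Path.home()
    for rel in CANDIDATES:
        leveldb_dir = home / rel / LEVELDB_NAME
        if leveldb_dir.is_dir():
            return leveldb_dir, home / rel / BLOB_NAME
    return None
```

The Mac App Store path is checked first; the blob directory is returned even if it does not exist yet, so callers should test for it before reading.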
### Other Platforms
| OS | Path |
| ------- | ------------------------------- |
| Linux | `~/.config/Slack/IndexedDB/...` |
| Windows | `%AppData%\Slack\IndexedDB\...` |
---
## LevelDB Layer
Chromium's IndexedDB is backed by LevelDB, a sorted key-value store.
### Files
| File Pattern | Purpose |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `*.log` | Write-ahead log. Contains recent writes, stored as `WriteBatch` records, that have not yet been flushed to an SSTable. Each physical record has a 7-byte header: checksum (4), length (2), type (1). |
| `*.ldb` / `*.sst` | SSTables. Immutable, sorted, compressed (Snappy) data files produced by compaction. |
| `MANIFEST-*` | Tracks the active file set and the LSM-tree level each file belongs to. |
| `CURRENT` | Text file pointing to the active manifest (e.g., `MANIFEST-000001`). |
| `LOCK` | Advisory file lock. **Held by Slack while running.** |
| `LOG` | LevelDB's internal operational log (compaction events, etc.). |
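
The 7-byte record header of the `*.log` files can be walked with the standard library alone. The sketch below assumes LevelDB's usual log layout (32 KiB blocks, little-endian fields, record types 1 to 4); it does not verify checksums or reassemble fragmented records, and the name `iter_log_records` is illustrative.

```python
import struct

BLOCK_SIZE = 32768   # LevelDB log files are divided into 32 KiB blocks
HEADER_SIZE = 7      # checksum (4) + length (2) + type (1)
RECORD_TYPES = {1: "FULL", 2: "FIRST", 3: "MIDDLE", 4: "LAST"}


def iter_log_records(data):
    """Yield (offset, checksum, type_name, payload) for each physical record."""
    offset = 0
    while offset + HEADER_SIZE <= len(data):
        # A block tail too short to hold a header is padding; skip it.
        left_in_block = BLOCK_SIZE - (offset % BLOCK_SIZE)
        if left_in_block < HEADER_SIZE:
            offset += left_in_block
            continue
        checksum, length, rtype = struct.unpack_from("<IHB", data, offset)
        if rtype not in RECORD_TYPES:
            break  # type 0 marks unused space; stop scanning here
        start = offset + HEADER_SIZE
        yield offset, checksum, RECORD_TYPES[rtype], data[start:start + length]
        offset = start + length
```

A `FULL` record carries a complete `WriteBatch`; `FIRST`, `MIDDLE`, and `LAST` fragments would need to be concatenated before decoding.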
### Custom Comparator
Chromium IndexedDB uses a custom LevelDB comparator called **`idb_cmp1`**
rather than the default `leveldb.BytewiseComparator`. This means:
- Standard LevelDB libraries (`plyvel`, `leveldb`) **cannot open** these
databases — they will fail with a comparator mismatch error.
- You must either use `dfindexeddb` (which parses the raw files without
opening the DB) or copy the data and parse at the binary level.
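
Before handing a directory to a generic LevelDB library, it can help to confirm which comparator the database was created with. The sketch below is a heuristic, not a manifest parser: it assumes the comparator name appears as a plain byte string in the active manifest, and `uses_idb_comparator` is a hypothetical helper name.

```python
from pathlib import Path


def uses_idb_comparator(leveldb_dir):
    """Heuristic: True if the active MANIFEST mentions the idb_cmp1 comparator."""
    leveldb_dir = Path(leveldb_dir)
    # CURRENT holds the active manifest's file name followed by a newline.
    manifest_name = (leveldb_dir / "CURRENT").read_text().strip()
    manifest_bytes = (leveldb_dir / manifest_name).read_bytes()
    # Assumption: the comparator name is stored uncompressed in the manifest.
    return b"idb_cmp1" in manifest_bytes
```

Run this against a copy of the directory rather than the live one, since Slack may rewrite the manifest while it is running.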
### Record Types
Each LevelDB key in an IndexedDB database begins with a prefix that identifies
its record type. The record types observed in Slack's database:
| Record Type | Count | Description |
| ------------------------ | ------- | ----------------------------------------------- |
| `ScopesPrefixKey` | ~13,000 | Internal scope tracking records |
| `RecoveryBlobJournalKey` | ~2,300 | Blob lifecycle / garbage collection journal |
| `ObjectStoreDataKey` | ~1,600 | **Actual data records** (messages, state, etc.) |
| `ExistsEntryKey` | ~1,600 | Existence index (mirrors data keys) |
| `BlobEntryKey` | ~1,500 | Maps data keys to external blob references |
| `ObjectStoreMetaDataKey` | ~800 | Object store schema metadata |
| `DatabaseMetaDataKey` | ~770 | Database-level metadata (version, etc.) |
| `ActiveBlobJournalKey` | ~760 | Currently active blob journal |
| `DatabaseNameKey` | 3 | Maps database names (per origin) to database IDs |
| `ObjectStoreNamesKey` | 3 | Maps object store names to object store IDs |
| `SchemaVersionKey` | 1 | IndexedDB schema version |
| `MaxDatabaseIdKey` | 1 | Highest allocated database ID |
| `DataVersionKey` | 1 | Data format version |
The high counts for `ScopesPrefixKey`, `RecoveryBlobJournalKey`, and
`ActiveBlobJournalKey` reflect Slack's frequent Redux state persistence —
each save cycle creates a new blob and journals the old one for garbage
collection.
---
## IndexedDB Layer
### Databases
Three IndexedDB databases are present:
| `database_id` | Name | Purpose |
| ------------- | ------------------ | ---------------------------------------------------------------------------- |
| 2 | `reduxPersistence` | Slack's full Redux application state |
| 3 | _(unnamed)_ | Encrypted syncer data (e.g., `syncer.User/{hash}`, `syncer.Document/{hash}`) |
| 10 | _(unnamed)_ | Sundry metadata (e.g., `minChannelUpdated` timestamp) |
> **Note**: `database_id=0` is used for global IndexedDB metadata records
> (database names, schema version, etc.) and is not an application database.
### Object Stores
| `database_id` | `object_store_id` | Store Name | Key Pattern | Storage |
| ------------- | ----------------- | -------------------- | ------------------------------------------ | ------------------------------- |
| 2 | 1 | `reduxPersistenceStore` | `persist:slack-client-{TEAM_ID}-{USER_ID}` | External blob (~4-7 MB) |
| 3 | 1 | `{hex-hash}` | `syncer.{Type}/{ID}` | Inline, **encrypted** (AES-GCM) |
| 10 | 1 | `sundryStorage` | `0` | Inline |
The Redux state store (db=2) contains a single key per team+user combination.
The entire application state is serialized into one large blob.
### Blob Storage
When an IndexedDB value exceeds the inline size threshold, Chromium stores it
as an external blob file. The blob path is derived from the blob number:
```
{blob_dir}/{database_id}/{blob_number >> 8 :02x}/{blob_number :04x}
```
For example, blob number `7808` (hex `0x1e80`) in database `2`:
```
https_app.slack.com_0.indexeddb.blob/2/1e/1e80
```
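
The path rule can be written as a small helper. This is a sketch that follows the formula exactly as stated above; `blob_path` is an illustrative name.

```python
from pathlib import Path


def blob_path(blob_dir, database_id, blob_number):
    """Build the on-disk path of a blob from its database ID and blob number."""
    shard = format(blob_number >> 8, "02x")   # directory: high bits, two hex digits
    name = format(blob_number, "04x")         # file name: full number, four hex digits
    return Path(blob_dir) / str(database_id) / shard / name
```

For the example above, `blob_path("https_app.slack.com_0.indexeddb.blob", 2, 7808)` yields a path ending in `2/1e/1e80`.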
Blobs are **versioned** — each Redux persist cycle allocates a new blob number
and the previous blob is journaled for deletion. Only the latest blob contains
the current state.
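
Because old blobs linger until they are garbage-collected, a reader usually wants the newest file. A minimal sketch, assuming modification time is a reliable proxy for "latest" and that blobs sit two directory levels below the database directory as in the layout above; `latest_blob` is a hypothetical helper.

```python
from pathlib import Path


def latest_blob(blob_dir, database_id=2):
    """Return the most recently modified blob file for a database, or None."""
    db_dir = Path(blob_dir) / str(database_id)
    if not db_dir.is_dir():
        return None
    # Blobs live at {database_id}/{shard}/{blob_number}.
    blobs = [p for p in db_dir.glob("*/*") if p.is_file()]
    return max(blobs, key=lambda p: p.stat().st_mtime, default=None)
```

Picking the highest blob number instead of the newest modification time would be an alternative; the two should agree on an untouched copy.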
---
## Value Encoding
### Blink IDB Value Wrapper
Chromium wraps IndexedDB values in a Blink-specific envelope with MIME type
`application/vnd.blink-idb-value-wrapper`. The blob file begins with a
3-byte header:
| Offset | Value | Meaning |
| ------ | ------ | ------------------------------------- |
| 0 | `0xFF` | Blink serialization tag: VERSION |
| 1 | `0x11` | Pseudo-version: "requires processing" |
| 2 | `0x02` | Compression: Snappy |
After the header, the remaining bytes are **Snappy-compressed** V8 serialized
data.
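
The header check can be sketched as follows. The function only validates the three bytes from the table and returns the still-compressed payload; the constant and function names are illustrative.

```python
BLINK_VERSION_TAG = 0xFF       # offset 0
REQUIRES_PROCESSING = 0x11     # offset 1
COMPRESSED_WITH_SNAPPY = 0x02  # offset 2


def split_blink_wrapper(blob_bytes):
    """Validate the 3-byte wrapper header and return the compressed payload."""
    if len(blob_bytes) < 3:
        raise ValueError("blob is shorter than the 3-byte header")
    tag, pseudo_version, compression = blob_bytes[0], blob_bytes[1], blob_bytes[2]
    if tag != BLINK_VERSION_TAG or pseudo_version != REQUIRES_PROCESSING:
        raise ValueError("not a Blink IDB value wrapper")
    if compression != COMPRESSED_WITH_SNAPPY:
        raise ValueError(f"unexpected compression byte: {compression:#04x}")
    return blob_bytes[3:]
```

The returned bytes can then be passed to a Snappy decoder. One option, assuming the third-party `python-snappy` package is installed, is `snappy.decompress(payload)`.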
### V8 Serialization
The decompressed data uses Chrome's V8 serialization format — the same binary
format used by `structuredClone()` and `postMessage()`. It encodes JavaScript
values including:
- Primitives: `string`, `number`, `boolean`, `null`, `undefined`
- Objects: `{}` → Python `dict`
- Arrays: `[]` → Python `JSArray` (see below)
- Typed arrays, `Date`, `RegExp`, `Map`, `Set`, `ArrayBuffer`, etc.
`dfindexeddb` deserializes this into Python-native types with a few special
sentinel objects.
### Sentinel Types
`dfindexeddb` represents JavaScript values that have no Python equivalent
using sentinel objects:
| JS Value | dfindexeddb Type | Python `repr` | Notes |
| ----------- | ---------------- | ------------- | ----------------------------------------------------------------------------------- |
| `undefined` | `Undefined` | `Undefined()` | Distinct from `null`. Common on optional message fields (e.g., `subtype`, `files`). |
| `null` | `Null` | `Null()` | Used where Slack explicitly sets `null`. |
| `NaN` | `NaN` | `NaN()` | Rare. |
**Important**: When checking fields, always handle these types. A message's
`subtype` field is `Undefined()` (not `None`, not missing) when no subtype
applies:
```python
subtype = msg.get("subtype", "")
if not isinstance(subtype, str):
    subtype = ""  # Was Undefined() or Null()
```
### JSArray Encoding
JavaScript sparse arrays are encoded as `JSArray` objects with two attributes:
- **`values`**: A Python list of positional values. Sparse positions are
`Undefined()`.
- **`properties`**: A Python dict mapping array indices to the actual values.
```python
# JS: ["alice", "bob", "carol"]
# Python:
JSArray(
    values=[Undefined(), Undefined(), Undefined()],
    properties={0: "alice", 1: "bob", 2: "carol"}
)
```
To iterate a `JSArray` as a flat list:
```python
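def jsarray_to_list(arr):
    """Flatten a JSArray (or a plain list) into a Python list."""
    if not hasattr(arr, "properties"):
        return list(arr)  # Already a plain list
    props = arr.properties
    # Index keys may be ints or strings depending on how they were decoded.
    return [props.get(i, props.get(str(i))) for i in range(len(arr.values))]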
```
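
To export the decoded state as JSON, the sentinels and `JSArray` objects have to be converted recursively. The sketch below matches the sentinel types by class name so that it runs without importing `dfindexeddb`; that matching rule, the mapping of every sentinel to `None`, and the name `to_plain` are all assumptions made for illustration.

```python
SENTINEL_NAMES = {"Undefined", "Null", "NaN"}


def to_plain(value):
    """Recursively convert decoded values into JSON-friendly Python objects."""
    # Assumption: sentinel objects are recognised by their class name.
    if type(value).__name__ in SENTINEL_NAMES:
        return None
    # JSArray-like objects expose both `values` and `properties`.
    if hasattr(value, "properties") and hasattr(value, "values"):
        props = value.properties
        # Look up each index as an int first, then as a string.
        return [to_plain(props.get(i, props.get(str(i))))
                for i in range(len(value.values))]
    if isinstance(value, dict):
        return {str(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value
```

Collapsing `Undefined()` and `Null()` into `None` loses the distinction between them, which is usually acceptable for export but not for round-tripping.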
---
## Redux State Schema
### Overview
The Redux state blob contains Slack's entire client-side application state.
It is a single large JavaScript object with ~140 top-level keys. The largest
stores (by serialized size):
| Key | Size | Entries | Description |
| ---------------- | ------- | ------------- | ----------------------------------------- |
| `messages` | ~44 MB | ~295 channels | Cached message history |
| `channels` | ~1.3 MB | ~583 | Channel metadata |
| `files` | ~1.2 MB | ~267 | File/upload metadata |
| `channelHistory` | ~800 KB | ~1,200 | Pagination / scroll state per channel |
| `members` | ~730 KB | ~351 | User profiles |
| `experiments` | ~310 KB | ~1,527 | Feature flag experiments |
| `reactions` | ~290 KB | ~955 | Emoji reactions on messages |
| `apps` | ~280 KB | ~29 | Installed Slack app metadata |
| `prefs` | ~170 KB | 4 | User preferences (huge, hundreds of keys) |
| `userPrefs` | ~156 KB | ~667 | Additional user preference data |
| `threadSub` | ~135 KB | ~1,120 | Thread subscription state |
---
### messages
**Path**: `state.messages[channel_id][timestamp]`
The primary message store. Keyed by channel ID, then by message timestamp.
```
messages: {
  "C0XXXXXXXXX": {                  # Channel ID
    "1776115292.356529": { ... },   # Message (ts is the key)
    "1776117909.325989": { ... },
    ...
  },
  ...
}
```
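
Walking this structure in chronological order might look like the sketch below. It assumes `state` is the decoded Redux state as a dict and that every key under a channel is a timestamp string; `ts_to_datetime` and `iter_messages` are illustrative names.

```python
from datetime import datetime, timezone


def ts_to_datetime(ts):
    """Convert a ts string such as "1776115292.356529" to a UTC datetime."""
    seconds, _, micros = ts.partition(".")
    micros = (micros + "000000")[:6]  # pad or trim the fraction to microseconds
    base = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return base.replace(microsecond=int(micros))


def iter_messages(state):
    """Yield (channel_id, datetime, message), oldest first within each channel."""
    for channel_id, by_ts in state["messages"].items():
        for ts in sorted(by_ts, key=ts_to_datetime):
            yield channel_id, ts_to_datetime(ts), by_ts[ts]
```

The timestamp is split on the decimal point rather than parsed as a float so that the microsecond part is preserved exactly.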
#### Message Fields
| Field | Type | Description |
| ------------------- | ------------------------ | ------------------------------------------------------------------------------------------------------ |
| `ts` | `str` | Message timestamp, unique within a channel. Unix epoch seconds with a six-digit fractional part, stored as a decimal string. |
| `type` | `str` | Always `"message"`. |
| `text` | `str` | Message text content. Contains Slack markup: `<@U123>` for mentions, `<url\|label>` for links. |
| `user` | `str` | User ID of sender (e.g., `"U0XXXXXXXXX"`). |
| `channel` | `str` | Channel ID (e.g., `"C0XXXXXXXXX"`). |
| `subtype` | `str` \| `Undefined` | Message subtype: `"channel_join"`, `"bot_message"`, etc. `Undefined()` for normal messages. |
| `thread_ts` | `str` \| `Undefined` | Parent thread timestamp. Same as `ts` for thread parent messages. `Undefined()` for non-threaded. |
| `reply_count` | `int` | Number of replies (0 for non-parent messages). |
| `reply_users` | `JSArray` \| `Undefined` | User IDs of thread participants. |
| `reply_users_count` | `int` \| `Undefined` | Count of unique repliers. |
| `latest_reply` | `str` \| `Undefined` | Timestamp of latest reply. |
| `_hidden_reply` | `bool` | `True` if this is a thread reply not shown in the channel. |
| `blocks` | `JSArray` \| `Undefined` | Slack Block Kit elements (rich text, sections, images, etc.). |
| `files` | `JSArray` \| `Undefined` | File IDs attached to this message. Values in `properties` are file ID strings (not full file objects). |
| `attachments` | `JSArray` \| `Undefined` | Legacy attachments (links, bot attachments). |
| `client_msg_id` | `str` | Client-generated UUID for the message. |
| `no_display` | `bool` | Whether to hide this message in UI. |
| `_rxn_key` | `str` | Key for looking up reactions: `"message-{ts}-{channel}"`. |
| `slackbot_feels` | `Null` | Slackbot sentiment (always `Null()` in practice). |
| `__meta__` | `dict` | Internal cache metadata: `{"lastUpdatedTs": "..."}`. |
| `parent_user_id` | `str` | User ID of the thread parent author (only on replies). |
| `upload` | `bool` | Present and `True` on file upload messages. |
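
The markup in `text` can be turned into readable text with a single regular expression. This is a sketch covering only the mention, channel-link, and URL forms; `render_text` and the `user_names` mapping (user ID to display name, for example built from `state.members`) are assumptions for illustration.

```python
import re

# Matches <target> or <target|label>.
TOKEN = re.compile(r"<([^<>|]+)(?:\|([^<>]*))?>")


def render_text(text, user_names=None):
    """Replace <...> tokens in Slack message text with readable text."""
    user_names = user_names or {}

    def replace(match):
        target, label = match.group(1), match.group(2)
        if target.startswith("@"):       # user mention: <@U123>
            user_id = target[1:]
            return "@" + (label or user_names.get(user_id, user_id))
        if target.startswith("#"):       # channel link: <#C123|general>
            return "#" + (label or target[1:])
        return label or target           # URL: <https://example.com|label>

    return TOKEN.sub(replace, text)
```

Unknown user IDs are left as the raw ID, so missing entries in the mapping are easy to spot in the output.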
---
### channels
**Path**: `state.channels[channel_id]`
Channel metadata. Includes public channels, private channels, DMs, and MPDMs.
#### Channel Fields
| Field | Type | Description |
| ---------------------- | ------- | --------------------------------------------------- |
| `id` | `str` | Channel ID (e.g., `"C0XXXXXXXXX"`). |
| `name` | `str` | Channel display name. |
| `name_normalized` | `str` | Lowercase normalized name. |
| `is_channel` | `bool` | Public channel. |
| `is_group` | `bool` | Private channel (legacy term). |
| `is_im` | `bool` | Direct message. |
| `is_mpim` | `bool` | Multi-party direct message. |
| `is_private` | `bool` | Private (group or MPIM). |
| `is_archived` | `bool` | Channel is archived. |
| `is_general` | `bool` | The `#general` channel. |
| `is_member` | `bool` | Current user is a member. |
| `created` | `float` | Unix timestamp of channel creation. |
| `creator` | `str` | User ID of channel creator. |
| `context_team_id` | `str` | Team ID this channel belongs to. |
| `topic` | `dict` | `{"value": "...", "creator": "...", "last_set": 0}` |
| `purpose` | `dict` | `{"value": "...", "creator": "...", "last_set": 0}` |
| `unread_cnt` | `int` | Unread message count. |
| `unread_highlight_cnt` | `int` | Unread mentions/highlights count. |
| `is_ext_shared` | `bool` | Slack Connect shared channel. |
| `is_org_shared` | `bool` | Shared across org workspaces. |
| `is_frozen` | `bool` | Channel is frozen (read-only). |
_Plus ~30 additional boolean flags and UI state fields._
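
The overlapping boolean flags can be reduced to a single label. A minimal sketch, assuming the flags behave as described in the table; the label strings and the name `channel_kind` are arbitrary choices.

```python
def channel_kind(channel):
    """Classify a channel dict using the boolean flags listed above."""
    def flag(name):
        # Sentinel objects and missing keys count as False.
        return channel.get(name) is True

    if flag("is_im"):
        return "dm"
    if flag("is_mpim"):
        return "group_dm"
    if flag("is_private") or flag("is_group"):
        return "private"
    if flag("is_channel"):
        return "public"
    return "unknown"
```

`is_mpim` is tested before `is_private` because `is_private` is also set on multi-party DMs.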
---
### members
**Path**: `state.members[user_id]`
User profiles for all visible workspace members.
#### Member Fields
| Field | Type | Description |
| --------------------- | ------- | -------------------------------------------------------------------------------------------------- |
| `id` | `str` | User ID (e.g., `"U0XXXXXXXXX"`). |
| `team_id` | `str` | Primary team ID. |
| `name` | `str` | Username (login name). |
| `real_name` | `str` | Full display name. |
| `deleted` | `bool` | Account deactivated. |
| `color` | `str` | Hex color assigned to user. |
| `tz` | `str` | Timezone identifier (e.g., `"America/New_York"`). |
| `tz_label` | `str` | Human-readable timezone name. |
| `tz_offset` | `int` | UTC offset in seconds. |
| `profile` | `dict` | Nested profile with `title`, `phone`, `email`, `image_*` URLs, `status_text`, `status_emoji`, etc. |
| `is_admin` | `bool` | Workspace admin. |
| `is_owner` | `bool` | Workspace owner. |
| `is_bot` | `bool` | Bot account. |
| `is_app_user` | `bool` | App-associated user. |
| `is_restricted` | `bool` | Guest (single-channel or multi-channel). |
| `is_ultra_restricted` | `bool` | Single-channel guest. |
| `updated` | `float` | Last profile update timestamp. |
| `is_self` | `bool` | `True` for the current logged-in user. |
_Plus `_name_lc`, `_display_name_lc`, etc. for search/sorting._
---
### reactions
**Path**: `state.reactions[reaction_key]`
Keyed by `"message-{ts}-{channel_id}"` (matching the `_rxn_key` field on
messages).
Each value is a `JSArray` of reaction objects:
```python
{
    "name": "eyes",            # Emoji name
    "baseName": "eyes",        # Base name (without skin tone)
    "count": 2,                # Total reaction count
    "users": JSArray(          # User IDs who reacted
        values=[Undefined(), Undefined()],
        properties={0: "U0XXXXXXXXX", 1: "U0YYYYYYYYY"}
    )
}
```
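
Looking up the reactions for a message could be sketched like this. It assumes `state` is the decoded Redux state as a dict and accepts either a `JSArray`-like object or a plain list as the stored value; `reactions_for` is a hypothetical helper.

```python
def reactions_for(state, message):
    """Return {emoji_name: count} for one message, or {} if none are cached."""
    key = message.get("_rxn_key")
    if not isinstance(key, str):
        # Rebuild the key from the documented pattern if the field is absent.
        key = f"message-{message['ts']}-{message['channel']}"
    entry = state["reactions"].get(key)
    if entry is None:
        return {}
    props = getattr(entry, "properties", None)
    items = props.values() if props is not None else entry
    return {reaction["name"]: reaction["count"] for reaction in items}
```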
---
### files
**Path**: `state.files[file_id]`
File metadata for files visible in the current session.
#### File Fields
| Field | Type | Description |
| ------------- | ------- | ----------------------------------------------------------------------------- |
| `id` | `str` | File ID (e.g., `"F0XXXXXXXXX"`). |
| `name` | `str` | Original filename. |
| `title` | `str` | Display title. |
| `mimetype` | `str` | MIME type (e.g., `"image/png"`). |
| `filetype` | `str` | Short type (e.g., `"png"`, `"pdf"`). |
| `size` | `int` | File size in bytes. |
| `user` | `str` | Uploader's user ID. |
| `created` | `float` | Upload timestamp. |
| `url_private` | `str` | Authenticated download URL. |
| `permalink` | `str` | Permanent link to file in Slack. |
| `thumb_*` | `str` | Thumbnail URLs at various sizes (64, 80, 160, 360, 480, 720, 800, 960, 1024). |
| `original_w` | `int` | Original image width. |
| `original_h` | `int` | Original image height. |
| `is_public` | `bool` | Shared to a public channel. |
| `is_external` | `bool` | External file (Google Drive, etc.). |
> **Note**: File URLs require Slack authentication to access. The `files`
> field on a message contains only file IDs (strings), not full file objects.
> Cross-reference with `state.files[file_id]` for metadata.
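
The cross-reference can be sketched as a small lookup. It assumes `state` is the decoded Redux state as a dict and silently skips IDs that are not in the cache; `files_for_message` is an illustrative name.

```python
def files_for_message(state, message):
    """Return the cached file metadata dicts referenced by a message."""
    refs = message.get("files")
    if hasattr(refs, "properties"):          # JSArray of file IDs
        file_ids = list(refs.properties.values())
    elif isinstance(refs, (list, tuple)):    # already a plain sequence
        file_ids = list(refs)
    else:
        file_ids = []                        # Undefined(), Null(), or missing
    files = state.get("files", {})
    return [files[fid] for fid in file_ids if isinstance(fid, str) and fid in files]
```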
---
### bots
**Path**: `state.bots[bot_id]`
Bot user metadata.
| Field | Type | Description |
| --------- | ------- | ---------------------------------------------- |
| `id` | `str` | Bot ID (e.g., `"B0XXXXXXXXX"`). |
| `name` | `str` | Bot display name (e.g., `"MyBot"`). |
| `app_id` | `str` | Associated Slack app ID. |
| `user_id` | `str` | User ID associated with this bot. |
| `icons` | `dict` | Icon URLs: `image_36`, `image_48`, `image_72`. |
| `deleted` | `bool` | Bot is deactivated. |
| `updated` | `float` | Last update timestamp. |
| `team_id` | `str` | Team ID. |
---
### teams
**Path**: `state.teams[team_id]`
Workspace/org metadata.
| Field | Type | Description |
| -------------- | ------- | ---------------------------------------------------- |
| `id` | `str` | Team ID (e.g., `"T0XXXXXXXXX"`). |
| `name` | `str` | Workspace name. |
| `domain` | `str` | Slack subdomain. |
| `url` | `str` | Full workspace URL. |
| `email_domain` | `str` | Email domain for sign-up. |
| `plan` | `str` | Plan type (`"std"`, `"plus"`, `"enterprise"`, etc.). |
| `icon` | `dict` | Workspace icon URLs at various sizes. |
| `date_created` | `float` | Workspace creation timestamp. |
| `prefs` | `dict` | Workspace-level preferences (large, many keys). |
---
### userGroups
**Path**: `state.userGroups[group_id]`
User groups (e.g., `@engineering`, `@design`).
| Field | Type | Description |
| ------------- | --------- | ----------------------------------------------- |
| `id` | `str` | Group ID (e.g., `"S0XXXXXXXXX"`). |
| `name` | `str` | Display name. |
| `handle` | `str` | Mention handle (e.g., `"design"`). |
| `description` | `str` | Group description. |
| `user_count` | `int` | Number of members. |
| `users` | `JSArray` | Member user IDs. |
| `prefs` | `dict` | Contains `channels` JSArray (default channels). |
---
### channelHistory
**Path**: `state.channelHistory[channel_id]`
Pagination and fetch state for channel message history.
| Field | Type | Description |
| ---------------- | ---------------- | ----------------------------------------------------------------------- |
| `reachedStart` | `bool` | Scrolled to the very first message. |
| `reachedEnd` | `bool` \| `Null` | Scrolled to the latest message. |
| `prevReachedEnd` | `bool` | Previously reached end (before new messages arrived). |
| `slices` | `JSArray` | Loaded message timestamp ranges. Each slice has a `timestamps` JSArray. |
---
### allThreads
**Path**: `state.allThreads`
Thread view state (the "Threads" sidebar panel).
| Field | Type | Description |
| ------------- | --------- | ----------------------------------------------------------------------------------------------------------- |
| `threads` | `JSArray` | Thread summaries. Each entry has `threadKey` (`"{channel}-{ts}"`), `sortTs`, `hasUnreads`, `isPriority`. |
| `hasMore` | `bool` | More threads available to load. |
| `cursorTs` | `str` | Pagination cursor. |
| `maxTs` | `str` | Most recent thread timestamp. |
| `selectedTab` | `str` | Active tab: `"all"` or `"unreads"`. |
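
A `threadKey` can be split back into its parts to find the parent message in `state.messages`. This sketch assumes the `"{channel}-{ts}"` pattern holds and that neither part contains a hyphen; both function names are illustrative.

```python
def parse_thread_key(thread_key):
    """Split a "{channel}-{ts}" thread key into (channel_id, ts)."""
    channel_id, sep, ts = thread_key.rpartition("-")
    if not sep or not channel_id or not ts:
        raise ValueError(f"unexpected threadKey format: {thread_key!r}")
    return channel_id, ts


def thread_root(state, thread_key):
    """Return the cached parent message for a thread key, or None."""
    channel_id, ts = parse_thread_key(thread_key)
    return state.get("messages", {}).get(channel_id, {}).get(ts)
```

A `None` result simply means the parent message is not in the local cache.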
---
### Other Stores
Stores not detailed above but present in the state:
| Key | Entries | Description |
| ------------------ | ------- | --------------------------------------------------------------------------------------------------------- |
| `experiments` | ~1,500 | Feature flags and A/B test assignments |
| `prefs` | 4 | User preferences — `user`, `team`, `client`, `features`. The `user` entry alone has 400+ preference keys. |
| `threadSub` | ~1,100 | Thread subscription state per channel+thread |
| `searchResults` | 1 | Last search query, results, and filters |
| `membership` | ~18 | Channel membership maps: `{user_id: {isKnown, isMember}}` |
| `membershipCounts` | ~97 | Channel member counts |
| `channelCursors` | ~256 | Read cursor positions per channel |
| `mutedChannels` | ~14 | Muted channel list |
| `unreadCounts` | ~18 | Unread count state per channel |
| `flannelEmoji` | ~548 | Custom workspace emoji definitions |
| `slashCommand` | 2 | Slash command definitions |
| `channelSections` | ~18 | Sidebar section organization |
| `bootData` | 67 | Initial boot data (team info, feature gates) |
---
## Caveats & Limitations
1. **Cache only**: Slack only caches **recently viewed** channels and
   messages. The IndexedDB does not contain complete workspace history.
2. **Single blob**: The entire Redux state is one monolithic blob (~4-7 MB
   compressed, ~90 MB JSON). There is no way to read individual channels
   without decoding the whole thing.
3. **Lock file**: Slack holds the LevelDB `LOCK` file while running. To
   read the data you must either:
   - Copy the LevelDB + blob directories and remove the `LOCK` file from
     the copy, or
   - Parse the raw `.log` and `.ldb` files directly (which `dfindexeddb`
     does).
4. **Blob rotation**: Slack persists state frequently. The blob file changes
   every few seconds. Only the **latest** blob (highest modification time)
   contains current data.
5. **Encrypted data**: Database 3 (object store name is a hex hash)
   contains AES-GCM encrypted values (syncer data). The encryption key is
   not stored in the IndexedDB and these records cannot be decrypted from
   disk alone.
6. **File references**: Messages reference files by ID only (e.g.,
   `"F0XXXXXXXXX"`), not by the full file object. Cross-reference with
   `state.files[file_id]` for metadata and URLs.
7. **Slack markup**: Message `text` fields contain Slack's markup format:
   - User mentions: `<@U0XXXXXXXXX>`
   - Channel links: `<#C0XXXXXXXXX|general>`
   - URLs: `<https://example.com|label>`
   - Emoji: `:thumbsup:` (not Unicode)
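
The copy-then-unlock approach from caveat 3 can be sketched with the standard library. The function name `snapshot_indexeddb` and its parameters are illustrative; since Slack may be writing while the copy runs, the snapshot is best-effort.

```python
import shutil
from pathlib import Path


def snapshot_indexeddb(leveldb_dir, blob_dir, dest):
    """Copy both directories into dest and drop LOCK from the copy."""
    dest = Path(dest)
    leveldb_copy = dest / Path(leveldb_dir).name
    blob_copy = dest / Path(blob_dir).name
    shutil.copytree(leveldb_dir, leveldb_copy)
    if Path(blob_dir).is_dir():
        shutil.copytree(blob_dir, blob_copy)
    # Only the copy is touched; the live LOCK file stays in place.
    (leveldb_copy / "LOCK").unlink(missing_ok=True)
    return leveldb_copy, blob_copy
```

All further parsing should then run against the returned paths so that the live profile is never modified.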