diff --git a/.claude/commands/add-connector.md b/.claude/commands/add-connector.md new file mode 100644 index 00000000000..4635211ffe7 --- /dev/null +++ b/.claude/commands/add-connector.md @@ -0,0 +1,437 @@ +--- +description: Add a knowledge base connector for syncing documents from an external source +argument-hint: [api-docs-url] +--- + +# Add Connector Skill + +You are an expert at adding knowledge base connectors to Sim. A connector syncs documents from an external source (Confluence, Google Drive, Notion, etc.) into a knowledge base. + +## Your Task + +When the user asks you to create a connector: +1. Use Context7 or WebFetch to read the service's API documentation +2. Determine the auth mode: **OAuth** (if Sim already has an OAuth provider for the service) or **API key** (if the service uses API key / Bearer token auth) +3. Create the connector directory and config +4. Register it in the connector registry + +## Directory Structure + +Create files in `apps/sim/connectors/{service}/`: +``` +connectors/{service}/ +├── index.ts # Barrel export +└── {service}.ts # ConnectorConfig definition +``` + +## Authentication + +Connectors use a discriminated union for auth config (`ConnectorAuthConfig` in `connectors/types.ts`): + +```typescript +type ConnectorAuthConfig = + | { mode: 'oauth'; provider: OAuthService; requiredScopes?: string[] } + | { mode: 'apiKey'; label?: string; placeholder?: string } +``` + +### OAuth mode +For services with existing OAuth providers in `apps/sim/lib/oauth/types.ts`. The `provider` must match an `OAuthService`. The modal shows a credential picker and handles token refresh automatically. + +### API key mode +For services that use API key / Bearer token auth. The modal shows a password input with the configured `label` and `placeholder`. The API key is encrypted at rest using AES-256-GCM and stored in a dedicated `encryptedApiKey` column on the connector record. 
The sync engine decrypts it automatically — connectors receive the raw access token in `listDocuments`, `getDocument`, and `validateConfig`. + +## ConnectorConfig Structure + +### OAuth connector example + +```typescript +import { createLogger } from '@sim/logger' +import { {Service}Icon } from '@/components/icons' +import { fetchWithRetry } from '@/lib/knowledge/documents/utils' +import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types' + +const logger = createLogger('{Service}Connector') + +export const {service}Connector: ConnectorConfig = { + id: '{service}', + name: '{Service}', + description: 'Sync documents from {Service} into your knowledge base', + version: '1.0.0', + icon: {Service}Icon, + + auth: { + mode: 'oauth', + provider: '{service}', // Must match OAuthService in lib/oauth/types.ts + requiredScopes: ['read:...'], + }, + + configFields: [ + // Rendered dynamically by the add-connector modal UI + // Supports 'short-input', 'dropdown', and 'selector' types + ], + + listDocuments: async (accessToken, sourceConfig, cursor) => { + // Paginate via cursor, extract text, compute SHA-256 hash + // Return { documents: ExternalDocument[], nextCursor?, hasMore } + }, + + getDocument: async (accessToken, sourceConfig, externalId) => { + // Return ExternalDocument or null + }, + + validateConfig: async (accessToken, sourceConfig) => { + // Return { valid: true } or { valid: false, error: 'message' } + }, + + // Optional: map source metadata to semantic tag keys (translated to slots by sync engine) + mapTags: (metadata) => { + // Return Record<string, unknown> with keys matching tagDefinitions[].id + }, +} +``` + +### API key connector example + +```typescript +export const {service}Connector: ConnectorConfig = { + id: '{service}', + name: '{Service}', + description: 'Sync documents from {Service} into your knowledge base', + version: '1.0.0', + icon: {Service}Icon, + + auth: { + mode: 'apiKey', + label: 'API Key', // Shown above the input field + 
placeholder: 'Enter your {Service} API key', // Input placeholder + }, + + configFields: [ /* ... */ ], + listDocuments: async (accessToken, sourceConfig, cursor) => { /* ... */ }, + getDocument: async (accessToken, sourceConfig, externalId) => { /* ... */ }, + validateConfig: async (accessToken, sourceConfig) => { /* ... */ }, +} +``` + +## ConfigField Types + +The add-connector modal renders these automatically — no custom UI needed. + +Three field types are supported: `short-input`, `dropdown`, and `selector`. + +```typescript +// Text input +{ + id: 'domain', + title: 'Domain', + type: 'short-input', + placeholder: 'yoursite.example.com', + required: true, +} + +// Dropdown (static options) +{ + id: 'contentType', + title: 'Content Type', + type: 'dropdown', + required: false, + options: [ + { label: 'Pages only', id: 'page' }, + { label: 'Blog posts only', id: 'blogpost' }, + { label: 'All content', id: 'all' }, + ], +} +``` + +## Dynamic Selectors (Canonical Pairs) + +Use `type: 'selector'` to fetch options dynamically from the existing selector registry (`hooks/selectors/registry.ts`). Selectors are always paired with a manual fallback input using the **canonical pair** pattern — a `selector` field (basic mode) and a `short-input` field (advanced mode) linked by `canonicalParamId`. + +The user sees a toggle button (ArrowLeftRight) to switch between the selector dropdown and manual text input. On submit, the modal resolves each canonical pair to the active mode's value, keyed by `canonicalParamId`. + +### Rules + +1. **Every selector field MUST have a canonical pair** — a corresponding `short-input` (or `dropdown`) field with the same `canonicalParamId` and `mode: 'advanced'`. +2. **`required` must be set identically on both fields** in a pair. If the selector is required, the manual input must also be required. +3. **`canonicalParamId` must match the key the connector expects in `sourceConfig`** (e.g. `baseId`, `channel`, `teamId`). 
The advanced field's `id` should typically match `canonicalParamId`. +4. **`dependsOn` references the selector field's `id`**, not the `canonicalParamId`. The modal propagates dependency clearing across canonical siblings automatically — changing either field in a parent pair clears dependent children. + +### Selector canonical pair example (Airtable base → table cascade) + +```typescript +configFields: [ + // Base: selector (basic) + manual (advanced) + { + id: 'baseSelector', + title: 'Base', + type: 'selector', + selectorKey: 'airtable.bases', // Must exist in hooks/selectors/registry.ts + canonicalParamId: 'baseId', + mode: 'basic', + placeholder: 'Select a base', + required: true, + }, + { + id: 'baseId', + title: 'Base ID', + type: 'short-input', + canonicalParamId: 'baseId', + mode: 'advanced', + placeholder: 'e.g. appXXXXXXXXXXXXXX', + required: true, + }, + // Table: selector depends on base (basic) + manual (advanced) + { + id: 'tableSelector', + title: 'Table', + type: 'selector', + selectorKey: 'airtable.tables', + canonicalParamId: 'tableIdOrName', + mode: 'basic', + dependsOn: ['baseSelector'], // References the selector field ID + placeholder: 'Select a table', + required: true, + }, + { + id: 'tableIdOrName', + title: 'Table Name or ID', + type: 'short-input', + canonicalParamId: 'tableIdOrName', + mode: 'advanced', + placeholder: 'e.g. Tasks', + required: true, + }, + // Non-selector fields stay as-is + { id: 'maxRecords', title: 'Max Records', type: 'short-input', ... }, +] +``` + +### Selector with domain dependency (Jira/Confluence pattern) + +When a selector depends on a plain `short-input` field (no canonical pair), `dependsOn` references that field's `id` directly. The `domain` field's value maps to `SelectorContext.domain` automatically via `SELECTOR_CONTEXT_FIELDS`. 
+ +```typescript +configFields: [ + { + id: 'domain', + title: 'Jira Domain', + type: 'short-input', + placeholder: 'yoursite.atlassian.net', + required: true, + }, + { + id: 'projectSelector', + title: 'Project', + type: 'selector', + selectorKey: 'jira.projects', + canonicalParamId: 'projectKey', + mode: 'basic', + dependsOn: ['domain'], + placeholder: 'Select a project', + required: true, + }, + { + id: 'projectKey', + title: 'Project Key', + type: 'short-input', + canonicalParamId: 'projectKey', + mode: 'advanced', + placeholder: 'e.g. ENG, PROJ', + required: true, + }, +] +``` + +### How `dependsOn` maps to `SelectorContext` + +The connector selector field builds a `SelectorContext` from dependency values. For the mapping to work, each dependency's `canonicalParamId` (or field `id` for non-canonical fields) must exist in `SELECTOR_CONTEXT_FIELDS` (`lib/workflows/subblocks/context.ts`): + +``` +oauthCredential, domain, teamId, projectId, knowledgeBaseId, planId, +siteId, collectionId, spreadsheetId, fileId, baseId, datasetId, serviceDeskId +``` + +### Available selector keys + +Check `hooks/selectors/types.ts` for the full `SelectorKey` union. 
Common ones for connectors: + +| SelectorKey | Context Deps | Returns | +|-------------|-------------|---------| +| `airtable.bases` | credential | Base ID + name | +| `airtable.tables` | credential, `baseId` | Table ID + name | +| `slack.channels` | credential | Channel ID + name | +| `gmail.labels` | credential | Label ID + name | +| `google.calendar` | credential | Calendar ID + name | +| `linear.teams` | credential | Team ID + name | +| `linear.projects` | credential, `teamId` | Project ID + name | +| `jira.projects` | credential, `domain` | Project key + name | +| `confluence.spaces` | credential, `domain` | Space key + name | +| `notion.databases` | credential | Database ID + name | +| `asana.workspaces` | credential | Workspace GID + name | +| `microsoft.teams` | credential | Team ID + name | +| `microsoft.channels` | credential, `teamId` | Channel ID + name | +| `webflow.sites` | credential | Site ID + name | +| `outlook.folders` | credential | Folder ID + name | + +## ExternalDocument Shape + +Every document returned from `listDocuments`/`getDocument` must include: + +```typescript +{ + externalId: string // Source-specific unique ID + title: string // Document title + content: string // Extracted plain text + mimeType: 'text/plain' // Always text/plain (content is extracted) + contentHash: string // SHA-256 of content (change detection) + sourceUrl?: string // Link back to original (stored on document record) + metadata?: Record<string, unknown> // Source-specific data (fed to mapTags) +} +``` + +## Content Hashing (Required) + +The sync engine uses content hashes for change detection: + +```typescript +async function computeContentHash(content: string): Promise<string> { + const data = new TextEncoder().encode(content) + const hashBuffer = await crypto.subtle.digest('SHA-256', data) + return Array.from(new Uint8Array(hashBuffer)).map(b => b.toString(16).padStart(2, '0')).join('') +} +``` + +## tagDefinitions — Declared Tag Definitions + +Declare which tags the connector 
populates using semantic IDs. Shown in the add-connector modal as opt-out checkboxes. +On connector creation, slots are **dynamically assigned** via `getNextAvailableSlot` — connectors never hardcode slot names. + +```typescript +tagDefinitions: [ + { id: 'labels', displayName: 'Labels', fieldType: 'text' }, + { id: 'version', displayName: 'Version', fieldType: 'number' }, + { id: 'lastModified', displayName: 'Last Modified', fieldType: 'date' }, +], +``` + +Each entry has: +- `id`: Semantic key matching a key returned by `mapTags` (e.g. `'labels'`, `'version'`) +- `displayName`: Human-readable name shown in the UI (e.g. "Labels", "Last Modified") +- `fieldType`: `'text'` | `'number'` | `'date'` | `'boolean'` — determines which slot pool to draw from + +Users can opt out of specific tags in the modal. Disabled IDs are stored in `sourceConfig.disabledTagIds`. +The assigned mapping (`semantic id → slot`) is stored in `sourceConfig.tagSlotMapping`. + +## mapTags — Metadata to Semantic Keys + +Maps source metadata to semantic tag keys. Required if `tagDefinitions` is set. +The sync engine calls this automatically and translates semantic keys to actual DB slots +using the `tagSlotMapping` stored on the connector. + +Return keys must match the `id` values declared in `tagDefinitions`. + +```typescript +mapTags: (metadata: Record<string, unknown>): Record<string, unknown> => { + const result: Record<string, unknown> = {} + + // Validate arrays before casting — metadata may be malformed + const labels = Array.isArray(metadata.labels) ? 
(metadata.labels as string[]) : [] + if (labels.length > 0) result.labels = labels.join(', ') + + // Validate numbers — guard against NaN + if (metadata.version != null) { + const num = Number(metadata.version) + if (!Number.isNaN(num)) result.version = num + } + + // Validate dates — guard against Invalid Date + if (typeof metadata.lastModified === 'string') { + const date = new Date(metadata.lastModified) + if (!Number.isNaN(date.getTime())) result.lastModified = date + } + + return result +} +``` + +## External API Calls — Use `fetchWithRetry` + +All external API calls must use `fetchWithRetry` from `@/lib/knowledge/documents/utils` instead of raw `fetch()`. This provides exponential backoff with retries on 429/502/503/504 errors. It returns a standard `Response` — all `.ok`, `.json()`, `.text()` checks work unchanged. + +For `validateConfig` (user-facing, called on save), pass `VALIDATE_RETRY_OPTIONS` to cap wait time at ~7s. Background operations (`listDocuments`, `getDocument`) use the built-in defaults (5 retries, ~31s max). + +```typescript +import { VALIDATE_RETRY_OPTIONS, fetchWithRetry } from '@/lib/knowledge/documents/utils' + +// Background sync — use defaults +const response = await fetchWithRetry(url, { + method: 'GET', + headers: { Authorization: `Bearer ${accessToken}` }, +}) + +// validateConfig — tighter retry budget +const response = await fetchWithRetry(url, { ... }, VALIDATE_RETRY_OPTIONS) +``` + +## sourceUrl + +If `ExternalDocument.sourceUrl` is set, the sync engine stores it on the document record. Always construct the full URL (not a relative path). + +## Sync Engine Behavior (Do Not Modify) + +The sync engine (`lib/knowledge/connectors/sync-engine.ts`) is connector-agnostic. It: +1. Calls `listDocuments` with pagination until `hasMore` is false +2. Compares `contentHash` to detect new/changed/unchanged documents +3. Stores `sourceUrl` and calls `mapTags` on insert/update automatically +4. Handles soft-delete of removed documents +5. 
Resolves access tokens automatically — OAuth tokens are refreshed, API keys are decrypted from the `encryptedApiKey` column + +You never need to modify the sync engine when adding a connector. + +## Icon + +The `icon` field on `ConnectorConfig` is used throughout the UI — in the connector list, the add-connector modal, and as the document icon in the knowledge base table (replacing the generic file type icon for connector-sourced documents). The icon is read from `CONNECTOR_REGISTRY[connectorType].icon` at runtime — no separate icon map to maintain. + +If the service already has an icon in `apps/sim/components/icons.tsx` (from a tool integration), reuse it. Otherwise, ask the user to provide the SVG. + +## Registering + +Import the connector and add an entry to `CONNECTOR_REGISTRY` in `apps/sim/connectors/registry.ts` (the key must match the connector's `id`): + +```typescript +import { {service}Connector } from '@/connectors/{service}' + +export const CONNECTOR_REGISTRY: ConnectorRegistry = { + // ... existing connectors ... + {service}: {service}Connector, +} +``` + +## Reference Implementations + +- **OAuth**: `apps/sim/connectors/confluence/confluence.ts` — multiple config field types, `mapTags`, label fetching +- **API key**: `apps/sim/connectors/fireflies/fireflies.ts` — GraphQL API with Bearer token auth + +## Checklist + +- [ ] Created `connectors/{service}/{service}.ts` with full ConnectorConfig +- [ ] Created `connectors/{service}/index.ts` barrel export +- [ ] **Auth configured correctly:** + - OAuth: `auth.provider` matches an existing `OAuthService` in `lib/oauth/types.ts` + - API key: `auth.label` and `auth.placeholder` set appropriately +- [ ] **Selector fields configured correctly (if applicable):** + - Every `type: 'selector'` field has a canonical pair (`short-input` or `dropdown` with same `canonicalParamId` and `mode: 'advanced'`) + - `required` is identical on both fields in each canonical pair + - `selectorKey` exists in `hooks/selectors/registry.ts` + - `dependsOn` references selector field IDs (not `canonicalParamId`) + - 
Dependency `canonicalParamId` values exist in `SELECTOR_CONTEXT_FIELDS` +- [ ] `listDocuments` handles pagination and computes content hashes +- [ ] `sourceUrl` set on each ExternalDocument (full URL, not relative) +- [ ] `metadata` includes source-specific data for tag mapping +- [ ] `tagDefinitions` declared for each semantic key returned by `mapTags` +- [ ] `mapTags` implemented if source has useful metadata (labels, dates, versions) +- [ ] `validateConfig` verifies the source is accessible +- [ ] All external API calls use `fetchWithRetry` (not raw `fetch`) +- [ ] All optional config fields validated in `validateConfig` +- [ ] Icon exists in `components/icons.tsx` (or asked user to provide SVG) +- [ ] Registered in `connectors/registry.ts` diff --git a/.claude/commands/validate-connector.md b/.claude/commands/validate-connector.md new file mode 100644 index 00000000000..adcbf61b12b --- /dev/null +++ b/.claude/commands/validate-connector.md @@ -0,0 +1,316 @@ +--- +description: Validate an existing knowledge base connector against its service's API docs +argument-hint: [api-docs-url] +--- + +# Validate Connector Skill + +You are an expert auditor for Sim knowledge base connectors. Your job is to thoroughly validate that an existing connector is correct, complete, and follows all conventions. + +## Your Task + +When the user asks you to validate a connector: +1. Read the service's API documentation (via Context7 or WebFetch) +2. Read the connector implementation, OAuth config, and registry entries +3. Cross-reference everything against the API docs and Sim conventions +4. Report all issues found, grouped by severity (critical, warning, suggestion) +5. 
Fix all issues after reporting them + +## Step 1: Gather All Files + +Read **every** file for the connector — do not skip any: + +``` +apps/sim/connectors/{service}/{service}.ts # Connector implementation +apps/sim/connectors/{service}/index.ts # Barrel export +apps/sim/connectors/registry.ts # Connector registry entry +apps/sim/connectors/types.ts # ConnectorConfig interface, ExternalDocument, etc. +apps/sim/connectors/utils.ts # Shared utilities (computeContentHash, htmlToPlainText, etc.) +apps/sim/lib/oauth/oauth.ts # OAUTH_PROVIDERS — single source of truth for scopes +apps/sim/lib/oauth/utils.ts # getCanonicalScopesForProvider, getScopesForService, SCOPE_DESCRIPTIONS +apps/sim/lib/oauth/types.ts # OAuthService union type +apps/sim/components/icons.tsx # Icon definition for the service +``` + +If the connector uses selectors, also read: +``` +apps/sim/hooks/selectors/registry.ts # Selector key definitions +apps/sim/hooks/selectors/types.ts # SelectorKey union type +apps/sim/lib/workflows/subblocks/context.ts # SELECTOR_CONTEXT_FIELDS +``` + +## Step 2: Pull API Documentation + +Fetch the official API docs for the service. This is the **source of truth** for: +- Endpoint URLs, HTTP methods, and auth headers +- Required vs optional parameters +- Parameter types and allowed values +- Response shapes and field names +- Pagination patterns (cursor, offset, next token) +- Rate limits and error formats +- OAuth scopes and their meanings + +Use Context7 (resolve-library-id → query-docs) or WebFetch to retrieve documentation. If both fail, note which claims are based on training knowledge vs verified docs. 
+ +## Step 3: Validate API Endpoints + +For **every** API call in the connector (`listDocuments`, `getDocument`, `validateConfig`, and any helper functions), verify against the API docs: + +### URLs and Methods +- [ ] Base URL is correct for the service's API version +- [ ] Endpoint paths match the API docs exactly +- [ ] HTTP method is correct (GET, POST, PUT, PATCH, DELETE) +- [ ] Path parameters are correctly interpolated and URI-encoded where needed +- [ ] Query parameters use correct names and formats per the API docs + +### Headers +- [ ] Authorization header uses the correct format: + - OAuth: `Authorization: Bearer ${accessToken}` + - API Key: correct header name per the service's docs +- [ ] `Content-Type` is set for POST/PUT/PATCH requests +- [ ] Any service-specific headers are present (e.g., `Notion-Version`, `Dropbox-API-Arg`) +- [ ] No headers are sent that the API doesn't support or silently ignores + +### Request Bodies +- [ ] POST/PUT body fields match API parameter names exactly +- [ ] Required fields are always sent +- [ ] Optional fields are conditionally included (not sent as `null` or empty unless the API expects that) +- [ ] Field value types match API expectations (string vs number vs boolean) + +### Input Sanitization +- [ ] User-controlled values interpolated into query strings are properly escaped: + - OData `$filter`: single quotes escaped with `''` (e.g., `externalId.replace(/'/g, "''")`) + - SOQL: single quotes escaped with `\'` + - GraphQL variables: passed as variables, not interpolated into query strings + - URL path segments: `encodeURIComponent()` applied +- [ ] URL-type config fields (e.g., `siteUrl`, `instanceUrl`) are normalized: + - Strip `https://` / `http://` prefix if the API expects bare domains + - Strip trailing `/` + - Apply `.trim()` before validation + +### Response Parsing +- [ ] Response structure is correctly traversed (e.g., `data.results` vs `data.items` vs `data`) +- [ ] Field names extracted match what the API 
actually returns +- [ ] Nullable fields are handled with `?? null` or `|| undefined` +- [ ] Error responses are checked before accessing data fields + +## Step 4: Validate OAuth Scopes (if OAuth connector) + +Scopes must be correctly declared and sufficient for all API calls the connector makes. + +### Connector requiredScopes +- [ ] `requiredScopes` in the connector's `auth` config lists all scopes needed by the connector +- [ ] Each scope in `requiredScopes` is a real, valid scope recognized by the service's API +- [ ] No invalid, deprecated, or made-up scopes are listed +- [ ] No unnecessary excess scopes beyond what the connector actually needs + +### Scope Subset Validation (CRITICAL) +- [ ] Every scope in `requiredScopes` exists in the OAuth provider's `scopes` array in `lib/oauth/oauth.ts` +- [ ] Find the provider in `OAUTH_PROVIDERS[providerGroup].services[serviceId].scopes` +- [ ] Verify: `requiredScopes` ⊆ `OAUTH_PROVIDERS scopes` (every required scope is present in the provider config) +- [ ] If a required scope is NOT in the provider config, flag as **critical** — the connector will fail at runtime + +### Scope Sufficiency +For each API endpoint the connector calls: +- [ ] Identify which scopes are required per the API docs +- [ ] Verify those scopes are included in the connector's `requiredScopes` +- [ ] If the connector calls endpoints requiring scopes not in `requiredScopes`, flag as **warning** + +### Token Refresh Config +- [ ] Check the `getOAuthTokenRefreshConfig` function in `lib/oauth/oauth.ts` for this provider +- [ ] `useBasicAuth` matches the service's token exchange requirements +- [ ] `supportsRefreshTokenRotation` matches whether the service issues rotating refresh tokens +- [ ] Token endpoint URL is correct + +## Step 5: Validate Pagination + +### listDocuments Pagination +- [ ] Cursor/pagination parameter name matches the API docs +- [ ] Response pagination field is correctly extracted (e.g., `next_cursor`, `nextPageToken`, 
`@odata.nextLink`, `offset`) +- [ ] `hasMore` is correctly determined from the response +- [ ] `nextCursor` is correctly passed back for the next page +- [ ] `maxItems` / `maxRecords` cap is correctly applied across pages using `syncContext.totalDocsFetched` +- [ ] Page size is within the API's allowed range (not exceeding max page size) +- [ ] Last page precision: when a `maxItems` cap exists, the final page request uses `Math.min(PAGE_SIZE, remaining)` to avoid fetching more records than needed +- [ ] No off-by-one errors in pagination tracking +- [ ] The connector does NOT hit known API pagination limits silently (e.g., HubSpot search 10k cap) + +### Pagination State Across Pages +- [ ] `syncContext` is used to cache state across pages (user names, field maps, instance URLs, portal IDs, etc.) +- [ ] Cached state in `syncContext` is correctly initialized on first page and reused on subsequent pages + +## Step 6: Validate Data Transformation + +### ExternalDocument Construction +- [ ] `externalId` is a stable, unique identifier from the source API +- [ ] `title` is extracted from the correct field and has a sensible fallback (e.g., `'Untitled'`) +- [ ] `content` is plain text — HTML content is stripped using `htmlToPlainText` from `@/connectors/utils` +- [ ] `mimeType` is `'text/plain'` +- [ ] `contentHash` is computed using `computeContentHash` from `@/connectors/utils` +- [ ] `sourceUrl` is a valid, complete URL back to the original resource (not relative) +- [ ] `metadata` contains all fields referenced by `mapTags` and `tagDefinitions` + +### Content Extraction +- [ ] Rich text / HTML fields are converted to plain text before indexing +- [ ] Important content is not silently dropped (e.g., nested blocks, table cells, code blocks) +- [ ] Content is not silently truncated without logging a warning +- [ ] Empty/blank documents are properly filtered out +- [ ] Size checks use `Buffer.byteLength(text, 'utf8')` not `text.length` when comparing against byte-based 
limits (e.g., `MAX_FILE_SIZE` in bytes) + +## Step 7: Validate Tag Definitions and mapTags + +### tagDefinitions +- [ ] Each `tagDefinition` has an `id`, `displayName`, and `fieldType` +- [ ] `fieldType` matches the actual data type: `'text'` for strings, `'number'` for numbers, `'date'` for dates, `'boolean'` for booleans +- [ ] Every `id` in `tagDefinitions` is returned by `mapTags` +- [ ] No `tagDefinition` references a field that `mapTags` never produces + +### mapTags +- [ ] Return keys match `tagDefinition` `id` values exactly +- [ ] Date values are properly parsed using `parseTagDate` from `@/connectors/utils` +- [ ] Array values are properly joined using `joinTagArray` from `@/connectors/utils` +- [ ] Number values are validated (not `NaN`) +- [ ] Metadata field names accessed in `mapTags` match what `listDocuments`/`getDocument` store in `metadata` + +## Step 8: Validate Config Fields and Validation + +### configFields +- [ ] Every field has `id`, `title`, `type` +- [ ] `required` is set explicitly (not omitted) +- [ ] Dropdown fields have `options` with `label` and `id` for each option +- [ ] Selector fields follow the canonical pair pattern: + - A `type: 'selector'` field with `selectorKey`, `canonicalParamId`, `mode: 'basic'` + - A `type: 'short-input'` field with the same `canonicalParamId`, `mode: 'advanced'` + - `required` is identical on both fields in the pair +- [ ] `selectorKey` values exist in the selector registry +- [ ] `dependsOn` references selector field `id` values, not `canonicalParamId` + +### validateConfig +- [ ] Validates all required fields are present before making API calls +- [ ] Validates optional numeric fields (checks `Number.isNaN`, positive values) +- [ ] Makes a lightweight API call to verify access (e.g., fetch 1 record, get profile) +- [ ] Uses `VALIDATE_RETRY_OPTIONS` for retry budget +- [ ] Returns `{ valid: true }` on success +- [ ] Returns `{ valid: false, error: 'descriptive message' }` on failure +- [ ] Catches 
exceptions and returns user-friendly error messages +- [ ] Does NOT make expensive calls (full data listing, large queries) + +## Step 9: Validate getDocument + +- [ ] Fetches a single document by `externalId` +- [ ] Returns `null` for 404 / not found (does not throw) +- [ ] Returns the same `ExternalDocument` shape as `listDocuments` +- [ ] Handles all content types that `listDocuments` can produce (e.g., if `listDocuments` returns both pages and blogposts, `getDocument` must handle both — not hardcode one endpoint) +- [ ] Forwards `syncContext` if it needs cached state (user names, field maps, etc.) +- [ ] Error handling is graceful (catches, logs, returns null or throws with context) +- [ ] Does not redundantly re-fetch data already included in the initial API response (e.g., if comments come back with the post, don't fetch them again separately) + +## Step 10: Validate General Quality + +### fetchWithRetry Usage +- [ ] All external API calls use `fetchWithRetry` from `@/lib/knowledge/documents/utils` +- [ ] No raw `fetch()` calls to external APIs +- [ ] `VALIDATE_RETRY_OPTIONS` used in `validateConfig` +- [ ] If `validateConfig` calls a shared helper (e.g., `linearGraphQL`, `resolveId`), that helper must accept and forward `retryOptions` to `fetchWithRetry` +- [ ] Default retry options used in `listDocuments`/`getDocument` + +### API Efficiency +- [ ] APIs that support field selection (e.g., `$select`, `sysparm_fields`, `fields`) should request only the fields the connector needs — in both `listDocuments` AND `getDocument` +- [ ] No redundant API calls: if a helper already fetches data (e.g., site metadata), callers should reuse the result instead of making a second call for the same information +- [ ] Sequential per-item API calls (fetching details for each document in a loop) should be batched with `Promise.all` and a concurrency limit of 3-5 + +### Error Handling +- [ ] Individual document failures are caught and logged without aborting the sync +- [ ] API 
error responses include status codes in error messages +- [ ] No unhandled promise rejections in concurrent operations + +### Concurrency +- [ ] Concurrent API calls use reasonable batch sizes (3-5 is typical) +- [ ] No unbounded `Promise.all` over large arrays + +### Logging +- [ ] Uses `createLogger` from `@sim/logger` (not `console.log`) +- [ ] Logs sync progress at `info` level +- [ ] Logs errors at `warn` or `error` level with context + +### Registry +- [ ] Connector is exported from `connectors/{service}/index.ts` +- [ ] Connector is registered in `connectors/registry.ts` +- [ ] Registry key matches the connector's `id` field + +## Step 11: Report and Fix + +### Report Format + +Group findings by severity: + +**Critical** (will cause runtime errors, data loss, or auth failures): +- Wrong API endpoint URL or HTTP method +- Invalid or missing OAuth scopes (not in provider config) +- Incorrect response field mapping (accessing wrong path) +- SOQL/query fields that don't exist on the target object +- Pagination that silently hits undocumented API limits +- Missing error handling that would crash the sync +- `requiredScopes` not a subset of OAuth provider scopes +- Query/filter injection: user-controlled values interpolated into OData `$filter`, SOQL, or query strings without escaping + +**Warning** (incorrect behavior, data quality issues, or convention violations): +- HTML content not stripped via `htmlToPlainText` +- `getDocument` not forwarding `syncContext` +- `getDocument` hardcoded to one content type when `listDocuments` returns multiple (e.g., only pages but not blogposts) +- Missing `tagDefinition` for metadata fields returned by `mapTags` +- Incorrect `useBasicAuth` or `supportsRefreshTokenRotation` in token refresh config +- Invalid scope names that the API doesn't recognize (even if silently ignored) +- Private resources excluded from name-based lookup despite scopes being available +- Silent data truncation without logging +- Size checks using 
`text.length` (character count) instead of `Buffer.byteLength` (byte count) for byte-based limits +- URL-type config fields not normalized (protocol prefix, trailing slashes cause API failures) +- `VALIDATE_RETRY_OPTIONS` not threaded through helper functions called by `validateConfig` + +**Suggestion** (minor improvements): +- Missing incremental sync support despite API supporting it +- Overly broad scopes that could be narrowed (not wrong, but could be tighter) +- Source URL format could be more specific +- Missing `orderBy` for deterministic pagination +- Redundant API calls that could be cached in `syncContext` +- Sequential per-item API calls that could be batched with `Promise.all` (concurrency 3-5) +- API supports field selection but connector fetches all fields (e.g., missing `$select`, `sysparm_fields`, `fields`) +- `getDocument` re-fetches data already included in the initial API response (e.g., comments returned with post) +- Last page of pagination requests full `PAGE_SIZE` when fewer records remain (`Math.min(PAGE_SIZE, remaining)`) + +### Fix All Issues + +After reporting, fix every **critical** and **warning** issue. Apply **suggestions** where they don't add unnecessary complexity. + +### Validation Output + +After fixing, confirm: +1. `bun run lint` passes +2. TypeScript compiles clean +3. 
Re-read all modified files to verify fixes are correct + +## Checklist Summary + +- [ ] Read connector implementation, types, utils, registry, and OAuth config +- [ ] Pulled and read official API documentation for the service +- [ ] Validated every API endpoint URL, method, headers, and body against API docs +- [ ] Validated input sanitization: no query/filter injection, URL fields normalized +- [ ] Validated OAuth scopes: `requiredScopes` ⊆ OAuth provider `scopes` in `oauth.ts` +- [ ] Validated each scope is real and recognized by the service's API +- [ ] Validated scopes are sufficient for all API endpoints the connector calls +- [ ] Validated token refresh config (`useBasicAuth`, `supportsRefreshTokenRotation`) +- [ ] Validated pagination: cursor names, page sizes, hasMore logic, no silent caps +- [ ] Validated data transformation: plain text extraction, HTML stripping, content hashing +- [ ] Validated tag definitions match mapTags output, correct fieldTypes +- [ ] Validated config fields: canonical pairs, selector keys, required flags +- [ ] Validated validateConfig: lightweight check, error messages, retry options +- [ ] Validated getDocument: null on 404, all content types handled, no redundant re-fetches, syncContext forwarding +- [ ] Validated fetchWithRetry used for all external calls (no raw fetch), VALIDATE_RETRY_OPTIONS threaded through helpers +- [ ] Validated API efficiency: field selection used, no redundant calls, sequential fetches batched +- [ ] Validated error handling: graceful failures, no unhandled rejections +- [ ] Validated logging: createLogger, no console.log +- [ ] Validated registry: correct export, correct key +- [ ] Reported all issues grouped by severity +- [ ] Fixed all critical and warning issues +- [ ] Ran `bun run lint` after fixes +- [ ] Verified TypeScript compiles clean diff --git a/.cursor/rules/landing-seo-geo.mdc b/.cursor/rules/landing-seo-geo.mdc new file mode 100644 index 00000000000..aa1503eab1d --- /dev/null +++ 
b/.cursor/rules/landing-seo-geo.mdc @@ -0,0 +1,26 @@ +--- +description: SEO and GEO guidelines for the landing page +globs: ["apps/sim/app/(home)/**/*.tsx"] +--- + +# Landing Page — SEO / GEO + +## SEO + +- One `<h1>` per page, in Hero only — never add another. +- Strict heading hierarchy: H1 (Hero) → H2 (section titles) → H3 (feature names). +- Every section: `
`. +- Decorative/animated elements: `aria-hidden="true"`. +- All internal routes use Next.js `<Link>` (crawlable). External links get `rel="noopener noreferrer"`. +- Navbar is a Server Component (no `'use client'`) for immediate crawlability. Logo `<Image>` has `priority` (LCP element). +- Navbar `