Run audits concurrently using concurrent_tasks setting #5731
Draft
airhorns wants to merge 1 commit into SQLMesh:main from
When audit_only=True, all audit tasks across all snapshots are flattened into a single thread pool instead of following DAG ordering. Since audits are read-only SELECT queries with no side effects, DAG dependencies are irrelevant and all concurrent_tasks worker slots stay filled.
Per-model audit concurrency is also plumbed through SnapshotEvaluator.audit() via a new audit_concurrent_tasks parameter (defaults to sequential). The cross-model path hardcodes this to 1 to avoid nested thread pool multiplication.
The SnapshotEvaluator parameter ddl_concurrent_tasks is renamed to concurrent_tasks to reflect its broader scope.
Closes SQLMesh#5468
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This is a draft PR: I'm looking for feedback on the general approach and the core implementation. I am a new contributor, so I am very happy to be told this is totally wrong! I used Claude Code to help me with this, but I promise it's not a drive-by slop PR, and I'm making use of this in a fork in my own project.
Description
Before, audits ran one at a time after each model and blocked the execution of downstream models. If you are a heavy user of audits, this trades a lot of speed for correctness: the run only moves forward if all the audits pass. But if you want to get the run done and then audit after the fact, it really slows things down.
dbt takes a different approach, where you can do a run, and then later run tests. I wouldn't say that's better, but in some circumstances, especially with non-blocking audits or warnings, you don't need to know their state before moving on and building the next model.
So, in an effort to facilitate running audits out of band as fast as possible, or faster when run in band, this changes audits to execute concurrently as part of the DAG! Paired with #5730, this lets folks run the models without any audits, and later run audits with a high degree of parallelism.
Implementation
When audit_only=True, all audit tasks across all snapshots are flattened into a single thread pool instead of following DAG ordering. Since audits are read-only SELECT queries with no side effects, DAG dependencies are irrelevant and all concurrent_tasks worker slots stay filled.
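A rough sketch of the flattening idea, for reviewers who want the shape without reading the diff. This is not the code in this PR: `run_audits_flat`, `AuditFn`, and the dict-of-callables input are hypothetical stand-ins for snapshots and their audit queries, and only the standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for an audit: a zero-argument callable that
# returns True when the audit passes.
AuditFn = Callable[[], bool]


def run_audits_flat(
    audits_by_snapshot: Dict[str, Dict[str, AuditFn]],
    concurrent_tasks: int,
) -> Dict[Tuple[str, str], bool]:
    """Run every audit of every snapshot in one shared thread pool."""
    # Flatten (snapshot, audit) pairs into a single task list. No DAG
    # ordering is applied, on the assumption that audits are read-only.
    tasks = [
        (snapshot, audit_name, fn)
        for snapshot, audits in audits_by_snapshot.items()
        for audit_name, fn in audits.items()
    ]
    with ThreadPoolExecutor(max_workers=max(1, concurrent_tasks)) as pool:
        futures = {
            (snapshot, audit_name): pool.submit(fn)
            for snapshot, audit_name, fn in tasks
        }
        # result() re-raises any exception thrown inside an audit.
        return {key: future.result() for key, future in futures.items()}
```

Because every audit goes into the same pool, a model with many audits cannot leave workers idle while a model with a single audit finishes.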
Per-model audit concurrency is also plumbed through SnapshotEvaluator.audit() via a new audit_concurrent_tasks parameter (defaults to sequential). The cross-model path hardcodes this to 1 to avoid nested thread pool multiplication.
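To illustrate why the cross-model path pins the inner value to 1, here is a minimal sketch. `audit_model` and `max_in_flight` are hypothetical helpers, not `SnapshotEvaluator.audit()` itself; only the parameter name `audit_concurrent_tasks` and its sequential default come from this PR.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def audit_model(
    audits: List[Callable[[], bool]],
    audit_concurrent_tasks: int = 1,
) -> List[bool]:
    """Run one model's audits, sequentially unless asked otherwise."""
    if audit_concurrent_tasks <= 1:
        # Default: sequential, no extra threads are created.
        return [audit() for audit in audits]
    with ThreadPoolExecutor(max_workers=audit_concurrent_tasks) as pool:
        # map() returns results in the same order as the input audits.
        return list(pool.map(lambda audit: audit(), audits))


def max_in_flight(outer_workers: int, inner_workers: int) -> int:
    """Upper bound on simultaneous queries when pools are nested."""
    return outer_workers * inner_workers
```

With 8 outer workers and 4 inner workers, up to `max_in_flight(8, 4)` = 32 queries could hit the warehouse at once; pinning the inner value to 1 keeps the bound at the configured 8.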
The SnapshotEvaluator parameter ddl_concurrent_tasks is renamed to concurrent_tasks to reflect its broader scope.
Test Plan
Checklist
make style and fixed any issues
make fast-test)
git commit -s) per the DCO
Closes #5468