Comparing changes

base repository: AlphaKure/llama-cpp-python
base: main
head repository: abetlen/llama-cpp-python
compare: main
  • 19 commits
  • 34 files changed
  • 4 contributors

Commits on Mar 22, 2026

  1. fix(ci): Rename huggingface-cli to hf (abetlen#2149)

    * Fix model download in test workflow
    
    * Use hf CLI in test workflow
    
    * Use hf CLI name in CI and docs
    
    * Reference PR in changelog
    abetlen authored Mar 22, 2026
    ca3b00a
  2. fix(ci): Fix macos tests, support both Intel and Apple Silicon testing (abetlen#2150)
    
    * fix(ci): use supported macos runner label
    
    * fix(ci): add apple silicon macos test coverage
    
    * fix(ci): run standard macos tests on apple silicon
    
    * fix(ci): simplify apple silicon macos install
    
    * fix(ci): disable ggml native on apple silicon runner
    
    * docs: update changelog for macos ci runner fix
    abetlen authored Mar 22, 2026
    9f661ff
  3. misc: Add Ruff formatting (abetlen#2148)

    * Add Ruff formatting and safe lint baseline
    
    * Update changelog for Ruff setup
    abetlen authored Mar 22, 2026
    a9b4a06

Commits on Mar 23, 2026

  1. feat: Update llama.cpp to ggml-org/llama.cpp@49bfdde (abetlen#2151)

    * Update llama.cpp and sync bindings
    
    * Clean up binding compatibility shims
    
    * Remove flash attention property shim
    
    * Remove mtmd verbosity shim
    
    * Add docstrings for new bindings
    
    * Format Ruff files and add changelog entry
    abetlen authored Mar 23, 2026
    18aa31e
  2. ci: add riscv64 wheel builds to release workflow (abetlen#2139)

    * ci: add riscv64 wheel builds to release workflow
    
    Add a build_wheels_riscv64 job mirroring the existing arm64 QEMU-based
    build. Uses cibuildwheel with QEMU emulation for linux/riscv64, targeting
    CPython 3.10-3.14 on manylinux.
    
    Closes abetlen#2138
    
    * ci: use cibuildwheel 3.1.2 for riscv64 wheels
    
    * docs: update changelog for riscv64 wheel PR
    
    ---------
    
    Co-authored-by: abetlen <abetlen@gmail.com>
    gounthar and abetlen authored Mar 23, 2026
    e1f8ac0
  3. fix: Qwen 3.5 support (abetlen#2152)

    * fix: handle Qwen 3.5 hybrid prefix reuse
    
    * test: fix Qwen runtime unit mocks
    
    * test: drop Qwen runtime unit tests
    
    * docs: credit Qwen fix contributors in changelog
    
    * docs/tests: update default Qwen model to 3.5 0.8B
    
    * test: rebaseline Qwen 3.5 outputs
    
    * test: stabilize low-level Qwen sampling check
    
    * test: tighten Qwen 3.5 completion prompts
    abetlen authored Mar 23, 2026
    11e7a55
  4. a6b1807
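
The riscv64 commits above describe building one wheel per CPython version (3.10 through 3.14) on manylinux. As a rough illustration only, the sketch below enumerates build selectors in the `cpXY-manylinux_<arch>` style that cibuildwheel uses for other architectures. The helper name and the exact riscv64 selector strings are assumptions, not taken from the workflow file.

```python
# Hypothetical helper: list one cibuildwheel-style build selector per CPython
# minor version, so each could become its own CI matrix entry.

def riscv64_build_selectors(first_minor: int = 10, last_minor: int = 14) -> list[str]:
    """Return selectors such as 'cp310-manylinux_riscv64' for CPython 3.x."""
    return [
        f"cp3{minor}-manylinux_riscv64"  # assumed selector format
        for minor in range(first_minor, last_minor + 1)
    ]


selectors = riscv64_build_selectors()
for selector in selectors:
    print(selector)
```

Splitting the build this way keeps each emulated job short, which matters because QEMU-emulated compilation is much slower than a native build.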

Commits on Mar 24, 2026

  1. fix(ci): release wheel workflow (abetlen#2154)

    * fix(ci): harden release wheel workflow
    
    * fix(ci): document and pin release wheel baselines
    
    * fix(ci): speed up release arch builds
    
    * fix(ci): split riscv64 by python version
    
    * fix(ci): sanitize riscv64 artifact names
    abetlen authored Mar 24, 2026
    f0391c5
  2. fix(ci): cuda wheel workflow (abetlen#2155)

    * fix(ci): harden cuda wheel workflow
    
    * fix(ci): pin cuda toolkit versions accurately
    
    * fix(ci): resolve exact cuda toolkit installs
    
    * fix(ci): align cuda toolkit roots and tags
    
    * fix(ci): pin cuda packages to nvidia label
    
    * fix(ci): allow cuda solver to mix non-cuda deps
    abetlen authored Mar 24, 2026
    909ebf1
  3. fix(ci): docker build workflow (abetlen#2156)

    * fix(ci): harden docker build workflow
    
    * docs: update changelog for ci workflows
    abetlen authored Mar 24, 2026
    ccc6bc0
  4. feat: expose attention_type parameter in Llama.__init__ (abetlen#2143)

    * feat: expose attention_type parameter in Llama.__init__
    
    * docs: preserve attention_type in pickled state
    
    * docs: update changelog for attention_type
    
    ---------
    
    Co-authored-by: Victor Biederbeck <victor@moria.hiddencove.xyz>
    Co-authored-by: abetlen <abetlen@gmail.com>
    3 people authored Mar 24, 2026
    7b38c31
  5. d6f46a5
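
The attention_type commit above mentions preserving the parameter in pickled state. The sketch below shows the general pattern with a stand-in class: constructor arguments are saved in `__getstate__` and replayed in `__setstate__`, so a new argument has to be added to the saved state or it is lost on a pickle round trip. `ModelHandle`, its fields, and the integer values are hypothetical and do not reproduce the library's actual code.

```python
import pickle


class ModelHandle:
    """Hypothetical stand-in for a wrapper around an unpicklable native object."""

    def __init__(self, model_path: str, attention_type: int = -1):
        self.model_path = model_path
        self.attention_type = attention_type
        self._native = object()  # placeholder for a native context handle

    def __getstate__(self):
        # Save only the constructor arguments, including attention_type.
        return {
            "model_path": self.model_path,
            "attention_type": self.attention_type,
        }

    def __setstate__(self, state):
        # Rebuild the object by re-running __init__ with the saved arguments.
        self.__init__(**state)


original = ModelHandle("model.gguf", attention_type=1)
restored = pickle.loads(pickle.dumps(original))
print(restored.attention_type)
```

If `attention_type` were left out of the dictionary returned by `__getstate__`, the restored object would silently fall back to the default value.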

Commits on Mar 25, 2026

  1. fix(ci): reduce CUDA binary wheel size only including cubins for current arches and one PTX target for forward compatibility (abetlen#2158)
    
    * fix(ci): shrink CUDA wheel fatbins
    
    * docs: update changelog for cuda wheel size fix
    abetlen authored Mar 25, 2026
    5f9c231
  2. fix: handle embedding models without KV memory (abetlen#2160)

    * Fix embedding models without KV memory
    
    * Add changelog entry for embedding memory fix
    abetlen authored Mar 25, 2026
    ac59e5a
  3. feat: Update llama.cpp to ggml-org/llama.cpp@c0159f9 (abetlen#2161)

    * Update llama.cpp to c0159f9c1
    
    * Add changelog entry for llama.cpp update
    abetlen authored Mar 25, 2026
    c670222
  4. f54421b
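
The CUDA wheel size fix above keeps compiled cubins for the current architectures plus a single PTX target for forward compatibility. One common way to express that split is CMake's `CMAKE_CUDA_ARCHITECTURES` list, where a `-real` suffix emits only device binary code and a `-virtual` suffix emits only PTX. The sketch below builds such a value. The helper name and the architecture numbers are illustrative assumptions, not the values pinned in the workflow.

```python
def cuda_architectures(real_archs: list[int], ptx_arch: int) -> str:
    """Build a CMAKE_CUDA_ARCHITECTURES value: cubins per arch, PTX for one."""
    # '-real' requests binary code only for each listed architecture.
    entries = [f"{arch}-real" for arch in sorted(real_archs)]
    # '-virtual' requests PTX only, which newer GPUs can JIT-compile at load time.
    entries.append(f"{ptx_arch}-virtual")
    return ";".join(entries)


# Example architecture list (assumed for illustration).
value = cuda_architectures([80, 86, 89, 90], ptx_arch=90)
print(value)  # 80-real;86-real;89-real;90-real;90-virtual
```

Emitting PTX for only the newest architecture avoids storing one PTX copy per target, which is where much of the fatbin size tends to come from.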

Commits on Mar 29, 2026

  1. fix(ci): publish distinct manylinux and musllinux cpu wheels (abetlen#2165)
    
    * fix(ci): publish distinct manylinux and musllinux cpu wheels
    
    * docs: add changelog entry for linux wheel repair fix
    abetlen authored Mar 29, 2026
    fcd932a
  2. ci: publish release wheels as py3-none (abetlen#2166)

    * ci: publish CPU wheels as py3-none
    
    * docs: add changelog entry for py3-none wheel tags
    abetlen authored Mar 29, 2026
    7613aca
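
The two wheel commits above concern the tags in the wheel filename: the platform tag (manylinux for glibc, musllinux for musl) and the `py3-none` Python and ABI tags, which mark a wheel as usable from any Python 3 interpreter without depending on a specific CPython ABI. The sketch below reads those tags from a filename using only the standard library. The filenames and the version `0.0.0` are made up for illustration.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str    # e.g. 'py3' or 'cp312'
    abi: str       # e.g. 'none' or 'cp312'
    platform: str  # e.g. 'manylinux_2_17_x86_64'


def parse_wheel_tags(filename: str) -> WheelTags:
    """Extract the last three dash-separated fields of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"too few fields in wheel filename: {filename}")
    return WheelTags(*parts[-3:])


# Hypothetical filenames; the version number is a placeholder.
glibc = parse_wheel_tags("llama_cpp_python-0.0.0-py3-none-manylinux_2_17_x86_64.whl")
musl = parse_wheel_tags("llama_cpp_python-0.0.0-py3-none-musllinux_1_2_x86_64.whl")
print(glibc)
print(musl)
```

Both files share the `py3-none` prefix and differ only in the platform tag, which is what lets an installer choose the right build for a glibc or musl system.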

Commits on Mar 30, 2026

  1. 7257ba9