fix failure of llava/pixtral#42985

Merged
ArthurZucker merged 6 commits into huggingface:main from sywangyi:pixtral_fix
Jan 8, 2026

Conversation

@sywangyi
Contributor

What does this PR do?

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@sywangyi
Contributor Author

sywangyi commented Dec 22, 2025

from transformers import LlavaForConditionalGeneration, AutoProcessor

model_id = "mistral-community/pixtral-12b"
processor = AutoProcessor.from_pretrained(model_id)
text = "Describe the images"
inputs = processor.tokenizer(text)
print(f"Input text: '{text}'")
print(f"Token IDs: {inputs['input_ids']}")
decoded_text = processor.tokenizer.decode(inputs["input_ids"])
print(f"Decoded text: '{decoded_text}'")

Output: "DescribeĠtheĠimages"; the expected output is "Describe the images" (as in transformers 4.57.3).
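The "Ġ" characters in the bad output come from the GPT-2-style byte-to-unicode remapping used by ByteLevel pre-tokenizers: non-printable bytes (including the space, 0x20) are remapped to printable codepoints, and a missing ByteLevel decoder leaves them visible in the decoded text. A minimal, self-contained sketch of that table (my own illustration, not transformers code):

```python
# Sketch of the GPT-2 byte-to-unicode table used by ByteLevel pre-tokenizers.
# Bytes outside the printable ranges are remapped to codepoints >= 256, which
# is why the space byte surfaces as "Ġ" (U+0120) when decoding skips the
# ByteLevel decoder.

def bytes_to_unicode():
    # Bytes kept as-is: printable ASCII plus two Latin-1 ranges.
    kept = (list(range(ord("!"), ord("~") + 1))
            + list(range(ord("¡"), ord("¬") + 1))
            + list(range(ord("®"), ord("ÿ") + 1)))
    chars = kept[:]
    n = 0
    for b in range(256):
        if b not in kept:
            kept.append(b)
            chars.append(256 + n)  # remap to an otherwise-unused codepoint
            n += 1
    return dict(zip(kept, map(chr, chars)))

table = bytes_to_unicode()
print(table[ord(" ")])  # -> 'Ġ' (U+0120): the remapped space byte
```

Applying the inverse of this table is exactly what the ByteLevel decoder does; when the tokenizer is loaded without it, "Ġ" is printed literally.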

@sywangyi
Contributor Author

Also, this fixes the failure in the llava test.

@sywangyi
Contributor Author

sywangyi commented Dec 22, 2025

The Mistral3ProcessorTest case fails because of the tokenizer fix.

from transformers import AutoProcessor

model_id = "hf-internal-testing/Mistral-Small-3.1-24B-Instruct-2503-only-processor"
processor = AutoProcessor.from_pretrained(model_id)
text = "Describe the images"
inputs = processor.tokenizer(text)
print(f"Input text: '{text}'")
print(f"Token IDs: {inputs['input_ids']}")
decoded_text = processor.tokenizer.decode(inputs["input_ids"])
print(f"Decoded text: '{decoded_text}'")
before the fix
Input text: 'Describe the images'
Token IDs: [1, 5847, 13089, 1278, 8061]
Decoded text: '<s>DescribeĠtheĠimages'
after the fix
Input text: 'Describe the images'
Token IDs: [1, 5847, 1972, 22326, 1268, 8926]
Decoded text: '<s>Describetheimages'
However, in 4.57.3 the output is
Decoded text: '<s>Describe the images'
which is expected. I did some investigation; the pre_tokenizer seems to be incorrect for "hf-internal-testing/Mistral-Small-3.1-24B-Instruct-2503-only-processor" in 5.0.0. It should be
Sequence(pretokenizers=[Split(pattern=Regex("(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\n]+|\s+(?!\S)|\s+"), behavior=Isolated, invert=False), ByteLevel(add_prefix_space=False, trim_offsets=True, use_regex=False)])
instead of
Metaspace(replacement="▁", prepend_scheme=always, split=False)

@sywangyi
Contributor Author

It seems the pre_tokenizer should be loaded from tokenizer.json, but v5.0.0 does not do this.

@molbap
Contributor

molbap commented Dec 22, 2025

Normally #42894 should fix tokenization issues on main and in the next release candidate. It might need some time with the holiday season now, apologies. There are a few different changes in your PR; can you put what fails in the PR description and ensure your PR fixes it minimally? Thanks!

@sywangyi
Contributor Author

sywangyi commented Dec 22, 2025

from transformers import LlavaForConditionalGeneration, AutoProcessor

model_id = "mistral-community/pixtral-12b"
processor = AutoProcessor.from_pretrained(model_id)
text = "Describe the images"
inputs = processor.tokenizer(text)
print(f"Input text: '{text}'")
print(f"Token IDs: {inputs['input_ids']}")
decoded_text = processor.tokenizer.decode(inputs["input_ids"])
print(f"Decoded text: '{decoded_text}'")

Hi, I tried #42894, but it does not fix the "hf-internal-testing/Mistral-Small-3.1-24B-Instruct-2503-only-processor" and "mistral-community/pixtral-12b" issues. Nearly all cases of pytest tests/models/llava/test_modeling_llava.py::LlavaForConditionalGenerationIntegrationTest fail because of the tokenizer issue and test-case issues, and I fixed them.

@sywangyi
Contributor Author

@molbap I updated the PR to fix all the issues I mentioned, including the llava test and the tokenizer issues of mistral-community/pixtral-12b and hf-internal-testing/Mistral-Small-3.1-24B-Instruct-2503-only-processor.

@yao-matrix
Contributor

@molbap, could you please take a second look? Thanks very much.

@molbap
Contributor

molbap commented Jan 7, 2026

Hello, thanks for the investigation. However, it seems that #42894 indeed fixes the mistral example.

from transformers import LlavaForConditionalGeneration, AutoProcessor

model_id = "mistral-community/pixtral-12b"
processor = AutoProcessor.from_pretrained(model_id)
text = "Describe the images"
inputs = processor.tokenizer(text)
print(f"Input text: '{text}'")
print(f"Token IDs: {inputs['input_ids']}")
decoded_text = processor.tokenizer.decode(inputs["input_ids"])
print(f"Decoded text: '{decoded_text}'")

does return

Input text: 'Describe the images'
Token IDs: [5847, 13089, 1278, 8061]
Decoded text: 'Describe the images'

If the other tests are not fixed feel free to update the PR!
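The correct output above is what a ByteLevel decoder produces by inverting the byte remapping. A minimal sketch (my own illustration, not the actual tokenizers decoder) of that final decoding step:

```python
def byte_level_decode(tokens):
    # Sketch of a ByteLevel decoder's visible effect: concatenate the tokens
    # and map the remapped space character "Ġ" (U+0120) back to a real space.
    # The real decoder inverts the full 256-entry byte table, not just space.
    return "".join(tokens).replace("\u0120", " ")

print(byte_level_decode(["Describe", "\u0120the", "\u0120images"]))
# -> 'Describe the images'
```

When the decoder is missing or mismatched, this inversion never happens, which matches the 'DescribeĠtheĠimages' output reported at the top of the thread.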

@itazap
Collaborator

itazap commented Jan 7, 2026

Hey! We just merged #42894 which fixes this issue! If you have any cases that aren't resolved by this please feel free to share here :)

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@github-actions
Contributor

github-actions bot commented Jan 8, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: llava

@sywangyi
Contributor Author

sywangyi commented Jan 8, 2026

Hey! We just merged #42894 which fixes this issue! If you have any cases that aren't resolved by this please feel free to share here :)

After updating to the latest main, the tokenizer issue I mentioned before has been resolved, but a new issue popped up:

tests/models/llava/test_modeling_llava.py::LlavaForConditionalGenerationIntegrationTest::test_tokenizer_integration - AssertionError: Lists differ: ['<|im_start|>', 'sy', 'st', 'em', '\n', 'An', 'sw', 'er', ' [245 chars]'\n'] != ['<|im_start|>', 'system',...

This test case fails; see https://github.com/huggingface/transformers/blob/main/tests/models/llava/test_modeling_llava.py#L564. It seems the behavior changed. Is it a regression?

@sywangyi
Contributor Author

sywangyi commented Jan 8, 2026

For the test-case fixes, such as the dtype and device mismatch issues, I fixed them in the tests. @ydshieh, please help review.

Collaborator

@ArthurZucker left a comment

We indeed defaulted to auto dtype now!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Collaborator

run-slow: llava

@github-actions
Contributor

github-actions bot commented Jan 8, 2026

This comment contains run-slow, running the specified jobs:

models: ["models/llava"]
quantizations: []

@github-actions
Contributor

github-actions bot commented Jan 8, 2026

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@ArthurZucker ArthurZucker merged commit 457048f into huggingface:main Jan 8, 2026
20 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* fix failure of llava/pixtral

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* also fix the issue of Mistral-Small-3.1-24B-Instruct-2503-only-processor tokenizer

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* update

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>