Hi @dongluw, thanks for the contribution. Indeed, there seems to be a flip issue. However, how did you obtain the test image above? I'm surprised our tests haven't caught this, since it should be all wrong, so a reproducer would help.
hey @molbap, I saved the [...] this issue only affects generation quality if images have a very high or very low aspect ratio
src/transformers/models/cohere2_vision/image_processing_cohere2_vision_fast.py
src/transformers/models/cohere2_vision/modular_cohere2_vision.py
zucchini-nlp left a comment
hey, nice catch! I was adapting the processing from GotOCR and misplaced the sizes for (h, w); I commented on it below. I think we need to fix where the aspect ratios are computed
the issue is actually here: the grids come in (w, h) format but the original size is in (h, w) format. We need to swap the original size format
the (h, w) order of original_image_size is derived from the input https://github.com/dongluw/transformers/blob/cfef59b0d012002cea6ee16e7b68d2e9af0a4f44/src/transformers/models/cohere2_vision/image_processing_cohere2_vision_fast.py#L167-L169
IIUC it would make more sense to change the function call above, since the input has order (original_height, original_width) while the output expects (num_columns, num_rows), which is flipped
If you can point me to where this part is generated from, I can try to make a change there instead
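To make the mismatch discussed above concrete, here is a minimal sketch. It is not the library's implementation: `GRIDS` and `pick_grid` are hypothetical names, and the selection rule (closest aspect ratio) is an assumption for illustration. It shows that when grids are in (num_columns, num_rows) order, passing the image size as (h, w) selects the transposed grid, while a square image is unaffected.

```python
import numpy as np

# Hypothetical candidate grids in (num_columns, num_rows) order, i.e. (w, h).
GRIDS = np.array([(1, 1), (1, 2), (2, 1), (1, 3), (3, 1)])


def pick_grid(size_wh):
    """Return the grid whose width/height ratio is closest to the image's.

    `size_wh` must be (width, height) to match the (w, h) grid layout.
    """
    target = size_wh[0] / size_wh[1]
    ratios = GRIDS[:, 0] / GRIDS[:, 1]
    # Compare in log space so 2:1 and 1:2 are equally far from 1:1.
    best = np.argmin(np.abs(np.log(ratios) - np.log(target)))
    return tuple(int(v) for v in GRIDS[best])


height, width = 300, 900  # a wide image

correct = pick_grid((width, height))  # size given as (w, h): 3 columns, 1 row
flipped = pick_grid((height, width))  # size mistakenly given as (h, w): 1 column, 3 rows
square = pick_grid((500, 500))        # square images hide the bug
```

This is consistent with the observation earlier in the thread that only images with extreme aspect ratios are visibly affected.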
seems the function call is generated from GotOcr2ImageProcessorFast
https://github.com/dongluw/transformers/blob/cfef59b0d012002cea6ee16e7b68d2e9af0a4f44/src/transformers/models/got_ocr2/image_processing_got_ocr2_fast.py#L93-L95
however, modifying this class would probably affect other models, so I think we can keep the (h, w) order in this PR.
btw the grids are a list of symmetric tuples that doesn't assume a specific dim order, so the fix of flipping the dim order at the return statement still needs to be there IMO
yeah, the grids do not assume a specific order. The issue is that we need to choose one layout and follow it for consistency. And since the naming suggests that the (w, h) layout is used in grids, I prefer to keep it consistent. Currently the grids assume (w, h), and the number of columns also assumes that the layout is (w, h). The issue is that the original size does not follow the same format and thus messes with the aspect ratios
So original_size = np.stack([image_width, image_height]) looks like an easier approach to me, instead of having to rename more variables for general consistency
okay improved the naming
def test_crop_to_patches_aspect_ratio(self):
    """Test that row/column ordering is correct when cropping non-square images to patches.

    This test verifies that patches can be stitched back to reconstruct the original image,
    which validates that the row/column ordering in get_optimal_tiled_canvas is correct.
    If row/column are swapped, the image would be resized to wrong dimensions and patches
    would not match the original content.
    """
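The body of the test is not shown in this excerpt. As a simplified sketch of the round-trip idea the docstring describes, the snippet below crops a non-square array into row-major patches and stitches them back. `crop_to_patches` and `stitch_patches` here are hypothetical stand-ins written for illustration, not the processor's methods, and the resize step is skipped.

```python
import numpy as np


def crop_to_patches(image, num_columns, num_rows):
    """Split an (H, W, C) array into row-major patches; H and W must divide evenly."""
    height, width = image.shape[:2]
    patch_h, patch_w = height // num_rows, width // num_columns
    return [
        image[r * patch_h:(r + 1) * patch_h, c * patch_w:(c + 1) * patch_w]
        for r in range(num_rows)
        for c in range(num_columns)
    ]


def stitch_patches(patches, num_columns, num_rows):
    """Reassemble row-major patches into a single (H, W, C) array."""
    rows = [
        np.concatenate(patches[r * num_columns:(r + 1) * num_columns], axis=1)
        for r in range(num_rows)
    ]
    return np.concatenate(rows, axis=0)


# A wide 2x6 "image" with unique values so any misordering is detectable.
image = np.arange(2 * 6 * 3).reshape(2, 6, 3)
patches = crop_to_patches(image, num_columns=3, num_rows=1)
restored = stitch_patches(patches, num_columns=3, num_rows=1)

# Swapping rows and columns when stitching yields a tall array instead.
swapped = stitch_patches(patches, num_columns=1, num_rows=3)
```

With the correct ordering, `restored` equals `image`; with rows and columns swapped, the stitched result has the transposed shape, which is the kind of failure such a test is meant to catch.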
[For maintainers] Suggested jobs to run (before merge): run-slow: cohere2_vision
zucchini-nlp left a comment
LGTM, thanks for iterating
# tiles following (width, height) order to align with aspect ratio convention
tile_size = np.stack([image_width, image_height])
required_scales = candidate_resolutions / tile_size
great, thanks for adding the explicit comment
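A small sketch of how the division in the snippet above broadcasts, assuming `candidate_resolutions` is an array of (width, height) pixel sizes with shape (N, 2). The concrete values, the `patch` size, and the final `min` step are assumptions for illustration; what the processor does with `required_scales` afterwards is not shown in this thread.

```python
import numpy as np

image_height, image_width = 300, 900  # a wide image
patch = 100  # hypothetical patch side length in pixels

# Hypothetical candidate grids in (num_columns, num_rows) order, scaled to pixels,
# so each row is a (width, height) resolution.
candidate_resolutions = np.array([(1, 3), (3, 1)]) * patch

# (width, height) order, matching the candidates column for column.
tile_size = np.stack([image_width, image_height])

# Shape (2,) broadcasts against shape (N, 2): one (w_scale, h_scale) pair per candidate.
required_scales = candidate_resolutions / tile_size

# Assumed next step: the largest uniform scale that fits the image in each candidate.
fit_scale = required_scales.min(axis=1)
best_candidate = int(np.argmax(fit_scale))
```

Because both arrays use (width, height) order, the wide image prefers the 3-column, 1-row candidate; stacking the size as (height, width) would divide widths by the height and invert that preference.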
* Add test case and update image processing
* Apply suggestions from code review
* improve naming

What does this PR do?
before fix:
after fix:

Fixes # (issue)