Skip to content

Add consumed_args callback optimization and apply to array_reduce() for carry to improve performance#21340

Open
drealecs wants to merge 2 commits intophp:masterfrom
drealecs:improve-array-reduce-consume-carry
Open

Add consumed_args callback optimization and apply to array_reduce() for carry to improve performance#21340
drealecs wants to merge 2 commits intophp:masterfrom
drealecs:improve-array-reduce-consume-carry

Conversation

@drealecs
Copy link

@drealecs drealecs commented Mar 4, 2026

Related to #8283

For some callback operations, it is safe to consume/reuse a value passed as a parameter instead of copying it first.
This PR adds a generic internal mechanism for that (through zend_fcall_info.consumed_args) and applies it to array_reduce() carry argument, and a few other callback-using functions.

This reduces copy-on-write churn, having the parameter value the only reference, and significantly improves array_reduce() performance for mutable accumulator patterns, while keeping userland semantics unchanged.

The same mechanism can be used in other callback-using functions, and it is also applied here to preg_replace_callback() / preg_replace_callback_array() and ob_start().

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch 2 times, most recently from 86f212d to 2f9aa58 Compare March 4, 2026 17:24
@drealecs drealecs changed the title Draft: Improve array_reduce() consume carry Improve array_reduce() consume carry Mar 4, 2026
@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 2f9aa58 to 3299db5 Compare March 4, 2026 18:55
@TimWolla
Copy link
Member

TimWolla commented Mar 4, 2026

Can you provide the benchmark you used to verify this improves the situation?

@drealecs
Copy link
Author

drealecs commented Mar 5, 2026

Can you provide the benchmark you used to verify this improves the situation?

Yes, this is a good point.
Here is a simple test I used locally:

<?php

declare(strict_types=1);

function test($size) {
    $data = [];
    for ($i = 0; $i < $size; $i++) {
        $data[] = substr(hash('crc32b', (string) $i), 0, 4);
    }

    $start = hrtime(true);
    $result = array_reduce(
        $data,
        static function ($carry, $item) {
            @$carry[$item]++;
            return $carry;
        },
        [],
    );
    $duration = hrtime(true) - $start;

    printf("size=%d, unique=%d, duration=%.3f ms\n", $size, count($result), $duration/1_000_000);
}

test(10_000);
test(20_000);
test(40_000);

On master, this is the output:

$ ./sapi/cli/php test_array_reduce.php
size=10000, unique=9980, duration=1199.907 ms
size=20000, unique=19084, duration=4844.224 ms
size=40000, unique=34066, duration=19404.972 ms

and we can see that the complexity is O(n^2).

On this branch, this is the output:

$ ./sapi/cli/php test_array_reduce.php
size=10000, unique=9980, duration=9.108 ms
size=20000, unique=19084, duration=18.905 ms
size=40000, unique=34066, duration=35.392 ms

with O(n) complexity.

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 3299db5 to cdbe42a Compare March 5, 2026 11:53
Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice proposal! The approach makes sense to me. Applying this to all cases will require some care. This is safe for any argument that is owned. So this is fine for $initial (and the return value of $callback in subsequent iterations), but wouldn't be for the elements $array (ZEND_HASH_FOREACH_VAL(htbl, operand)) given operand has not been ADDREFd at this point.

Before merging this, it may make sense to scout the code base to see which functions this may be applied to. Some of the use cases might influence the design.

Zend/zend_API.h Outdated
* integrate APIs like call_user_func_array(). The usual restriction that
* there may not be position arguments after named arguments applies. */
HashTable *named_params;
uint32_t consume_params;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consumed would indicate this property can diverge between args. args would also be more correct, though I realize you named this after the other fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I got to the same conclusion, that args might be a better name here, but I like consistency more, so I went with params.
Also, instead of consume I was thinking some other words might fit better: move, steal, transfer.

For now I'll stick with consume_params, if nothing comes up as a better alternative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment wasn't fully clear. I'd suggest consumed_args, which indicates better that this is map of args to be consumed. consume_args sounds like a bool to consume all args.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed, the current name is confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this should be moved before the named_params field as there is already a 4 bytes gap in the struct on 64bits

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review! Great point, I moved it after param_count.
And also renamed it to consumed_args.

@iluuu1994 iluuu1994 requested a review from Girgias March 6, 2026 13:04
@drealecs
Copy link
Author

drealecs commented Mar 6, 2026

Before merging this, it may make sense to scout the code base to see which functions this may be applied to. Some of the use cases might influence the design.

I already did some analysis, and there are a few other places where we could implement it to get some improvements:

  • preg_replace_callback() / preg_replace_callback_array(): for $matches, valuable if callback would modify it.
  • ob_start(): for $buffer, valuable if callback would modify it.
  • there are other cases, but it's not usual to modify the params in the callbacks, like $key for array_find().

What do you think? And related to the design, what can we improve?

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have this split into two PRs. One that adds the consumed_args with various tests added to the zend_test extension (e.g. ref variables as that seems to be explicitly checked and I'd like to understand more) and then a follow-up PR to add support for array_reduce() and whatever other functions.

Zend/zend_API.h Outdated
* integrate APIs like call_user_func_array(). The usual restriction that
* there may not be position arguments after named arguments applies. */
HashTable *named_params;
uint32_t consume_params;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed, the current name is confusing.

Zend/zend_API.h Outdated
* integrate APIs like call_user_func_array(). The usual restriction that
* there may not be position arguments after named arguments applies. */
HashTable *named_params;
uint32_t consume_params;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this should be moved before the named_params field as there is already a 4 bytes gap in the struct on 64bits

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from cdbe42a to 008446f Compare March 7, 2026 09:43
@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 008446f to fb5617f Compare March 7, 2026 11:39
@drealecs drealecs requested a review from kocsismate as a code owner March 7, 2026 11:39
@drealecs
Copy link
Author

drealecs commented Mar 7, 2026

I'd rather have this split into two PRs. One that adds the consumed_args with various tests added to the zend_test extension (e.g. ref variables as that seems to be explicitly checked and I'd like to understand more) and then a follow-up PR to add support for array_reduce() and whatever other functions.

Thank you for the review, Gina. Great point about the tests.
I implemented them using the zend_test extension, including the reference-sensitive paths. Please let me know if this covers the cases you had in mind.

I’d prefer to keep this as a single PR if that works for you. I did create exactly two commits from the start, and I plan to keep them that way, so the separation is explicit and easier to review. They contain now:

  1. engine-only consumed_args support in zend_call_function() + tests.
  2. adoption in array_reduce() + some other callback-using functions.

If you still prefer two separate PRs, I can split them, but I hoped this structure would keep review manageable while preserving context. And if merging can be done not using squash, it will keep git history relevant, but I don't know if that's something commonly used, so let me know if I should change it or if it works for you this way. Thanks again!

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 2847abb to 9171681 Compare March 7, 2026 12:43
@drealecs drealecs requested a review from Girgias March 7, 2026 12:44
@drealecs drealecs changed the title Improve array_reduce() consume carry Add consumed_args callback optimization and apply to array_reduce() for carry to improve performance Mar 7, 2026
Copy link
Member

@arnaud-lb arnaud-lb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Care should be taken when more than one arg is marked as consumed, since if the same value is passed to those, refcount will be off. Is there a use-case for marking more than one arg as consumed? Otherwise I would suggest to change consumed_args to represent only one arg num.


I'm wondering if this could be implemented as an opcode optimization as well. Then we could apply #20934 to array_reduce().

if (EXPECTED(fci->consumed_args == 0)) {
ZVAL_COPY(param, arg);
} else {
if (i < 32 && (fci->consumed_args & (1u << i)) && !Z_ISREF_P(arg) && arg == &fci->params[i]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: move the i < 32 && (fci->consumed_args & (1u << i)) check to a macro along with ZEND_FCI_CONSUMED_ARG()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. can you check if it looks better now?

@drealecs drealecs force-pushed the improve-array-reduce-consume-carry branch from 9171681 to 4a1e6c5 Compare March 10, 2026 15:59
@drealecs
Copy link
Author

Care should be taken when more than one arg is marked as consumed, since if the same value is passed to those, refcount will be off. Is there a use-case for marking more than one arg as consumed?

Thanks @arnaud-lb for the review and for flagging this.
My understanding is that this would require an already inconsistent refcount setup when zend_call_function() receives params (e.g. multiple arg slots pointing to the same refcounted payload without matching ownership/refcount accounting).
Since consumed_args is internal-only, this doesn’t look directly userland-triggerable. We can treat it as an internal caller contract issue: if multiple args are marked consumed, the caller should ensure parameter ownership/refcount setup is valid.

But I may be missing something in the broader call paths, so happy to adjust if you still think we should add hardening here.
Maybe you can explain it a bit more, on what case you were thinking of.

@drealecs
Copy link
Author

I'm wondering if this could be implemented as an opcode optimization as well. Then we could apply #20934 to array_reduce().

Yes, that's a nice idea as well. I think this might be possible, but I don't have too much experience.
However, I think it would be more complex than this. And a bit more risky.

@arnaud-lb
Copy link
Member

My understanding is that this would require an already inconsistent refcount setup when zend_call_function() receives params (e.g. multiple arg slots pointing to the same refcounted payload without matching ownership/refcount accounting).

Not necessarily, as argument passing increases the refcount.

So this is valid:

ZVAL_STR(&args[0], ZSTR_INIT_LITERAL("test", 0));
ZVAL_COPY_VALUE(&args[1], &args[0]);

fci.retval = return_value;
fci.param_count = 2;
fci.params = args;

zend_call_function(&fci, &fci_cache);
zval_ptr_dtor(&args[0]);

But not this:

ZVAL_STR(&args[0], ZSTR_INIT_LITERAL("test", 0));
ZVAL_COPY_VALUE(&args[1], &args[0]);

fci.retval = return_value;
fci.param_count = 2;
fci.params = args;
fci.consumed_args = ZEND_FCI_CONSUMED_ARG(0) | ZEND_FCI_CONSUMED_ARG(1);

zend_call_function(&fci, &fci_cache);

As a result, using consumed_args with more than one argument requires special care. There may be cases in which it could more hidden than this.

@drealecs
Copy link
Author

As a result, using consumed_args with more than one argument requires special care. There may be cases in which it could more hidden than this.

Thanks for the example.

Without consumed_args, aliasing the same value across arg slots can still be valid.
With multi-arg consumed_args, that same aliasing pattern can become unsafe unless ownership/refcount is accounted for accordingly.
So my understanding is: even though this is internal API usage and callers opting into consumed_args should handle refcounting correctly, we should still harden this to avoid memory-unsafe footguns. Is that correct?

I’m good with either approach:

  1. keep multi-bit masks, but make consume alias-safe somehow (consume first alias, ZVAL_COPY duplicates)
  2. restrict to a single consumed arg for now, either bit-mask or not.
  3. keep it unsafe, or use some simple detection of it and bail out

In current use cases, I only found single-argument optimization cases. But also limiting this makes it less "complete".
Maybe also using a single arg slot for now, and allowing more in the future if needed. And still keeping a bit-mask but allow only 1-bit set.
I'm thinking that detecting duplicate aliases might incur some overhead.

What do you prefer? I'm good either way.

@arnaud-lb
Copy link
Member

My preference would be to restrict to a single arg for now, especially if there are no use-cases for multiple ones in the code base. Approach 1 would be ok too, but I suspect that duplicate detection would incur a performance penalty. Either way I have no strong objection.

@iluuu1994
Copy link
Member

I'd say let's restrict this to a single arg then until we have more use-cases. It's easy enough to extend otherwise.

@TimWolla
Copy link
Member

Then we could apply #20934 to array_reduce().

I had planned to apply a desugaring for the other array_*() functions once PFA has landed. The array_map() was intended as a proof of concept.

Comment on lines +909 to +918
if (EXPECTED(fci->consumed_args == 0)) {
ZVAL_COPY(param, arg);
} else {
if (ZEND_FCI_IS_CONSUMED_ARG(fci->consumed_args, i) && !Z_ISREF_P(arg) && arg == &fci->params[i]) {
ZVAL_COPY_VALUE(param, arg);
ZVAL_UNDEF(arg);
} else {
ZVAL_COPY(param, arg);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is having the extra branch valuable, or can this just be:

Suggested change
if (EXPECTED(fci->consumed_args == 0)) {
ZVAL_COPY(param, arg);
} else {
if (ZEND_FCI_IS_CONSUMED_ARG(fci->consumed_args, i) && !Z_ISREF_P(arg) && arg == &fci->params[i]) {
ZVAL_COPY_VALUE(param, arg);
ZVAL_UNDEF(arg);
} else {
ZVAL_COPY(param, arg);
}
}
if (EXPECTED(!(ZEND_FCI_IS_CONSUMED_ARG(fci->consumed_args, i) && Z_ISREF_P(arg) && arg == &fci->params[i]))) {
ZVAL_COPY(param, arg);
} else {
ZVAL_COPY_VALUE(param, arg);
ZVAL_UNDEF(arg);
}

Using the internal negation to avoid an UNEXPECTED here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants