Shortly after publishing my last post, I started thinking about the similarity between the scene-to-screen transforms that I showed there and the shape of the “toe” of filmic tonemapping operators. I realized that both are doing the same thing: compensating for the dim or dark surround of the viewing environment. A filmic tonemapping operator simulates the response of film that will then be projected in a dark theatre, so it has the “scene-to-screen” transform already baked in.

The source code of the ACES Rec. 709 ODT, for example, states that it is intended for a display with a dim surround, which indicates that it already takes the scene-to-screen transform into account. This is further confirmed by the fact that the output is encoded with the inverse BT.1886 EOTF: a BT.1886 display simply undoes that encoding, so no additional scene-to-screen adjustment is introduced between the ODT and the screen.

The “toe” (the region from 0 to about 0.15) of the ACES filmic tonemapping curve (here shown without input pre-scaling) is qualitatively similar to the scene-to-screen transform formed by the concatenation of the Rec. 709 OETF and BT.1886 EOTF.
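
To make the comparison concrete, here is a minimal sketch that evaluates both curves over the toe region. It rests on several assumptions: the BT.1886 EOTF is simplified to a pure 2.4 power (white at 1, black at 0), and the post does not say which form of the ACES curve is plotted, so Krzysztof Narkowicz's commonly used curve fit stands in for it here, without input pre-scaling. The function names are illustrative only.

```python
def rec709_oetf(scene):
    """Rec. 709 OETF: scene-linear light in [0, 1] to a non-linear signal."""
    scene = max(scene, 0.0)
    if scene < 0.018:
        return 4.5 * scene
    return 1.099 * scene ** 0.45 - 0.099


def bt1886_eotf(signal):
    """BT.1886 EOTF, simplified by assuming white = 1 and black = 0."""
    return max(signal, 0.0) ** 2.4


def scene_to_screen(scene):
    """Concatenation of the Rec. 709 OETF and the (simplified) BT.1886 EOTF."""
    return bt1886_eotf(rec709_oetf(scene))


def aces_fit(x):
    """Assumption: Narkowicz's fit of the ACES curve, no input pre-scaling."""
    x = max(x, 0.0)
    y = (x * (2.51 * x + 0.03)) / (x * (2.43 * x + 0.59) + 0.14)
    return min(max(y, 0.0), 1.0)


if __name__ == "__main__":
    # Tabulate both curves over the toe region (0 to 0.15).
    print("input  scene-to-screen  aces_fit")
    for x in (0.0, 0.025, 0.05, 0.1, 0.15):
        print(f"{x:5.3f}  {scene_to_screen(x):15.4f}  {aces_fit(x):8.4f}")
```

Both functions start at zero with a slope below 1 and bend upwards near the origin, which is the shared toe shape. The agreement is qualitative only: the two curves are not numerically equal.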

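The upshot is that if you want to faithfully apply a filmic tonemapping operator such as ACES, you should modify it to cancel out any scene-to-screen transform that is applied to your final signal. Otherwise the surround compensation is applied twice: once by the tonemapping curve and once by the signal chain.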
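
One way to do that cancellation can be sketched as follows, assuming the signal chain after the tonemapper is a Rec. 709 OETF followed by a BT.1886 display, with the EOTF again simplified to a pure 2.4 power (white at 1, black at 0). The helper name `cancel_scene_to_screen` is hypothetical. It pre-distorts the tonemapped value with the inverse of the scene-to-screen transform, so that the light leaving the display matches the tonemapper's output.

```python
def rec709_oetf(scene):
    """Rec. 709 OETF: scene-linear light to a non-linear signal."""
    scene = max(scene, 0.0)
    if scene < 0.018:
        return 4.5 * scene
    return 1.099 * scene ** 0.45 - 0.099


def rec709_oetf_inverse(signal):
    """Inverse of the Rec. 709 OETF (0.081 = 4.5 * 0.018 is the breakpoint)."""
    signal = max(signal, 0.0)
    if signal < 0.081:
        return signal / 4.5
    return ((signal + 0.099) / 1.099) ** (1.0 / 0.45)


def bt1886_eotf(signal):
    """BT.1886 EOTF, simplified by assuming white = 1 and black = 0."""
    return max(signal, 0.0) ** 2.4


def bt1886_eotf_inverse(light):
    """Inverse of the simplified BT.1886 EOTF."""
    return max(light, 0.0) ** (1.0 / 2.4)


def cancel_scene_to_screen(tonemapped):
    """Hypothetical helper: undo the OETF + EOTF chain ahead of time."""
    return rec709_oetf_inverse(bt1886_eotf_inverse(tonemapped))


if __name__ == "__main__":
    # After cancellation, the display should reproduce the tonemapped value.
    for y in (0.001, 0.18, 0.5, 1.0):
        on_screen = bt1886_eotf(rec709_oetf(cancel_scene_to_screen(y)))
        print(f"tonemapped {y:.3f} -> on screen {on_screen:.6f}")
```

If you control the encoding step yourself, the simpler equivalent is to skip the Rec. 709 OETF and encode the tonemapped value directly with the inverse BT.1886 EOTF, which is what the ACES Rec. 709 ODT does.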