ltx-2.3-spatial-upscaler-x2-1.0 causes unwanted text/overlay

#13
by ColtonBrown1 - opened

Using the 2-stage workflow, the upscaler causes an unwanted text overlay, a garbled "endscreen"-looking block of AI text, at the end of every 20-second video. Shorter videos don't seem to be affected, and I don't have the hardware to try longer ones. I believe I've narrowed it down to the upscaler, though I can't be certain: if I remove the upscaler and generate the 20 seconds in a 1-stage workflow, the issue doesn't appear, which is my reasoning. I don't know how to solve this. I tried the older ltx2 upscaler, but obviously it no longer works with the updated model...

I have had the same problem. You have to use the new sigmas and preprocessing values, then it will work.

Using a 1.5x upscaler worked without any problems. There didn't seem to be any artifacts.

However, since 768x512 doesn't scale cleanly by 1.5x, I specified 768x480. In the first stage, instead of resizing to 0.5x, I set 512x320 in the Resize v2 node.

Having the same issue. With the 2x upscaler, the last 10 frames will always have something completely random in them.
The 1.5x upscaler doesn't have that issue.

I have tried the suggestion of using different sigmas and preprocess values, but that didn't get rid of the issue. It "changed" it slightly, so it's now more of a brief "flash" of something popping up, but it's still there. I've tried many different sigmas and can't get rid of it. I will use the 1.5x upscaler until this is addressed.

Same. See the screenshot, either a slight flash (left) or some kind of logo/unwanted text overlay (right).



The new manual sigma values for the upscaler are 0.85, 0.7250, 0.4219, 0.0, and the preprocess value is 18, I think. The manual sigmas for the 8-step first sampler haven't changed, according to the workflows from https://github.com/Lightricks/ComfyUI-LTXVideo/


This is exactly the issue I'm seeing, and I still haven't got it to go away. I've resorted to a first-middle-last frame workflow: I set the "middle" frame to whatever frame I actually want the video to end on, and then just cut the remainder between the middle and final frame. For example, if I want 480 frames at 24 fps, I set my total frames to 504, put my middle frame at 480, and cut frames 481-504. That seems to be working for now. Obviously, if you don't use img2vid, I don't know. Either way, this needs to be addressed, or we need an explanation of why it's happening.
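The over-generate-and-trim arithmetic can be sketched as a tiny helper (hypothetical, not part of any workflow; the 24-frame pad matches the 480/504 example above):

```python
def plan_trim(wanted_frames: int, pad_frames: int = 24):
    """Request extra frames from the sampler, then keep only the wanted range."""
    total = wanted_frames + pad_frames   # frames to actually generate
    keep = slice(0, wanted_frames)       # frames to keep; discard the padded tail
    return total, keep

total, keep = plan_trim(480)  # total == 504; keep frames 1-480, cut 481-504
```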

It seems to happen with 1.5x as well if you have a lot of frames, but not as often as with 2x, where it's almost 99% of the time.
I've just gone back to the 2x upscaler and trim the end, to keep the scaling simple.

I'm also getting these artifacts at the end of the video. At first I thought it was the workflow I was using, but then I tried some other workflows and got the same artifacts at the end. In my case, it happens when my video is longer than 15 seconds.


I guess all we can do for now is generate longer than needed and trim the offending frames. Even with the 1.5x upscaler this happens from time to time. It seems like the training data was polluted with endscreens, logos, ads, or something. Such a shame for an otherwise excellent model.

It always seems to be the final 14 frames that are affected.


Yes, and why didn't they notice it? There's also always some background music added at the end, which also seems to come from endscreens or ads. And the fonts of the letters look Vietnamese?


The number of affected frames changes based on length. For me:
121 frames = 6 affected frames.
481 frames = 13 affected frames.
961 frames = 16 affected frames.
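Those three counts (purely empirical; they may shift with model or settings) suggest a piecewise-linear rule of thumb for how many tail frames to cut. A hypothetical sketch:

```python
import bisect

# Observed (video length in frames -> garbled tail frames) from this thread.
OBSERVED = [(121, 6), (481, 13), (961, 16)]

def trim_frames(length: int) -> int:
    """Estimate the tail frames to cut by interpolating between the
    observed points, clamped at the ends of the observed range."""
    xs = [x for x, _ in OBSERVED]
    ys = [y for _, y in OBSERVED]
    if length <= xs[0]:
        return ys[0]
    if length >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, length)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (length - x0) / (x1 - x0)
    return round(y0 + t * (y1 - y0))
```

For example, `trim_frames(481)` returns 13, matching the report above.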

Same here. Will there be a fix? Right now I'm using the 1.5x upscaler...

I asked an AI and it told me:
Set ManualSigmas
instead of "1.0 ... 0.421875, 0.0"
to
1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875, 0.05

and it worked for me.

You changed only the value for the first sampler, but not for the upscaler?

Tried that. No problem through the first sampler, but then the upsampler stopped at 50%, so I had to change it there too, ending at 0.0500. That somehow tripled the time until it finished. But there's still a logo at the end of a 20-second video. @UelivonWerdenberg : I'm using a dev model; do you use a dev or a distilled one, and how long are the videos you generated where it worked for you?

Only the second. I'm using the standard ComfyUI workflow video_ltx2_3_i2v, and in it that's the #211 node.


Ok, it doesn't work for the dev model.

"the #211 node"... that must be from ChatGPT ;-)

Use the euler sampler with the Linear Quadratic (Mochi) scheduler.

That's it. That's the magic pairing that solved the problem for me. I've generated hundreds of videos using different combinations of samplers/schedulers. This is the only one that completely avoids both the white border for the entirety of the video and the logo added to the end.

After hundreds of generations, it's clear that the first step is too weak in the upscaler.
You can increase the noise of the first step and that fixes it, but then you lose a lot of the input video's guidance.
Using the 8-step upscale sigmas UelivonWerdenberg posted worked, but it's twice as slow because of the extra steps.
So I started playing with the sigma values, which are basically how much noise each step works with.
Here is a comparison.
First video is the default workflow of comfyui with my prompt.
https://www.youtube.com/watch?v=7WeydM1aHJk

Here are the values of the videos.
TopLeft 0.987, 0.85, 0.725, 0.422, 0.0
TopRight 0.987, 0.85, 0.725, 0.422, 0.05
BottomLeft 1.0, 0.9875, 0.85, 0.421875, 0.05
BottomRight 1.0, 0.9875, 0.85, 0.421875, 0.0

0.987, 0.85, 0.725, 0.422, 0.0
would probably be my choice, since that 0.0 last step is pure denoising instead of leaving a slight amount of grain at 0.05.
I need to spend more time reducing just that first value until the logo comes back, then increasing it a bit again to find the best value.
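To make the "noise per step" reading concrete: each sampler step denoises from one sigma level down to the next, so the consecutive differences show how the schedule distributes the work (a small illustrative helper, not a ComfyUI node):

```python
def noise_removed_per_step(sigmas):
    """Difference between consecutive sigma levels: how much noise
    each sampler step is responsible for removing."""
    return [round(hi - lo, 4) for hi, lo in zip(sigmas, sigmas[1:])]

# The schedule above leaves the biggest jumps to the last two steps:
noise_removed_per_step([0.987, 0.85, 0.725, 0.422, 0.0])
# -> [0.137, 0.125, 0.303, 0.422]
```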

Got to love how fast the model is on my 5090.
Had some batches finish at 481 frames and 961 frames, and it looks like that first sigma needs to be different based on length.
The logo shows for a single frame at 481 frames with a first sigma of 0.9125, so setting 0.9175 prevents it with a little room for error.
And at 961 frames it showed for one frame at 0.9605; I've only run two more at 961 frames, but it looks to be gone with a first sigma of 0.9655.
So here's a quick table to test for the rest of them, assuming it's somewhat linear. I need to test more durations, but I'm out of time.
Then it wouldn't be hard to put together a node to calculate the value based on duration.

| Frames | Duration (s) | First sigma |
|-------:|-------------:|------------:|
| 120    | 5            | 0.9055      |
| 241    | 10           | 0.9175      |
| 361    | 15           | 0.9295      |
| 481    | 20           | 0.9175 βœ…   |
| 601    | 25           | 0.9295      |
| 721    | 30           | 0.9415      |
| 841    | 35           | 0.9525      |
| 961    | 40           | 0.9655 βœ…   |
| 1081   | 45           | 0.9765      |
| 1201   | 50           | 0.9875      |

481 frames: 0.9175, 0.87, 0.735, 0.445, 0.0
962 frames: 0.9655, 0.87, 0.735, 0.445, 0.0
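A node that derives the first sigma from the frame count could start from a straight-line fit through the two confirmed points (481 frames -> 0.9175 and 961 frames -> 0.9655, about +0.0001 sigma per extra frame). This is only a sketch, untested outside that range; the tail values are copied from the schedules above:

```python
def first_sigma(frames: int) -> float:
    """Linear fit through the two confirmed data points from this thread:
    481 frames -> 0.9175 and 961 frames -> 0.9655."""
    f0, s0 = 481, 0.9175
    f1, s1 = 961, 0.9655
    return round(s0 + (frames - f0) * (s1 - s0) / (f1 - f0), 4)

def upscaler_sigmas(frames: int) -> list:
    # Tail values taken from the 481/962-frame schedules posted above.
    return [first_sigma(frames), 0.87, 0.735, 0.445, 0.0]
```

For example, `first_sigma(721)` gives 0.9415, which agrees with the untested table row for 30 seconds.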


@bmgjet Thanks for the chart. It seems to be working. I tested generating a few videos using the values from the chart, and I don't see that annoying flashing distorted text at the end of the videos. Anyway, I'll do more testing later and let you guys know.

GPT 5.2 made me a custom node that automatically generates the correct sigma based on the length. I tested it on the low and high parts and it worked for me.

I'm not sure whether I should post the code here, and I can't upload the node file here.

Advice is appreciated.

Well, GPT made me one too, but it's nothing more than the chart above. It changes the value automatically, but only if you use exactly those frame counts; if you change the frame count by 20 frames, it invents settings that don't work. So you can also just use the chart above. Thank you for that one!
