How to use TemporalNet (Stable Diffusion IMG2IMG VIDEO TO VIDEO ANIMATION)

A quick guide on how to use TemporalNet without the Python script. I made the video because I got many questions about how to use TemporalNet, and there aren't really any proper guides for it online. I'm not home for a few months, so I only had access to a laptop with low VRAM; even just using the 3 ControlNet units pretty much maxed out my VRAM usage, so I couldn't get better-looking results for you. Generating a single frame also took over 5 minutes, so... yeah. The settings were not tested; I just went with the first settings I tried. I would have made a more video-like guide, but due to laptop hardware and time limitations, it's mostly pictures.

Workflow: I cropped the video beforehand using DaVinci Resolve Studio 18, then extracted frames using ffmpeg at 15 fps (the original was 30 fps), and ran the batch with the settings below (a sketch of the extraction step follows after the tips). For post-processing of the final clip, I used DaVinci's Deflicker and Flowframes to bring the fps back up to 30.

As you can see, this method allows for quite drastic style changes while staying pretty consistent. I believe this is currently the best denoising-strength-1 method I have seen (except maybe Tokyo_Jab's method, but this one allows for longer videos and, in my opinion, easier frame-by-frame editing). Another model that might help is reference_only, but from the little testing I did, I couldn't get satisfactory results combining it with the TemporalNet method. I personally always use 2 TemporalNets: current and loopback.

General tips for making videos:
- Background masking/removal helps a lot when trying to make things consistent. The actor/actress in the video is usually fairly simple to keep consistent.
- Recommended (optional) post-processing methods: DaVinci's Deflicker, Flowframes, (EbSynth).
- Also go back and fix select frames afterwards. Sometimes hands become cloth, etc.
- Use Multi-ControlNet. Some combos that I have found great are:
-- normal_bae (or depth), softedge_hed (or lineart_realistic), canny (thresholds around 35-45), openpose_face
-- openpose_full, normal_bae, canny
-- normal_bae, openpose_full

As for the weights, it varies quite a lot, but I usually start with normal_bae around , hed around , canny around 0.4, and openpose_face around . Then TemporalNet (current) at , and TemporalNet (loopback) at half of current.
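To make the extraction step concrete, here is a minimal sketch of the two ffmpeg calls described above, wrapped in Python via subprocess. The file names and folder layout are just placeholders, not anything the webui requires:

```python
# Sketch of the ffmpeg steps from the workflow above (paths are placeholders).
import pathlib
import subprocess

pathlib.Path("frames").mkdir(exist_ok=True)

# Extract frames at 15 fps from the cropped 30 fps clip.
subprocess.run([
    "ffmpeg", "-i", "cropped.mp4",
    "-vf", "fps=15",
    "frames/%05d.png",
], check=True)

# After the img2img batch, reassemble the generated frames into a 15 fps clip.
subprocess.run([
    "ffmpeg", "-framerate", "15",
    "-i", "output/%05d.png",
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "stylized_15fps.mp4",
], check=True)
```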
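Flowframes is a GUI app (it uses RIFE under the hood); if you'd rather stay on the command line, ffmpeg's minterpolate filter is a rough stand-in for the 15-to-30 fps step. This is my substitution, not what was used for the clip, and the quality won't match RIFE:

```python
# Motion-interpolate the 15 fps clip back up to 30 fps (CLI stand-in for Flowframes).
import subprocess

subprocess.run([
    "ffmpeg", "-i", "stylized_15fps.mp4",
    "-vf", "minterpolate=fps=30:mi_mode=mci",
    "final_30fps.mp4",
], check=True)
```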
Since someone will always ask anyways, here is the generation data with the ControlNets used:

Positive: (masterpiece:1.4, best quality), (intricate details), unity 8k wallpaper, ultra detailed, beautiful and aesthetic, Korean girl dancing, half body, brunette, brown hair, white clothing, white sports bra, white top, navel, belly,
Negative: (worst quality, low quality:1.4), (zombie, sketch, interlocked fingers, comic), (mask:1.2), blurry, high contrast
Steps: 24, Sampler: Euler, CFG scale: 7, Seed: 4174458460, Size: 576x1024, Model hash: 77b7dc4ef0, Model: meinamix_meinaV10, Denoising strength: 1, Clip skip: 2
ADetailer model: , ADetailer prompt: "korean face, perfect face, green eyes", ADetailer confidence: 0.3, ADetailer dilate/erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.4, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer use inpaint width/height: True, ADetailer inpaint width: 512, ADetailer inpaint height: 512, ADetailer version:
ControlNet 0: preprocessor: openpose_full, model: control_v11p_sd15_openpose [cab727d4], weight: 0.7, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: True, control mode: Balanced, preprocessor params: (512, -1, -1)
ControlNet 1: preprocessor: none, model: diff_control_sd15_temporalnet_fp16 [adc6bd97], weight: 0.6, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: True, control mode: Balanced, preprocessor params: (-1, -1, -1)
ControlNet 2: preprocessor: none, model: diff_control_sd15_temporalnet_fp16 [adc6bd97], weight: , starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: True, control mode: Balanced, preprocessor params: (-1, -1, -1)
Version:
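For anyone who prefers scripting over the webui's Batch tab, here is a rough sketch of the same three-unit setup driven through the AUTOMATIC1111 API (webui launched with --api, sd-webui-controlnet installed). The payload fields follow that extension's API, but names can shift between versions. My reading of current vs. loopback -- the "current" unit gets the source frame, the "loopback" unit gets the previous generated frame -- is an assumption, and the loopback weight of 0.3 is derived from "half of current" with current at 0.6; paths and the abbreviated prompts are placeholders:

```python
import base64
import pathlib
import requests

URL = "http://127.0.0.1:7860"  # default webui address; adjust as needed

def b64(path: pathlib.Path) -> str:
    return base64.b64encode(path.read_bytes()).decode()

frames = sorted(pathlib.Path("frames").glob("*.png"))
pathlib.Path("output").mkdir(exist_ok=True)
prev_generated = None  # base64 of the last generated frame, for the loopback unit

for i, frame in enumerate(frames):
    units = [
        # Unit 0: openpose_full; with no input_image the extension preprocesses
        # the img2img init image (behavior may vary by extension version).
        {"module": "openpose_full",
         "model": "control_v11p_sd15_openpose [cab727d4]",
         "weight": 0.7, "pixel_perfect": True},
        # Unit 1: TemporalNet "current" -- fed the current source frame (my assumption).
        {"module": "none",
         "model": "diff_control_sd15_temporalnet_fp16 [adc6bd97]",
         "weight": 0.6,
         "input_image": b64(frame)},
    ]
    if prev_generated is not None:
        # Unit 2: TemporalNet "loopback" -- fed the previous generated frame,
        # at half the "current" weight per the text (the listed value was blank).
        units.append({"module": "none",
                      "model": "diff_control_sd15_temporalnet_fp16 [adc6bd97]",
                      "weight": 0.3,
                      "input_image": prev_generated})

    payload = {
        "init_images": [b64(frame)],
        "denoising_strength": 1.0,
        "prompt": "(masterpiece:1.4, best quality), Korean girl dancing",  # abbreviated
        "negative_prompt": "(worst quality, low quality:1.4)",             # abbreviated
        "steps": 24, "cfg_scale": 7, "sampler_name": "Euler",
        "width": 576, "height": 1024, "seed": 4174458460,
        "alwayson_scripts": {"controlnet": {"args": units}},
    }
    r = requests.post(f"{URL}/sdapi/v1/img2img", json=payload)
    r.raise_for_status()
    prev_generated = r.json()["images"][0]
    (pathlib.Path("output") / f"{i:05d}.png").write_bytes(
        base64.b64decode(prev_generated))
```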