Fresh Hacker News | Qwen-Image-Layered: transparency and layer aware open diffusion model

▲Qwen-Image-Layered: transparency and layer aware open diffusion model(huggingface.co)

98 points by dvrp 1 day ago | 5 comments

▲dvrp 12 hours ago

Qwen-Image-Layered is a diffusion model that, unlike most SOTA-ish models out there (e.g. Flux, Krea 1, ChatGPT, Qwen-Image), (1) is open-weight (unlike ChatGPT Image or Nano Banana) and Apache 2.0 licensed; and (2) has two distinct inference-time features: (i) it understands the alpha channel of images (RGBA, as opposed to RGB only), which lets it generate transparency-aware bitmaps; and (ii) it understands layers [1]. This is how most creative professionals work in software like Photoshop or Figma, where you overlay elements, such as a foreground and a background, into a single file.
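For anyone unfamiliar with why RGBA plus layers go together: stacking transparent layers back into one image is usually done with the standard "over" operator. Below is a minimal pure-Python sketch on single pixels. It assumes straight (non-premultiplied) alpha and a back-to-front layer order; both are my assumptions for illustration, not something confirmed about this model's output, and `over` / `flatten` are hypothetical helper names.

```python
from functools import reduce

# (r, g, b, a), each in 0.0..1.0, straight (non-premultiplied) alpha
Pixel = tuple[float, float, float, float]

TRANSPARENT: Pixel = (0.0, 0.0, 0.0, 0.0)


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over': composite one straight-alpha pixel onto another."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return TRANSPARENT  # both pixels are fully transparent

    def blend(t: float, b: float) -> float:
        # Weight each colour by how much of it is still visible.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: list[Pixel]) -> Pixel:
    """Collapse a stack of pixels into one (ASSUMES back-to-front order)."""
    return reduce(lambda below, above: over(above, below), layers, TRANSPARENT)


background = (0.0, 0.0, 1.0, 1.0)   # opaque blue
foreground = (1.0, 0.0, 0.0, 0.5)   # half-transparent red
print(flatten([background, foreground]))
```

A real image is just this applied per pixel. The point is that each layer has to carry its own alpha for the stack to be editable afterwards, which plain RGB output cannot do.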

This is the first model by a major AI research lab (the people behind Qwen Image, which is basically the SOTA open image diffusion model) with those capabilities afaik.

The difference in timing for this submission (16 hours ago) is because that's when the research/academic paper got released—as opposed to the inference code and model weights, which just got released 5 hours ago.

---

Technically there's another difference, but this mostly matters for people who are interested in AI research or AI training. From their abstract: “[we introduce] a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer.” That seems to imply that you can adapt a current (but different) image model to understand layers as well. They also describe a pipeline to obtain the data from Photoshop .PSD files.

▲dvrp 12 hours ago

- Model page: https://huggingface.co/Qwen/Qwen-Image-Layered

- Quantized model page: https://huggingface.co/QuantStack/Qwen-Image-Layered-GGUF

- Blog URL: https://qwenlm.github.io/blog/qwen-image-layered/ (404 at the time of writing this comment, but it'll probably release soon)

- GitHub page: https://github.com/QwenLM/Qwen-Image-Layered

▲smusamashah 1 hour ago

Article link https://qwen.ai/blog?id=qwen-image-layered

▲firenode 2 hours ago

any workflow on this? Civitai workflow doesn't work.

▲ThrowawayTestr 5 hours ago

Anyone have a good workflow for combining images in comfyui? I could never get it to work.

▲firenode 2 hours ago

Did you try Civitai workflow? I also failed.

▲SV_BubbleTime 9 hours ago

I’m still not clear if it’s going to deliver the unique layers to you?

If you set the number of layers to 5, for example, will it determine what goes on each layer, or do I need to prompt that?

And I assume you need enough VRAM because each layer will be effectively a whole image in pixel or latent space… so if I have a 1MP image, and 5 layers I would likely need to be able to fit a 5MP image in VRAM?

Or can this be done in multiple steps, so that I wouldn’t need all 5 layers in active VRAM, with the assembly as another step at the end after generating one layer at a time?
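To put rough numbers on the "5 layers = 5MP" intuition, here is a back-of-the-envelope sketch of raw tensor storage only. The latent-space figures (8x spatial downsampling, 16 latent channels, 2-byte values) are assumptions picked for illustration, not values taken from the Qwen-Image-Layered code, and `tensor_mib` is a hypothetical helper.

```python
def tensor_mib(width: int, height: int, channels: int,
               bytes_per_value: int, layers: int = 1) -> float:
    """Raw storage for `layers` dense tensors of the given shape, in MiB."""
    total_bytes = width * height * channels * bytes_per_value * layers
    return total_bytes / (1024 * 1024)


# Pixel space: a 1024x1024 (~1 MP) RGBA image at 8 bits per channel.
pixel_one = tensor_mib(1024, 1024, channels=4, bytes_per_value=1)
pixel_five = tensor_mib(1024, 1024, channels=4, bytes_per_value=1, layers=5)

# Latent space: ASSUMED 8x downsampling, 16 channels, 2 bytes per value.
latent_five = tensor_mib(1024 // 8, 1024 // 8, channels=16,
                         bytes_per_value=2, layers=5)

print(f"1 RGBA layer (pixels):   {pixel_one:.1f} MiB")
print(f"5 RGBA layers (pixels):  {pixel_five:.1f} MiB")
print(f"5 layers (latents):      {latent_five:.1f} MiB")
```

Under these assumptions the layer tensors themselves scale linearly with the layer count but stay small. This deliberately leaves out model weights and intermediate activations, so it says nothing about total VRAM use or about whether the layers can be processed one at a time.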

▲jamilton 9 hours ago

The linked GitHub readme says it outputs a powerpoint file of the layers.

▲oefrha 6 hours ago

I don't see the word powerpoint anywhere in https://github.com/QwenLM/Qwen-Image-Layered, I only see a code snippet saving a bunch of PNGs:

  with torch.inference_mode():
      output = pipeline(**inputs)
      output_image = output.images[0]
  
  for i, image in enumerate(output_image):
      image.save(f"{i}.png")

Unless it's a joke that went over my head or you're talking about some other GitHub readme (there's only one GitHub link in TFA), posting an outright lie like this is not cool.

▲dragonwriter 4 hours ago

> I don't see the word powerpoint anywhere in https://github.com/QwenLM/Qwen-Image-Layered,

The word "powerpoint" is not there, however this text is:

“The following scripts will start a Gradio-based web interface where you can decompose an image and export the layers into a pptx file, where you can edit and move these layers flexibly.”

▲oefrha 4 hours ago

Oh okay I missed it, sorry. But that’s just using a separate python-pptx package to export the generated list of images to a .pptx file, not something inherent to the model.

▲Llamamoe 8 hours ago

...of all the possible formats, it outputs.. a powerpoint presentation..? What.

▲dragonwriter 4 hours ago

The github repo includes (among other things) a script (relying on python-pptx) to output decomposed layer images into a pptx file “where you can edit and move these layers flexibly.” (I've never used PowerPoint for this, but maybe it is good enough for this and ubiquitous enough that this is sensible?)

▲djfobbz 8 hours ago

Lol, right?!?! I would've expected sequential PNGs followed by SVGs once the model improved.

▲CamperBob2 8 hours ago

That's what the example code at https://old.reddit.com/r/StableDiffusion/comments/1pqnghp/qw... generates. You get 0.png, 1.png ... n.png, where n is the requested number of layers minus 1.

It'll drop a 600W RTX 6000 to its knees for about a minute, but it does work.

▲dvrp 7 hours ago

I saw some people at a company called Pruna AI got it down to 8 seconds with Cloudflare/Replicate, but I don't know if it was on consumer hardware or an A100/H100/H200, and I don't know if the inference optimization is open-source yet.