Introduction #
If you currently need an image or illustration created quickly (one that does not need to be factually accurate), there is no denying the utility of generative large language models (genLLMs).
No matter what you think about the LLM “AI” (more on that in an entirely different post), one has to admit that the technology is really impressive. The generation of images from a short description, or the seemingly endless “knowledge” it comes up with when asked questions, can be mind-blowing.
I currently have exactly zero interest in relying on, or even using, the current “AI” for work that is actually meaningful and needs accuracy (again, more on that later), but for non-important stuff it seems like an easy way to generate things, like ideas for a post.
When starting this site, I went with a template for the Hugo static site generator, and it has the option for a picture, or an avatar, of yours truly.
That sounds like something a genLLM should be perfectly capable of doing for me, with a bit of time saved as well. As an extra, it gives me an opportunity to write a blog post about the process, killing two birds with one stone.
First things first: I am no expert in using the chatbots, nor do I care much which model is currently considered the “most capable”. I simply went to Google Gemini, which presented me with a model named 2.5 Flash.
First attempt #
Wanting to test how it behaves with what I call “close to human interaction”, I simply wrote my request, and got:
Sure, here you go

What can we say about this one?
Apparently the most probable avatar is a character from a game of some kind, with a weapon in hand.
Unfortunately, I managed to delete the chat history, so I had to redo it all in order to be able to write this blog post with verbatim queries (given that the whole point of this post is to give an accurate assessment, I felt that was important).
Needless to say, not a single image was the same across the two chats, but the general observations from both of my sessions are the same, with a few surprises as well.
So, restarting: here is my attempt at generating an avatar for this page.
Chat history #
Sure, here is the generated avatar: Would you like me to generate another avatar?

Sure, here is a more realistic avatar: Would you like me to generate another avatar?

Sure, here’s an illustration of a naval architect: Would you like any other changes?

Okay, here is the image of a male naval architect: Is there anything else you’d like to change?

It is at this point that I start to notice that even if I ask it to keep the rest of the image the same, I am presented with a new situation for the avatar.
Let’s continue the testing by throwing in some curve-balls!
Okay, here is the image of a male naval architect and software engineer with round glasses: Is there anything else you’d like to change?

Okay, here’s the image of the older male naval architect and software engineer: Is there anything else you’d like to change?

Okay, here’s the image with the mechanical engineering elements added: Is there anything else you’d like to change?

Okay, here’s the image with slightly brown hair: Is there anything else you’d like to change?

Okay, here’s the image with slightly brown hair: Is there anything else you’d like to change?

In my first attempt, this is the point where I asked again and again until it told me it could not change the hair color. (In that instance, the model actually started to deviate, and I got photorealistic images of rather “model-looking” males with brown hair, but no software or naval-architect elements.)
Anyway, I needed an avatar that looked slightly younger, so one last query in this chat:
Okay, here’s the image of the 40-year-old male naval architect, software engineer, and mechanical engineer with slightly brown hair: Is there anything else you’d like to change?

It thus appears to be impossible to get the LLM to change just one very specific part of an image and leave the rest alone. However, I felt there was a possibility that the rest of the chat history had somehow confused the model, so I started a new chat:

Hi, can you take this specific image, and change just the hair color so it is slightly brown?
Okay, I’ve changed the hair color to slightly brown:

Okay, I have changed only the hair color to slightly brown in the image you uploaded:

Okay, I have regenerated the image you uploaded, ensuring that only the hair color is changed to slightly brown:

Conclusion #
It is apparently impossible to get one of these LLM image generators to regenerate exactly the same image with just a color change to some of the elements.
This is perhaps not as surprising as I am trying to make it sound; after all, the LLMs are purely based on statistical correlations. I am not sure how image generation works in LLMs, but I presume it works in a mostly similar way to text generation: they predict one token at a time, and then use the already-generated set of tokens to predict the next token in the response. For images, a token is probably the color of a pixel, or of a set of pixels.
It is then not surprising that it does not have an understanding or knowledge of what hair is and how to change it. For the model, the token “hair” in the input context leads to a certain set of responses that propagates through the neural network.
Said a different way: it can generate brown hair, but it does not know what hair is, or how to change it after it has been generated.
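A toy way to see why this happens: an autoregressive generator only samples the next token from a distribution conditioned on what came before; it never holds an editable “object” it could reach back into. The sketch below is purely illustrative and entirely made up (the vocabulary, the fake distribution, and the seeds are my own assumptions, not anything like Gemini’s actual pipeline), but it shows the core mechanic:

```python
import random

# Toy "vocabulary" of image tokens; a real model has thousands.
VOCAB = ["sky", "hair_brown", "hair_black", "glasses", "desk", "screen"]

def next_token_distribution(context):
    """Fake conditional distribution that slightly penalizes repeats.
    A real model would compute this with a neural network over the context."""
    weights = [0.5 if tok in context else 1.0 for tok in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def generate(seed, length=6):
    """Sample one token at a time, each conditioned on the tokens so far."""
    random.seed(seed)
    context = []
    for _ in range(length):
        probs = next_token_distribution(context)
        context.append(random.choices(VOCAB, weights=probs)[0])
    return context

# Two runs: there is no handle to swap only "hair_black" for "hair_brown"
# in the first result. Asking for "the same image but brown hair" means
# sampling a whole new sequence, which can differ everywhere.
print(generate(1))
print(generate(2))
```

The point of the sketch is that the only operation available is “sample a new sequence”; there is no stored scene description to edit after the fact.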
Post Scriptum #
There are at least two more interesting observations to be made:
- The model still struggles with seemingly illogical placement of objects in images. This image seems to have the screens placed rather unconventionally. (The computer on the right looks really sleek, though; I am looking forward to laptops looking like that.)
- There seems to be a “common” set of male characteristics that defines a naval architect/software engineer. The two images below are from the two different chats, but there is a striking resemblance in the face of the character.

