Introduction #
If you currently need an image or illustration created quickly (one that does not need to be factually accurate), there is no denying the utility of generative large language models (genLLMs).
No matter what you think about the LLM “AI” (more on that in an entirely different post), one has to admit that the technology is really impressive. The generation of images from a short description, or the seemingly endless “knowledge” it comes up with when asked questions, can be mind-blowing.
I currently have exactly zero interest in relying on, or even using, the current “AI” for work that is actually meaningful and needs accuracy (again, more on that later), but for non-important stuff it seems like an easy way to generate things, like ideas for a post.
When starting this site, I went with a template for the Hugo static site generator, and it has the option for a picture, or an avatar, of yours truly.
That sounds like something a genLLM should be perfectly capable of doing for me, with a bit of time saved as well. As an extra, it gives me an opportunity to write a blog post about the process, killing two birds with one stone.
First things first: I am no expert in using the chatbots, nor do I care much which model is currently considered the “most capable”. I simply went to Google Gemini, which presented me with a model named 2.5 Flash.
First attempt #
Wanting to test how it behaves with what I call “close to human interaction”, I simply wrote my request, and got:
Sure, here you go

What can we say about this one?
Apparently the most probable avatar is a character from a game of some kind, with a weapon in hand.
Unfortunately, I managed to delete the chat history, so I had to redo it all in order to be able to write this blog post with verbatim queries (given that the whole point of this post is to give an accurate assessment, I felt that was important).
Needless to say, not a single image was the same across the two chats, but the general observations from both of my sessions are the same, with a few surprises as well.
So, restarting: here is my attempt at generating an avatar for this page.
Chat history #
Sure, here is the generated avatar: Would you like me to generate another avatar?

Sure, here is a more realistic avatar: Would you like me to generate another avatar?

Sure, here’s an illustration of a naval architect: Would you like any other changes?

Okay, here is the image of a male naval architect: Is there anything else you’d like to change?

It is at this point that I start to notice that even if I ask it to keep the rest of the image the same, I am presented with a new situation for the avatar.
Let’s continue the testing by throwing in some curve-balls!
Okay, here is the image of a male naval architect and software engineer with round glasses: Is there anything else you’d like to change?

Okay, here’s the image of the older male naval architect and software engineer: Is there anything else you’d like to change?

Okay, here’s the image with the mechanical engineering elements added: Is there anything else you’d like to change?

Okay, here’s the image with slightly brown hair: Is there anything else you’d like to change?

Okay, here’s the image with slightly brown hair: Is there anything else you’d like to change?

In my first attempt, this is the point where I asked again and again until it told me it could not change the hair color. (In that instance, the model actually started to deviate, and I got photorealistic images of rather “model-looking” males with brown hair, but no software or naval-architect elements.)
Anyway, I needed an avatar that looked slightly younger, so one last query in this chat:
Okay, here’s the image of the 40-year-old male naval architect, software engineer, and mechanical engineer with slightly brown hair: Is there anything else you’d like to change?

It thus appears to be impossible to get the LLM to change just one very specific part of an image and leave the rest alone. However, I felt there was a possibility that the rest of the chat history had somehow confused the model, so I started a new chat:

Hi, can you take this specific image, and change just the hair color so it is slightly brown?
Okay, I’ve changed the hair color to slightly brown:

Okay, I have changed only the hair color to slightly brown in the image you uploaded:

Okay, I have regenerated the image you uploaded, ensuring that only the hair color is changed to slightly brown:

Conclusion #
It is apparently impossible to get one of these LLM image generators to regenerate exactly the same image with just a color change to some of the elements.
This is perhaps not as surprising as I am trying to make it sound; after all, the LLMs are purely based on statistical correlations. I am not sure how image generation works in LLMs, but I presume it works in a mostly similar way to text generation: they predict one token at a time, and then use the already-generated set of tokens to predict the next token in the response. For images, a token is probably the color of a pixel, or of a set of pixels.
It is then not surprising that it does not have an understanding or knowledge of what hair is and how to change it. For the model, the token “hair” in the input context leads to a certain set of responses that propagates through the neural network.
Said a different way: it can generate brown hair, but it does not know what hair is, or how to change it after it has been generated.
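A toy way to see why this happens: an autoregressive generator only samples the next token from a distribution conditioned on what came before; it never holds an editable “object” it could reach back into. The sketch below is purely illustrative and entirely made up (the vocabulary, the fake distribution, and the seeds are my own assumptions, not anything like Gemini’s actual pipeline), but it shows the core mechanic:

```python
import random

# Toy "vocabulary" of image tokens; a real model has thousands.
VOCAB = ["sky", "hair_brown", "hair_black", "glasses", "desk", "screen"]

def next_token_distribution(context):
    """Fake conditional distribution that slightly penalizes repeats.
    A real model would compute this with a neural network over the context."""
    weights = [0.5 if tok in context else 1.0 for tok in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def generate(seed, length=6):
    """Sample one token at a time, each conditioned on the tokens so far."""
    random.seed(seed)
    context = []
    for _ in range(length):
        probs = next_token_distribution(context)
        context.append(random.choices(VOCAB, weights=probs)[0])
    return context

# Two runs: there is no handle to swap only "hair_black" for "hair_brown"
# in the first result. Asking for "the same image but brown hair" means
# sampling a whole new sequence, which can differ everywhere.
print(generate(1))
print(generate(2))
```

The point of the sketch is that the only operation available is “sample a new sequence”; there is no stored scene description to edit after the fact.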
Post Scriptum #
There are at least two more interesting observations to be made:
- The model still struggles with seemingly illogical placement of objects in images. This image seems to have the screens placed rather unconventionally. (The computer on the right looks really sleek, though; I am looking forward to laptops looking like that.)
- There seems to be a “common” set of male characteristics that defines a naval architect/software engineer. The two images below are from the two different chats, but there is a striking resemblance in the face of the character.

