electric_sheep_dreams
Introduction
As the thread's question asks: can Stable Diffusion make Minecraft skins?
Unequivocally, the answer is yes. Stable Diffusion is capable of making Minecraft skins, and fairly good ones at that!
Prompted to make a skin with a light tone, pink hair, and a description of the clothing
Same prompt as above but with beige hair
The skins above are AI generated! (First image at 3,000 steps, second at 7,000 steps.) They are rendered on novaskin. The main questions you might have are: how was it done, and who made it?
I'll talk about all of that below, but note that this model was fine-tuned by me on a free evening I had! Note as well that it was trained on a very, VERY small amount of data for not many epochs, so this is likely a fraction of what it could really do.
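Since the post doesn't show any training code, here is a minimal, hedged sketch of the objective a diffusion fine-tune typically optimizes: add noise to a clean image, have the model predict that noise, and take the mean squared error. The real run would use a U-Net and 64x64 skin textures; the "model" and "image" below are stand-ins made up purely for illustration.

```python
import math
import random

def add_noise(x0, eps, alpha_bar_t):
    """Forward process: mix the clean image with Gaussian noise."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]

def training_step(x0, model, alpha_bar_t):
    """One denoising step: noise the image, predict the noise, take MSE."""
    eps = [random.gauss(0.0, 1.0) for _ in x0]
    xt = add_noise(x0, eps, alpha_bar_t)
    eps_hat = model(xt)
    return sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(x0)

random.seed(0)
skin_row = [0.2, 0.8, 0.5, 0.1]           # pretend pixel values from a skin
dummy_model = lambda xt: [0.0] * len(xt)  # untrained: always predicts zero
loss = training_step(skin_row, dummy_model, alpha_bar_t=0.5)
print(loss)
```

Fine-tuning on a small skin dataset is essentially this step repeated over captioned skin textures until the noise predictions get good.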
Background
About me
I have an interest in AI and had a free weekend to invest in Minecraft skins.
I am going to explain what stable diffusion is in very abstract terms, as I personally am not well-versed in this field or the rigor of the mathematics behind the process. But I will be happy to answer any specific questions to the best of my ability.
The namesake originates from the process of "diffusion", which in my mind I imagine as dropping ink into a glass of water, slowly diffusing color into it.
This creates muddy water that otherwise holds no information about what it previously was. We can then describe what we would like this muddy water to become; maybe we would like to make it purple.
To make it purple, we "encode" the request "purple", and the diffusion model tries its best to add more ink to this muddy water so that it becomes purple (the decoding process; the step after encoding is the latent representation).
Anyhow! The end result is what I wanted; in the real-world example, an image.
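To make the analogy concrete, here is a toy numeric loop (entirely my own sketch, not the real sampler): start from random "muddy water" and repeatedly remove the noise the "model" predicts. Here the prediction cheats and points straight at the encoded target, just to show the iterative shape of denoising.

```python
import random

random.seed(42)
purple = [0.5, 0.0, 0.5]                # the "encoded" request, as RGB
x = [random.random() for _ in range(3)] # muddy water: a random RGB triple

for step in range(50):
    # a real model predicts the noise to remove; this stand-in just
    # points at the difference from the target
    predicted_noise = [xi - ti for xi, ti in zip(x, purple)]
    # take a small step in the denoising direction
    x = [xi - 0.1 * ni for xi, ni in zip(x, predicted_noise)]

print([round(c, 2) for c in x])  # converges to [0.5, 0.0, 0.5]
```

The real process works in a latent space and the "direction" comes from a trained network, but the repeated small-step structure is the same.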
My frustrations
I could not for the life of me find any decent models that could generate Minecraft skins; all of them are either really poor or do not suit my needs in the slightest. I thought I could do better, so, well, I gave it a go!
Challenges
I am not the first to tackle this; some notable people have attempted it, the biggest of which is https://monadical.com/
However, it's ugly. Really ugly. No, really: the generated skins are not something I would ever be caught dead in.
So... this raises an important question: where did they go wrong, and how did I go right?
The devil lives in the details
Let's look at some of the skins that Monadical made.
You can't really say they look good. Sure, they are indeed Minecraft skins, but... they look terrible. Really terrible. But why?
The skin makers among the readers will immediately know: there are no transparent layers.
But there is a more pressing issue we must address.
Minecraft skins are not 3-dimensional, but a 2D projection
Right... so, Minecraft skins are not really 3-dimensional. A skin is a texture map that is wrapped onto the connected cubes that make up the player model!
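For reference, the flat texture is laid out as rectangles per body part. This sketch lists a few front faces of the classic 64x64 layout; the coordinates are from memory, so treat them as approximate rather than authoritative.

```python
# (left, top, width, height) of the front face of a few body parts in the
# flat 64x64 skin texture; the game wraps these rectangles onto cubes.
FRONT_FACES = {
    "head":      (8, 8, 8, 8),
    "body":      (20, 20, 8, 12),
    "right_arm": (44, 20, 4, 12),
    "right_leg": (4, 20, 4, 12),
    "hat_layer": (40, 8, 8, 8),  # the transparent overlay the skins above lack
}

SKIN_SIZE = 64

def in_bounds(rect, size=SKIN_SIZE):
    """Check that a rectangle fits inside the skin texture."""
    x, y, w, h = rect
    return 0 <= x and 0 <= y and x + w <= size and y + h <= size

assert all(in_bounds(r) for r in FRONT_FACES.values())
print(sorted(FRONT_FACES))
```

The point is that a generator only ever sees this flat grid of rectangles, never the assembled 3D model.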
We as humans have a tendency to assume things are anthropomorphic. When Deep Blue was first made and beat Garry Kasparov, we assumed it was intelligent, and that maybe we could use it to solve other games like Go. It wasn't; it is only good at what it was built to do. It does not learn the way we humans do, and it did not come up with its ingenious moves through complicated thought or any chain of reasoning.
I feel the same way now with every AI, but we cannot help but assign human traits to it. The famous paper "Attention Is All You Need" is a big example: we presume that models pay "attention" to things, when really, god knows what it is doing, adding embeddings together with whatever weights it likes.
We say "multi-head attention" as if it were attention to many things, but... is it? Or is it just another linear transformation on top of attention heads concatenated together?
This brings me to my current point. Sure, humans would find it difficult to create a skin on a 2D texture without a 3D model to guide them, but remember, machines aren't humans. They do not have the same spatial understanding we do; they have much richer representations than we humans can access (at least, that is my belief). Therefore, unlike the Monadical approach, where they supply the model with a 3D representation of the player along with the 2D mapping to aid it, I decided against it. And it works, not that I am particularly surprised by it.
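To make the "just a linear transformation on concatenated heads" point concrete, here is a toy single-query multi-head attention in plain Python. All dimensions, queries, and weights are made up for the sketch.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def head(q, ks, vs):
    """One attention head: a weighted average of the value vectors."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, vs)) for j in range(len(vs[0]))]

def matvec(W, x):
    """Apply a linear map (matrix) to a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # keys for 3 positions
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # values for 3 positions

h1 = head([1.0, 0.0], ks, vs)
h2 = head([0.0, 1.0], ks, vs)   # a second head with a different query
concat = h1 + h2                # "multi-head" = concatenation...
W_O = [[0.25] * 4, [0.25] * 4]  # ...followed by one output projection
out = matvec(W_O, concat)
print(len(out))  # 2
```

Each head is nothing more than a softmax-weighted average, and the "multi-head" part really is just the heads glued together and pushed through one more matrix.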
If you skipped through the rant, to reiterate: nothing special needed to be done with the 3D representation.
Final thoughts
Truth be told, I am not sure what to write here; I am just excited to show off the advancements that AI has made, and how they are applicable to Minecraft as well under the right training conditions and data. I hope this post was interesting to someone and inspires them in some way. If anyone is curious about the actual technical details, feel free to ask; if this gets enough interest, I'll write a part 2 and train this on more data to see how far it can go.
I am sure this could be extended to many more clothing styles, hair shading types, accessories, etc. This is my first time doing anything with diffusion models, so if there is a next iteration to this, I am hoping I can do things a little better.
Technologically, this is a testament to the power of pre-training and to how models really don't see things the same way we do, being able to understand a 2D texture map with almost no prior exposure. The total cost I spent on this was ~$5 and a few hours of training time. I currently have no plans to release the model weights or the dataset used to train it.
Cheers! Happy coding