Using prompts to modify face and body in Stable Diffusion XL


The Basics

The model you choose has the most significant impact on the quality of the generated images, so it’s very important to choose the right model for the work you want to do. There are models trained to generate realistic images of people, general models that can generate anything, models aimed at anime or cartoon styles, and many more. You can find them on popular websites such as CivitAI.com. For realistic person generation, I recommend RealVis, ZavyChroma and CHINOOK. The first one is my personal favorite at the time of writing. Every model has its own strengths and weaknesses, so you need to find the one that suits your needs best.

The second most important factor is the prompts you write, both positive and negative. They affect the appearance of the generated person and the overall quality of the image. If you don’t specify exactly how the generated person should look, they will most likely resemble the people the model was trained on. I’ve lost count of how many times I’ve seen people on Reddit complaining about seeing the same faces in generated images. Prompts alone give you many ways to change the appearance of the generated person: face shape, hairstyle, skin tone, body type, specific body parts, age, nationality, and more.

The third factor is configuration. Image resolution, Sampler, number of Sampling Steps, Guidance Scale and Variational Autoencoder (VAE) are the most important settings, and each of them has a huge impact on the final image. Higher resolution means more details will be visible; it’s like giving an artist a bigger canvas to paint on. The Sampler controls the image generation (denoising) process, and each sampler can produce different results. More Sampling Steps means more details will be rendered: in my experience, skin and clothes get better textures and more objects appear in the generated image. Guidance Scale is a multiplier for the influence of the prompts, so the higher the Guidance Scale, the more impact the prompts have on the final image. The VAE decodes the generated image from the model’s latent space into pixels, so it affects colors and fine details. For SDXL, the standard VAE should be fine.
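To make the Guidance Scale less abstract, here is a minimal sketch of classifier-free guidance, the standard way Stable Diffusion pipelines combine predictions at each sampling step. The model predicts noise twice: once conditioned on the positive prompt and once on an "unconditional" input; when you set a Negative Prompt, its prediction takes the place of the unconditional one. The Guidance Scale multiplies the difference between the two. Real pipelines do this on latent tensors; the plain Python lists and the name `apply_guidance` are simplifications for illustration only.

```python
def apply_guidance(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance on flat lists of noise predictions.

    uncond: prediction for the negative (or empty) prompt
    cond:   prediction for the positive prompt
    scale:  the Guidance Scale (CFG scale)
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same shape")
    # Start from the negative/unconditional prediction and move toward the
    # positive-prompt prediction by `scale` times their difference.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# scale = 1.0 returns the positive-prompt prediction unchanged;
# larger values exaggerate whatever separates positive from negative.
print(apply_guidance([0.0, 1.0], [1.0, 1.0], 1.0))  # [1.0, 1.0]
print(apply_guidance([0.0, 1.0], [1.0, 1.0], 7.0))  # [7.0, 1.0]
```

This also suggests why a Negative Prompt can change overall quality and not only exclude objects: it shifts the reference point that every step is pushed away from.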

Negative Prompt

Let’s start with the Negative Prompt and then move on to the positive prompt. The Negative Prompt is not just for specifying objects that should not appear in the generated image; it can also affect the image quality. It is not required, but in my experience it helps. In my own work I prefer to use Negative Embeddings, because they are more convenient. I recommend NDXL, FiXL or ACNeg1 (you can mix them, but one should be enough). For the purpose of this article, though, let’s write a short Negative Prompt that lists what we don’t want to see in the generated image.

worst quality, low quality, low resolution, painting, illustration, cartoon, sketch, 3d, 2d, blurry, grainy, pixelated, distorted, poorly drawn, bad anatomy, unrealistic proportions, text, watermark, signature, error, artifact

Below you can see the difference between the image generated without Negative Prompt (left side) and the image generated with Negative Prompt (right side).

[Images: left, without Negative Prompt; right, with Negative Prompt]

Positive prompt - What we want to generate

Let’s generate a photograph of a real person. Let a woman be the subject of our work, and start the positive prompt by stating that we want a photograph of a woman. Leaning heavily on photography-related keywords helps generate a realistic image.

photoshoot, photo studio, RAW photo, editorial photography, film stock photography, a photography of a woman
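If you script your generations, it helps to keep keywords in separate groups and join them only at the end, so the modifiers from the following sections can be added or swapped without editing one long string. A minimal sketch; `build_prompt` is a hypothetical helper, not part of any tool. With the two groups below it reproduces the prompt above.

```python
def build_prompt(*groups: list[str]) -> str:
    """Join keyword groups into one comma-separated prompt.

    Empty entries are skipped and duplicates (compared case-insensitively)
    are dropped, keeping the first occurrence and the original order.
    """
    seen: set[str] = set()
    keywords: list[str] = []
    for group in groups:
        for keyword in group:
            keyword = keyword.strip()
            key = keyword.lower()
            if keyword and key not in seen:
                seen.add(key)
                keywords.append(keyword)
    return ", ".join(keywords)


photo_style = ["photoshoot", "photo studio", "RAW photo",
               "editorial photography", "film stock photography"]
subject = ["a photography of a woman"]

print(build_prompt(photo_style, subject))
```

Deduplicating matters once you start combining groups, because repeating a keyword such as "RAW photo" in two places silently increases its weight in the prompt.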

Keep in mind that:

We have the basics covered, so we can move on to the modifications.

Face Features

You can specify every face feature you want to modify: face shape, eyes, nose, mouth, ears, chin, cheeks, forehead, eyebrows, freckles, moles, scars, wrinkles, facial hair, and expressions (I hope I didn’t forget anything). Describe each of them the way you want it to look.

Examples: oval face shape, bushy eyebrows, big eyes, wide nose, full lips, double chin, freckles on the cheeks, wrinkles around the eyes.
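Most face prompts in this section follow a simple "descriptor + face part" pattern. The sketch below builds such a fragment from a mapping, so you can change one feature at a time and compare results; `describe_face` and the dictionary layout are my own illustration. Free-form features like "freckles on the cheeks" don’t fit the pattern and can simply be appended as plain keywords.

```python
def describe_face(features: dict[str, str]) -> str:
    """Turn {"face part": "descriptor"} pairs into 'descriptor part' keywords.

    Dicts keep insertion order (Python 3.7+), so keywords appear in the
    order the features are listed.
    """
    return ", ".join(f"{descriptor} {part}" for part, descriptor in features.items())


face = {
    "face": "round",
    "forehead": "low",
    "eyebrows": "bushy",
    "nose": "downturned",
    "lips": "full",
    "chin": "weak",
    "cheekbones": "subtle",
    "ears": "protruding",
}
print(describe_face(face))

# Swap a single feature and keep everything else identical.
face["nose"] = "long"
print(describe_face(face))
```

The first print reproduces the first example face prompt in this section; keeping the seed fixed while changing one entry makes it easier to see what that single keyword does.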

[Images: round face, low forehead, bushy eyebrows, downturned nose, full lips, weak chin, subtle cheekbones, protruding ears; oval face, receding forehead, straight eyebrows, wide nose, full lips, weak chin, prominent cheekbones, small ears; round face, prominent forehead, arched eyebrows, long nose, full lips, square chin, low cheekbones, protruding ears; round face, receding forehead, arched eyebrows, downturned nose, heart-shaped lips, pointed chin, low cheekbones, small ears]

Makeup

You can specify the makeup style you want to see on the generated person. Makeup keywords can affect the face, hair, clothes and even the scenery; a lot depends on the keywords you use.

You can specify individual items like red lipstick, pink eyeliner, eyeshadow, lip gloss, eyebrow pencil, highlighter, contour or bronzer, and these should not affect the rest of the image much.

Prompting makeup styles like natural makeup, glam makeup, gothic makeup, retro makeup or shimmer makeup might affect things unrelated to the face.

Below you can see how prompting makeup can affect the generated image. Anti-aging makeup automatically produced an older woman. Shimmer makeup affected the clothes. Prom makeup made the woman look younger. Retro makeup affected the hairstyle and the colors of the image.

[Images: Anti-aging Makeup; Shimmer Makeup; Prom Makeup; Retro Makeup]

Hair Style and Color

The good thing is that Stable Diffusion knows many hairstyles by name, but you probably don’t. You can ask an AI chatbot for a list, browse the internet, read fashion magazines or find a list that already exists. Once you find the name of a hairstyle you like, use it in the prompt, combined with a color if you want.

Examples: braided wrap-around ponytail, short curly hair, blonde double bun, twisted bangs, pink dreadlocks or something simple like long dark hair.

Keep in mind that:

[Images: blonde shaved designs hair; ecru messy pixie hair; textured bob hair; dark chignon hair]

Skin

You can describe the skin tone, texture, and any other skin features you want to see.

Examples: dark skin, pale skin, detailed skin texture, freckles on the skin, freckles on the shoulders, tan, tan lines, saggy skin, wet skin, tattoos, scars, etc.

Body Type

You can modify the body type in three ways.

  1. By describing the person’s profession like: athlete, sumo wrestler, bodybuilder.
  2. By specifying the body type like: slim body type, chubby body type, muscular body type.
  3. By specifying individual body parts like: long legs, big belly, muscular arms.

You have probably never seen a sumo wrestler without a heavy build or a bodybuilder without muscles. These things are related, so you can make the generated person look muscular by, for example, saying that he is a bodybuilder. But then you need to specify the clothing and the location yourself, because with the bodybuilder profession, sportswear and a gym in the background are more likely to appear in the generated image.

Specifying the body type or individual body parts works a bit better than specifying the profession, because it does not affect clothing and background. Stable Diffusion understands body types and body parts, so use your imagination and describe them however you want.

[Images: slim body type, very skinny body; muscular body type, abs sixpack, muscles...; very thick body type, very fat, chubby]

Nationality and Ethnicity

Stable Diffusion was trained on images of people from all over the world, so it knows what people from different countries look like, and that varies by region. Nationality and ethnicity keywords are linked to skin tone, facial features and body type. And that’s not all: people from different countries have different clothing styles, hairstyles, etc., and the scenery where they live is different too.

You can specify nationality like: Japanese, Spanish, Somalian.
And ethnicity like: Asian, European, African.

You can find a large list of nationalities here. That list was inspired by this Reddit post.

[Images: Kazakhstani woman; Filipino woman; Emirati woman; Cameroonian woman]

Age

Prompting age might be a bit tricky, because age is connected with some other features.

If you want to generate an old person, you can prompt old woman, old lady or grandma. She will most likely have wrinkles, saggy skin and grey hair, wear clothes typical of older people, and appear in a typical setting such as a porch or a living room. To increase the chance that the generated person looks old, you can add features like: old lady, wrinkles, grey hair.

If you want to generate a young person, you can prompt young girl, girl, teenager, etc. She will most likely have smooth skin with no wrinkles and wear clothes typical of young people. Being young is also connected with height and body type: young people are usually shorter and slimmer than older people.

Prompting a specific age as a number might not work well, but you can try phrasings like: a girl, 20 years old; 40 years old woman; woman, 80 years old.
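Since numeric ages are hit-and-miss, one practical approach is to render the same seed with several phrasings and compare. The sketch below produces the three phrasing patterns from the examples above for any age and subject; `age_phrasings` is a hypothetical helper, and its choice between "a" and "an" is a naive vowel check, not real grammar.

```python
def age_phrasings(age: int, subject: str = "woman") -> list[str]:
    """Return alternative ways to state a numeric age in a prompt."""
    if age <= 0:
        raise ValueError("age must be positive")
    # Naive article choice: "an" before a vowel letter, otherwise "a".
    article = "an" if subject and subject[0].lower() in "aeiou" else "a"
    return [
        f"{article} {subject}, {age} years old",
        f"{age} years old {subject}",
        f"{subject}, {age} years old",
    ]


for phrasing in age_phrasings(40):
    print(phrasing)
```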

[Images: young girl; girl; midlife woman; old lady]

Scenery

The scenery can also have some impact on the generated person, mainly on the features you didn’t specify yourself.

If you generate a woman in the gym, she will most likely have muscles.
A woman in an office will most likely wear office clothes.
A woman in the desert will most likely have tan or dark skin.
A woman picking tea will most likely be Asian.

Keep in mind that every location has light sources, such as a window, a lamp or the sun, and light sources create shadows and highlights. If the generated person is inside a building, there will most likely be more shadows on the face.

Below you can see that the woman on the farm wears a shirt of a different material than the women in the other images. Her skin looks natural and her makeup is delicate. The woman in the cafe wears more makeup, and because of that her skin is less detailed. The woman in the Zen monastery looks slightly Asian, because Zen monasteries are mostly located in Asia. The last woman, in the nightclub, also wears more makeup, with clearly visible eyeliner and lipstick.

[Images: farm; cafe; zen monastery; nightclub]

Lighting

Lighting can also have a small impact on the generated person. It’s especially visible when prompting studio lighting, neo-noir lighting or direct flash lighting: the skin texture might change, makeup might become more visible, or the hairstyle might be affected.

You can find a list of useful lighting types here, inspired by this Reddit post.

Camera Type and Film

This is a very cool trick: you can achieve an analog photography look just by specifying the camera type or film. Some people use 200MB or even 800MB LoRAs to get an analog look, but you can get it with just a few words in the prompt. There are a lot of camera models and film types you can specify. I sorted them into three simple groups: those with an analog effect, those with a color effect and those with a grayscale effect. Let’s list some of them.

Watch out! I made some mistakes in the list below, and some entries are not accurate.
I grouped them based on the effects I saw in generated images.
Some of the names are black-and-white films, but they produced colors in my images.
I need to correct them, and I will do that soon.

Analog effect: Fujifilm FP-100C, Kodak Portra 160, Lomography Berlin Kino 400
Color effect: Kodak Aerochrome, Lomography Redscale XR 50-200, Rollei Infrared 400
Grayscale effect: Ilford SFX 200, Kodak Tri-X 400, Fujifilm Acros II
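If you want to test these films systematically, a lookup table keyed by effect group keeps them in one place. The sketch below copies the grouping above as-is, including the entries I flagged as possibly inaccurate; `FILM_EFFECTS` and `film_prompts` are hypothetical names, and appending the film name as one more comma-separated keyword is just one way to add it.

```python
FILM_EFFECTS: dict[str, list[str]] = {
    "analog": ["Fujifilm FP-100C", "Kodak Portra 160", "Lomography Berlin Kino 400"],
    "color": ["Kodak Aerochrome", "Lomography Redscale XR 50-200", "Rollei Infrared 400"],
    "grayscale": ["Ilford SFX 200", "Kodak Tri-X 400", "Fujifilm Acros II"],
}


def film_prompts(base_prompt: str, effect: str) -> list[str]:
    """Return one prompt per film in the chosen effect group."""
    if effect not in FILM_EFFECTS:
        raise KeyError(f"unknown effect group: {effect!r}")
    # Append the film name as an extra comma-separated keyword.
    return [f"{base_prompt}, {film}" for film in FILM_EFFECTS[effect]]


for prompt in film_prompts("a photography of a woman", "grayscale"):
    print(prompt)
```

Rendering each prompt with the same seed makes it easy to see which films change only the colors and which also change the face or hairstyle.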

From my own experiments, I can say that they affect not only the colors but also the face. Some of them make the face look more detailed, and some make it less textured. Some, like Rollei Retro 80s, might affect not just the colors and face but also the hairstyle. I think that’s because of “retro” and “80s” in the name. Most of them affect shadows and highlights.

A list of camera models is available here, and a list of camera films is available here. Both lists were inspired by SDXL 1.0 Artistic Studies.

[Images: Kodak T-Max 100; Polaroid 600; Foma Fomapan 200 Creative; Lomography Redscale XR 50-200]

Year

You can specify the year of the photo. It has a big impact on the generated image: the people in it, their clothes and hairstyles, and the scenery. I recommend using years from 1826, when photography was invented, up to the date the model was trained.

There are five key periods of photography that Stable Diffusion recognizes:

There is no magic here: you just specify the year you want and check the results.
You can also specify a decade like 50s, 60s, 70s, etc.

But keep in mind that old photos are usually of lower quality. You shouldn’t combine keywords like studio photo, raw photo or high quality with an old year, because you might not get the expected result.

To increase the chance that the generated photo will look old, you can add extra keywords like: old photo, vintage photo, sepia photo, black and white photo, damaged photo, scratched photo, history, antique, retro, nostalgia, etc.

You can also remove keywords like low quality, bad quality, worst quality or low resolution from the Negative Prompt.
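The advice in this section (drop modern quality keywords from the positive prompt, add vintage ones, and relax the Negative Prompt) can be sketched as a small function. The keyword lists come from the paragraphs above; the function name, the two vintage keywords picked, the "photo from YEAR" wording, and the `vintage_before` cutoff of 1970 are all assumptions for illustration, not tested values.

```python
MODERN_QUALITY = {"studio photo", "raw photo", "high quality"}
NEGATIVE_QUALITY = {"low quality", "bad quality", "worst quality", "low resolution"}
VINTAGE = ["old photo", "vintage photo"]


def split_keywords(prompt: str) -> list[str]:
    """Split a comma-separated prompt into trimmed, non-empty keywords."""
    return [k.strip() for k in prompt.split(",") if k.strip()]


def adapt_for_year(positive: str, negative: str, year: int,
                   vintage_before: int = 1970) -> tuple[str, str]:
    """Add the year to the prompt and, for old years, swap quality keywords.

    vintage_before is a hypothetical cutoff; adjust it to taste.
    """
    if year < 1826:
        raise ValueError("photography was invented in 1826")
    pos = split_keywords(positive)
    neg = split_keywords(negative)
    if year < vintage_before:
        # Modern quality keywords fight against the old look, so drop them.
        pos = [k for k in pos if k.lower() not in MODERN_QUALITY]
        pos += [k for k in VINTAGE if k not in pos]
        # Old photos are expected to look worse, so stop penalizing that.
        neg = [k for k in neg if k.lower() not in NEGATIVE_QUALITY]
    # Assumed wording for stating the year; any phrasing that includes it may work.
    pos.append(f"photo from {year}")
    return ", ".join(pos), ", ".join(neg)


pos, neg = adapt_for_year(
    "RAW photo, editorial photography, a photography of a woman",
    "worst quality, low quality, blurry, watermark",
    1955,
)
print(pos)  # editorial photography, a photography of a woman, old photo, vintage photo, photo from 1955
print(neg)  # blurry, watermark
```

For a recent year, both prompts pass through unchanged apart from the added year.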

For extremely old photos, you can use keywords like: daguerreotype, tintype, ambrotype, calotype, wet plate collodion, albumen print, cyanotype, platinum print, gelatin silver print, etc.

Name

Names are connected with famous people. Prompting first names like Scarlett or Angelina may affect the appearance of the generated person. If you also specify the surname, the effect will be much stronger: the generated person will most likely have a hairstyle, facial features, skin tone, etc. similar to those of the person you specified.

[Images: Scarlett Johansson; Gal Gadot; 50:50 mix of Beyoncé Knowles & Angelina Jolie; 50:50 mix of Naomi Campbell & Cara Delevingne]

Mixing

You can mix all of the above to get unique results. You can do that by:

Results are not satisfying

Having trouble generating what you want to see? Try to:

Problems you might encounter

Finish

I hope you enjoyed this article and learned something new.

More images will be added soon.