Understanding colors in visual storytelling: how to extract color palettes

I want you to close your eyes and imagine two characters from Star Wars fighting, a hero and a villain. What are their lightsaber colors? What colors are their clothes? Colors are a powerful way to tell a story in a manner that words alone cannot.

They can set the tone, making a scene feel cold and depressing or evoking feelings of anger and danger. This effect can be as striking as the redness of the scene where Omni-Man destroys an entire planet, or as subtle as the moment when Walter White says he is “the one who knocks”.

Colors can also symbolize character traits, such as the darkness associated with the Dark Knight or the alerting yellow worn by The Bride in ‘Kill Bill’. You may want to hear more about colors in storytelling from Lewis Bond.

[Omni-Man explaining nicely why the Flaxans should leave earth alone]

Before we get into the technical aspects of extracting color palettes, let’s first revisit the nature of images and how they represent colors.

Have you ever looked very closely at an old TV and seen the individual pixels or subpixels that make up the image? You will notice that the color you see from afar does not come from one solid light source, but from very small, tightly packed yet separate red, green and blue light sources. This is how essentially all screens show colors, although it is much harder to see on a high-resolution monitor. And if you don’t believe it, look at this image. It is not a screenshot of a JPEG, a PNG or any other image format that you are used to. It is in fact a spreadsheet, and every pixel is represented by three colored cells next to each other.

[James McAvoy as Patricia, Dennis, Hedwig, The Beast, Barry, Heinrich, Jade, Ian, Mary Reynolds, Norma, Jalin, Kat, B.T., Kevin Wendell Crumb, Mr. Pritchard, Felida, Luke, Goddard, Samuel, Polly et al. in Glass (2019)]

And if you zoom in, you can see the individual colors.

You can try it yourself here.
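If you would rather poke at the idea in code than in a spreadsheet, here is a minimal sketch of the same trick in NumPy. The helper name `to_subpixels` is made up for this example, and it assumes the image is a (height, width, 3) array: every pixel is widened into three cells, one pure red, one pure green and one pure blue.

```python
import numpy as np

def to_subpixels(image):
    """Expand each RGB pixel into three side-by-side cells (R, G, B)."""
    h, w, _ = image.shape
    out = np.zeros((h, w * 3, 3), dtype=image.dtype)
    out[:, 0::3, 0] = image[:, :, 0]  # first cell keeps only the red value
    out[:, 1::3, 1] = image[:, :, 1]  # second cell keeps only the green value
    out[:, 2::3, 2] = image[:, :, 2]  # third cell keeps only the blue value
    return out

# A single made-up orange-ish pixel becomes three cells
one_pixel = np.array([[[200, 100, 50]]], dtype=np.uint8)
cells = to_subpixels(one_pixel)
print(cells.shape)  # (1, 3, 3)
print(cells[0])
```

Passing the result to `plt.imshow` and zooming in should give roughly the same effect as the spreadsheet.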

This is why an image is usually stored as 3 channels stacked on top of each other, where each channel is a 2D array of pixel values ranging from 0 to 255 (for the common 8 bits per channel).

[image from Sandeep Balachandran]
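To make the channel idea concrete, here is a tiny hand-made image, sketched under the assumption that we store it the way NumPy and matplotlib do, as a (rows, columns, channels) array. The 2x2 pixel values are invented for illustration.

```python
import numpy as np

# A made-up 2x2 image: red, green on the first row; blue, white on the second
tiny = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

print(tiny.shape)  # (2, 2, 3): 2 rows, 2 columns, 3 channels

# Slicing the last axis gives one channel as a plain 2D array
red_channel = tiny[:, :, 0]
print(red_channel)
```

The red channel is 255 wherever the pixel contains full red (the red pixel and the white one) and 0 elsewhere.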

Now that you are familiar with what images look like, let’s have a look at Arthur Fleck after a long day at work.

[Joaquin Phoenix as Arthur Fleck in Joker (2019)]

Open the image and see what its dimensions look like:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Load the image as a (height, width, channels) array
image = mpimg.imread('../arthur.jpg')
h, w, d = image.shape
print(h, w, d)

It turns out my image has 394 rows (height), 728 columns (width) and 3 channels (depth). What we want to do now is reshape this tensor so that every row is a pixel and the columns hold its R, G and B values respectively.

pixels = np.reshape(image, (w * h, d))

So this is essentially what we have: \(\begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_{w*h} \end{bmatrix} = \begin{bmatrix} R_{p_1} & G_{p_1} & B_{p_1} \\ R_{p_2} & G_{p_2} & B_{p_2} \\ \vdots & \vdots & \vdots \\ R_{p_{w*h}} & G_{p_{w*h}} & B_{p_{w*h}} \end{bmatrix}\)
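If you want to convince yourself that the reshape keeps every pixel intact, here is a small sanity check on a toy array (the 2x2 array is made up, the real image works the same way). With NumPy's default row-major order, the pixel at row i and column j should land at row `i * w + j` of the flat list.

```python
import numpy as np

# Toy "image": 2 rows, 2 columns, 3 channels, filled with 0..11
tiny = np.arange(2 * 2 * 3).reshape(2, 2, 3)
h, w, d = tiny.shape

# One pixel per row, one channel per column
flat = tiny.reshape(h * w, d)
print(flat.shape)  # (4, 3)

# Pixel (row 1, column 0) ends up at index 1 * w + 0
print(flat[1 * w + 0], tiny[1, 0])

# Reshaping back recovers the original image
restored = flat.reshape(h, w, d)
```

This also means the palette code never needs to know where a pixel came from, only what color it has.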

To understand the colors of the image, let’s go ahead and plot these pixels. Since each pixel \(p_i \in \mathbb{R}^3\), we are going to need a 3D plot where the axes represent R, G and B respectively.
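One possible way to draw such a plot with matplotlib is sketched below. The function name `plot_pixels_3d`, the `max_points` subsampling and the random stand-in data are all assumptions made for this example. Subsampling is there only because scattering every single pixel tends to be slow.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_pixels_3d(pixels, max_points=5000, seed=0):
    """Scatter RGB pixels in 3D, coloring every point with its own color."""
    pixels = np.asarray(pixels)
    if len(pixels) > max_points:
        # Plot a random subset so the figure stays responsive
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(pixels), size=max_points, replace=False)
        pixels = pixels[idx]
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(pixels[:, 0], pixels[:, 1], pixels[:, 2], c=pixels / 255.0, s=2)
    ax.set_xlabel("R")
    ax.set_ylabel("G")
    ax.set_zlabel("B")
    return fig, ax

# Random stand-in data; swap in the real `pixels` array from the reshape step
demo_pixels = np.random.default_rng(1).integers(0, 256, size=(20000, 3))
fig, ax = plot_pixels_3d(demo_pixels)
```

Call `plt.show()` afterwards to open the figure and rotate it around.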

Upon exploring the plot, we notice distinct clusters of colors, such as the orange of his jacket, the teal of the bus and the gray of the road.

If only we could identify these clusters and take the mean of each one to get its average color... hey wait, that’s K-means clustering! If you are not familiar with k-means or need a refresher, k-means is an unsupervised learning algorithm, meaning that we don’t have labels for our inputs, just like our pixels. Generally speaking:

given a set \(S = \{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}\), \(x^{(i)} \in \mathbb{R}^d\)

initialize cluster centroids (means) \(\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^d\) randomly,
then repeat until convergence:
- for every i, set \(c^{(i)} := \arg\min_j \| x^{(i)} - \mu_j \|^2\)
- for every j, set \(\mu_j := \frac{\sum_{i=1}^n 1\{c^{(i)}=j\}\, x^{(i)}}{\sum_{i=1}^n 1\{c^{(i)}=j\}}\)

This is just a fancy way of saying: let every point belong to the cluster with the nearest centroid, then recalculate the cluster means. You can check StatQuest for an awesome explanation of the algorithm.
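For the curious, the two steps can be written out in plain NumPy. This is only a rough sketch for intuition, not what scikit-learn does internally. The function name `kmeans`, the initialization from random data points and the iteration cap are choices made for this example.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Bare-bones k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialize the centroids as k distinct points picked at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: squared distance of every point to every centroid
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: every centroid becomes the mean of its assigned points
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # nothing moved, so we have converged
        centroids = new_centroids
    return centroids, labels

# Four made-up pixels: two reddish, two bluish
toy = np.array([[250, 10, 10], [240, 20, 15], [10, 10, 250], [20, 15, 240]])
centroids, labels = kmeans(toy, k=2)
print(centroids)
print(labels)
```

On the toy data the two reds end up in one cluster and the two blues in the other, and each centroid is simply the average of its pair. The distance matrix holds n times k times 3 numbers, so for a full image this naive version uses a fair amount of memory.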

So, back to the code.

All we have to do is fit the K-means model to the pixels, where the number of clusters is the number of colors we want in the palette. And just like that, the centroids are the colors of the palette. For a more visually pleasing palette, we will sort it by hue.

from sklearn.cluster import KMeans
import colorsys

def RGB2HSL(rgb):
    """Convert an RGB color (0-255 per channel) to [hue, lightness, saturation]."""
    r, g, b = rgb / 255.0
    # colorsys returns the components in H, L, S order
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return [h, l, s]

def RGB2HEX(rgb):
    """Convert an RGB color to a hex color code."""
    return "#{:02x}{:02x}{:02x}".format(rgb[0], rgb[1], rgb[2])

n_colors = 10
# Each cluster centroid becomes one color of the palette
model = KMeans(n_clusters=n_colors, random_state=42).fit(pixels)
palette = np.uint8(model.cluster_centers_)
# Sort the color palette by hue for a more visually pleasing outcome
palette_sorted = sorted(palette, key=lambda color: RGB2HSL(color)[0])

plt.imshow(image)
plt.show()
plt.imshow([palette_sorted])

# Label every swatch with its hex code, in the same order as it is displayed
for i, color in enumerate(palette_sorted):
    plt.text(i, 0, RGB2HEX(color), color='black', ha='center', va='center', fontsize=6)
plt.show()

And finally!

[Joaquin Phoenix as Arthur Fleck in Joker (2019)]

[James McAvoy as The Horde in Glass (2019)]

[Ana de Armas as Dani Miranda in The Gray Man (2022)]

[Kara Hayward as Suzy Bishop in Moonrise Kingdom (2012)]

[Léa Seydoux as Madeleine Swann in No Time to Die (2021)]

So is this the only way? Do I have to use machine learning to do this? Well, no, this is just the fun way. For example, you can do it without any model, in a few lines of code.

Just sort the pixels by hue, then take the mean of every 1/k of the sorted pixels.

Of course, the computationally expensive part now is sorting the whole list of w*h pixels.

import numpy as np
import matplotlib.pyplot as plt
import colorsys

# `image` and `pixels` are the arrays loaded and reshaped earlier

def RGB2HSL(rgb):
    """Convert an RGB color (0-255 per channel) to [hue, lightness, saturation]."""
    r, g, b = rgb / 255.0
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return [h, l, s]

def RGB2HEX(rgb):
    """Convert RGB to hex color code."""
    hex_color = "#{:02x}{:02x}{:02x}".format(rgb[0], rgb[1], rgb[2])
    return hex_color

n_colors = 10

# Convert every pixel to HSL and sort the pixels by hue
pixels_hsl = np.array([RGB2HSL(color) for color in pixels])
sorted_indices = np.argsort(pixels_hsl[:, 0])
pixels_sorted = pixels[sorted_indices]

# Average each consecutive 1/n_colors slice of the hue-sorted pixels
palette = []
step = len(pixels_sorted) // n_colors
for i in range(n_colors):
    mean_color = pixels_sorted[i * step: (i + 1) * step].mean(axis=0).astype(int)
    palette.append(mean_color)

plt.imshow(image)
plt.show()
plt.imshow([palette])

for i, color in enumerate(palette):
    plt.text(i, 0, RGB2HEX(color), color='black', ha='center', va='center', fontsize=6)
plt.show()

And although this is faster on my machine, and probably on yours too, the flaw in this method is that we group pixels based on their hue component only, not their full color representation. This can lead to less accurate results than the K-means clustering approach, which considers all three RGB components.
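Here is a tiny illustration of that flaw, using two colors picked by hand for this example. A dark red and a light pink share exactly the same hue, so hue sorting treats them as neighbors, and averaging them gives a color that matches neither.

```python
import colorsys

dark_red = (128, 0, 0)
light_pink = (255, 128, 128)

def hue(rgb):
    """Hue of an RGB color (0-255 per channel), as a value in [0, 1)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _l, _s = colorsys.rgb_to_hls(r, g, b)
    return h

# Both colors sit at hue 0.0, even though they look nothing alike
print(hue(dark_red), hue(light_pink))  # 0.0 0.0

# Averaging them gives a medium red that appears in neither
mean_color = tuple((a + b) / 2 for a, b in zip(dark_red, light_pink))
print(mean_color)  # (191.5, 64.0, 64.0)
```

K-means would more likely keep two colors like these apart, since they are far from each other in RGB space.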

[Joaquin Phoenix as Arthur Fleck in Joker (2019)]

[James McAvoy as The Horde in Glass (2019)]

[Ana de Armas as Dani Miranda in The Gray Man (2022)]

[Kara Hayward as Suzy Bishop in Moonrise Kingdom (2012)]

[Léa Seydoux as Madeleine Swann in No Time to Die (2021)]