This is the first in a series of articles dedicated to steganography techniques with Python.

So what is Steganography?

From the Steganography Wikipedia page:

Steganography is the practice of concealing a message within another message or a physical object. In computing/electronic contexts, a computer file, message, image, or video is concealed within another file, message, image, or video.

If you’re familiar with internet riddles, you know that some image steganography skills are a must-have. You must know how to investigate the hex dump and Exif data, play with image levels in GIMP, or inspect pixel values.

In this article, we’ll explore some steganography techniques to hide text in images with Python. The libraries used are numpy, Pillow and piexif.

Basic and simple steganography techniques

Let’s start with very basic and simple techniques for hiding text in an image file without image pixel/palette processing.

Hiding text in the hex dump

Let’s start by encoding the secret message as bytes and appending the result to the image file:

message = "Hello World!" 
with open("cover.jpg", "ab") as f:
	f.write(message.encode('utf8'))

An image with a secret message appended to it:

To extract the message:

with open("cover.jpg", "rb") as f:
    # Read the file in 8-byte chunks and print whatever decodes as text
    for chunk in iter(lambda: f.read(8), b''):
        print(chunk.decode('utf-8', errors="ignore"), end="")

This method does not prevent the image from being displayed normally and doesn’t change the image’s visual appearance.
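To check that claim in Python, here is a minimal sketch that decodes a JPEG before and after appending a message and compares the decoded pixels. It assumes a synthetic random cover generated in memory (so no cover.jpg is needed); the variable names are illustrative only.

```python
import io

import numpy as np
from PIL import Image

# Hypothetical cover: random RGB noise generated in memory instead of cover.jpg
rng = np.random.default_rng(0)
cover = Image.fromarray(rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8))

buf = io.BytesIO()
cover.save(buf, format="JPEG")
original = buf.getvalue()

# Same effect as opening the file in "ab" mode and writing the encoded message
stego = original + "Hello World!".encode("utf-8")

# Decode both byte strings; the bytes after the JPEG end-of-image marker are ignored
pixels_before = np.array(Image.open(io.BytesIO(original)))
pixels_after = np.array(Image.open(io.BytesIO(stego)))

print(np.array_equal(pixels_before, pixels_after))
```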

You can do the same on the command line with a simple echo or cat command:

$ echo 'Hello World!' >> cover.jpg

$ cat secret.txt >> cover.jpg

To extract the message, you can use the strings or even cat commands, but let’s try hexdump:

$ hexdump -C cover.jpg | tail -5

Note that the hex trailer for a JPEG is FF D9 (see GCK’S FILE SIGNATURES TABLE). Here, hexdump clearly shows that “something” is appended after the image.
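If hexdump isn’t available, a rough Python equivalent of `hexdump -C cover.jpg | tail -5` can print the last rows of a file as hex plus printable ASCII. This is a sketch: the hypothetical hex_tail function does not reproduce hexdump’s exact output (for instance, it doesn’t collapse repeated rows into `*` or print the final offset line).

```python
def hex_tail(data: bytes, rows: int = 5, width: int = 16) -> str:
    """Format the last `rows` rows of `data` roughly like `hexdump -C`."""
    lines = []
    # Row offsets are aligned on `width` from the start of the file, as in hexdump
    for offset in range(0, len(data), width)[-rows:]:
        chunk = data[offset:offset + width]
        hex_part = chunk.hex(" ")
        # Non-printable bytes are shown as dots
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{text}|")
    return "\n".join(lines)


# Hypothetical file content: a fake JPEG body, the FF D9 trailer, then the message
sample = bytes.fromhex("ffd8") + b"\x00" * 30 + bytes.fromhex("ffd9") + b"Hello World!"
print(hex_tail(sample))
```

On a real file, pass the whole content, e.g. `hex_tail(open("cover.jpg", "rb").read())`.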

In fact, any kind of file can be appended to an image without preventing the image from being displayed.
The code for this is as simple as before:

with open("cover.jpg", "ab") as cover, open("secret.txt", "rb") as secret:
	cover.write(secret.read())

But the extraction code is trickier than before, since the hex trailer of the cover file type is needed to find where the cover file ends (ff d9 for JPEG, 49 45 4e 44 ae 42 60 82 for PNG, etc.):

trailer = 'ffd9'  # trailer for JPEG

# Get the trailer offset. index() returns the FIRST match: a JPEG with an
# embedded Exif thumbnail contains an earlier ff d9, so check your cover.
with open("cover.jpg", "rb") as cover_secret:
    content = cover_secret.read()
    offset = content.index(bytes.fromhex(trailer))

# Write the bytes that follow the trailer (offset + trailer length) to the output file
with open("cover.jpg", "rb") as cover_secret, open("secret.txt", "wb") as secret:
    cover_secret.seek(offset + len(trailer)//2)
    secret.write(cover_secret.read())
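Searching for trailer bytes can go wrong, since the trailer sequence may also occur earlier in the file. For PNG, a more robust sketch walks the chunk structure (4-byte length, 4-byte type, data, 4-byte CRC) until the IEND chunk, so it never mistakes bytes inside compressed image data for the trailer. The helper names png_end_offset and extract_appended are hypothetical, and the cover is generated in memory rather than read from disk.

```python
import io

from PIL import Image

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_end_offset(data: bytes) -> int:
    """Return the offset just past the IEND chunk, i.e. where the PNG really ends."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length = int.from_bytes(data[pos:pos + 4], "big")
        chunk_type = data[pos + 4:pos + 8]
        # Skip 4-byte length + 4-byte type + chunk data + 4-byte CRC
        pos += 12 + length
        if chunk_type == b"IEND":
            return pos
    raise ValueError("IEND chunk not found")


def extract_appended(data: bytes) -> bytes:
    """Return whatever was appended after the end of the PNG."""
    return data[png_end_offset(data):]


# Hypothetical cover generated in memory, with a message appended to it
buf = io.BytesIO()
Image.new("RGB", (32, 32), "white").save(buf, format="PNG")
stego = buf.getvalue() + b"Hello World!"

print(extract_appended(stego))
```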

On the command line, you can append a text file (or any other type of file) to an image with the following trick:
zip the secret file, then concatenate the cover image and the archive into a new image:

$ echo 'Hello World!' > secret.txt
$ zip secret.zip secret.txt
$ cat cover.jpg secret.zip > cover-secret.jpg

To get the secret file back, unzip the new image; unzip will complain about extra bytes but will process the archive anyway:

$ unzip cover-secret.jpg
Archive:  cover-secret.jpg
warning [cover-secret.jpg]:  30544 extra bytes at beginning or within zipfile
	(attempting to process anyway)
	inflating: secret.txt
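The same trick can be reproduced from Python. The standard zipfile module reads the archive’s central directory from the end of the file and adjusts its offsets when extra bytes precede the archive, so it can open the concatenated image directly. A minimal sketch, with the cover image and the archive both built in memory instead of read from cover.jpg and secret.zip:

```python
import io
import zipfile

from PIL import Image

# Hypothetical cover image generated in memory
cover = io.BytesIO()
Image.new("RGB", (32, 32), "white").save(cover, format="JPEG")

# Equivalent of: zip secret.zip secret.txt
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("secret.txt", "Hello World!\n")

# Equivalent of: cat cover.jpg secret.zip > cover-secret.jpg
stego = cover.getvalue() + archive.getvalue()

# Equivalent of: unzip cover-secret.jpg
with zipfile.ZipFile(io.BytesIO(stego)) as zf:
    names = zf.namelist()
    recovered = zf.read("secret.txt")

print(names, recovered.decode())
```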

Hiding text in metadata

The next method hides the secret message in the image metadata.
The image metadata (date, time, format, camera tags, photo editing software tags, etc.) is commonly stored in the Exif format and can be read by most photo viewers.

Let’s use the piexif package to make things easier:

from PIL import Image
import piexif

message = "Hello World!"

im = Image.open("cover.jpg")
if "exif" in im.info:
    exif_dict = piexif.load(im.info["exif"])
    exif_dict["0th"][piexif.ImageIFD.ImageDescription] = message
    exif_bytes = piexif.dump(exif_dict)
else:
    exif_bytes = piexif.dump({"0th":{piexif.ImageIFD.ImageDescription:message}})

im.save("cover-secret.jpg", exif=exif_bytes)

An image with a secret message hidden in Exif:

To extract the message:

from PIL import Image
import piexif

im = Image.open("cover-secret.jpg")
exif_dict = piexif.load(im.info["exif"])
print(exif_dict["0th"][piexif.ImageIFD.ImageDescription].decode("utf-8"))
Hello World!

The list of possible tags is quite large (see EXIF Tags), and piexif doesn’t support all of them (see piexif supported tags), but it supports enough for you to be creative and hide data in unusual places such as GPS coordinates or even image thumbnails…
Just keep in mind that Exif metadata is limited in size (64 kB in JPEG).
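If you’d rather not depend on piexif, Pillow can also write and read Exif tags. The following is a minimal sketch, not the piexif method above: it stores the message in the ImageDescription tag (0x010E) through Pillow’s Image.Exif class and checks the serialized size first, since a JPEG APP1 segment has a 2-byte length field that caps its payload at 65533 bytes. The cover is generated in memory, and the constant names are illustrative.

```python
import io

from PIL import Image

IMAGE_DESCRIPTION = 0x010E  # Exif/TIFF ImageDescription tag (270)
MAX_APP1_PAYLOAD = 65533    # 65535 max segment length minus the 2 length bytes

message = "Hello World!"

exif = Image.Exif()
exif[IMAGE_DESCRIPTION] = message
exif_bytes = exif.tobytes()
if len(exif_bytes) > MAX_APP1_PAYLOAD:
    raise ValueError("message too large for a single JPEG Exif segment")

# Hypothetical cover generated in memory instead of cover.jpg
cover = Image.new("RGB", (32, 32), "white")
buf = io.BytesIO()
cover.save(buf, format="JPEG", exif=exif_bytes)

# Read the tag back from the saved JPEG
recovered = Image.open(io.BytesIO(buf.getvalue())).getexif()[IMAGE_DESCRIPTION]
print(recovered)
```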

To hide and extract your message with exiftool:

$ exiftool -IFD0:ImageDescription='Hello World!' cover-secret.jpg
1 image files updated
$ exiftool -IFD0:ImageDescription cover-secret.jpg

  • Be aware that if you upload your image to a platform such as Medium, the Exif metadata is usually removed.

Since image metadata and appended file content are easy to access and easy to scan, these two techniques are not secure at all, and it’s a good idea to encrypt your message first.

LSB substitution

Least significant bit (LSB) substitution/overwriting is a simple and common steganography technique where the secret message is hidden by storing information in the least significant bits of the first pixel rows (or columns) of the image.

This is how it works:
In the RGB model, each pixel is composed of 3 values (red, green, blue), each an 8-bit value (from 0 to 255). A binary-valued message can be hidden in an image by replacing the LSB (the last bit) of each 8-bit value with a message bit.

For example, let’s say we have the 3 following adjacent pixels:

(149, 13, 201)
(150, 15, 202)
(159, 16, 203)

Here are the original pixels:

The 8-bit representation of the pixels is as follows:

10010101   00001101   11001001
10010110   00001111   11001010
10011111   00010000   11001011

To hide 101101101, iterate from left to right, top to bottom, and overwrite the last bit of each binary number with the corresponding message bit:

10010101   00001100   11001001
10010111   00001110   11001011
10011111   00010000   11001011

The new pixel values will be:

(149, 12, 201)
(151, 14, 203)
(159, 16, 203)

Here are the modified pixels:

Each value changes by at most one, so the modified pixels remain indistinguishable from the original pixels to the human eye.
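The worked example above can be replayed in a few lines of plain Python, which also confirms that each value moves by at most one. This is only a check of the arithmetic, using the same three pixels and message bits.

```python
pixels = [(149, 13, 201), (150, 15, 202), (159, 16, 203)]
message_bits = "101101101"

# Flatten the channel values left to right, top to bottom
values = [v for pixel in pixels for v in pixel]

# Clear each LSB with & 0b11111110, then set it to the message bit
new_values = [(v & 0b11111110) | int(bit) for v, bit in zip(values, message_bits)]

# Group the values back into (R, G, B) pixels
new_pixels = [tuple(new_values[i:i + 3]) for i in range(0, len(new_values), 3)]
print(new_pixels)  # [(149, 12, 201), (151, 14, 203), (159, 16, 203)]

max_change = max(abs(a - b) for a, b in zip(values, new_values))
print(max_change)  # 1
```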

Following this method, the message is encoded by converting each character into an integer (from 0 to 127 according to the ASCII table), then into binary (an 8-bit value), and concatenating the results:

This transformation can be performed with the following Python code:

b_message = ''.join(["{:08b}".format(ord(x)) for x in message ])
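To see what this produces, here is a small round trip on a two-letter message. The to_bits and from_bits helpers are hypothetical names; from_bits reverses the transformation 8 bits at a time, and to_bits rejects characters above 255, which would not fit in 8 bits.

```python
def to_bits(message: str) -> str:
    """Concatenate the 8-bit binary form of each character's code point."""
    if any(ord(c) > 255 for c in message):
        raise ValueError("each character must fit in 8 bits (code point < 256)")
    return "".join("{:08b}".format(ord(c)) for c in message)


def from_bits(bits: str) -> str:
    """Read the bit string back 8 bits at a time."""
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


bits = to_bits("Hi")
print(bits)             # 0100100001101001
print(from_bits(bits))  # Hi
```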

The LSB substitution algorithm is as follows:

  • Encode the message in a series of 8-bit values
  • Extract and flatten the pixel array of the cover/host image
  • Replace the last bit of the first pixels by the message bits
  • Reshape the pixels back into an image pixel array
  • Save the new image with the secret

and here is the code:

import numpy as np
from PIL import Image

message = "Hello World!"

# Encode the message in a series of 8-bit values
b_message = ''.join(["{:08b}".format(ord(x)) for x in message])
b_message = np.array([int(x) for x in b_message], dtype=np.uint8)

b_message_length = len(b_message)

# Get the image pixel arrays
with Image.open("cover.png") as img:
    width, height = img.size
    data = np.array(img)

# Flatten the pixel arrays (RGB: 3 values per pixel)
data = np.reshape(data, width*height*3)

# Overwrite pixel LSB: 0xFE (11111110) clears the last bit and,
# unlike ~1, stays within the uint8 range
data[:b_message_length] = (data[:b_message_length] & 0xFE) | b_message

# Reshape back to an image pixel array
data = np.reshape(data, (height, width, 3))

new_img = Image.fromarray(data)
new_img.save("cover-secret.png")
new_img.show()

The original image:

vs the image with the secret message hidden in pixel LSBs:

  • Strictly speaking, the ord function returns the Unicode code point of the character. From 0 to 127, the ASCII and Unicode tables are the same. In any case, make sure the characters of your message have a value lower than 256, since each one is stored in 8 bits.
  • This doesn’t work with lossy compression formats, namely JPEG, since the LSBs can be altered during compression. Use a lossless format such as PNG or BMP. GIF is lossless too, but it is limited to a 256-color palette, so a photo converted to GIF would generally lose the hidden LSBs.
  • Since the numpy array shape is (H, W, D) and flattening follows row order, this algorithm hides the message in the first pixel rows. If you prefer to hide it in pixel columns, use transpose(1, 0, 2) after reading the data, reshape back to (width, height, 3), and transpose again before creating the image with the secret.
  • This code works only in RGB mode. You’ll have to adapt it to handle RGBA mode, ignoring the alpha channel (transparency).
  • Don’t use lena.png to hide your message: it is the cover image usually used in steganography articles. Use your own photograph as the cover, and don’t publish the original cover image on the Internet.
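For the RGBA case, one possible adaptation is to apply the substitution only to the first three channels and leave alpha untouched, with a capacity check so a long message fails loudly instead of being silently truncated. This is a sketch under those assumptions; hide_lsb and reveal_lsb are hypothetical names, and the test image is generated in memory.

```python
import numpy as np
from PIL import Image


def hide_lsb(img: Image.Image, message: str) -> Image.Image:
    """Hide `message` in the LSBs of the R, G, B channels; alpha is left as is."""
    if any(ord(c) > 255 for c in message):
        raise ValueError("each character must fit in 8 bits")
    bits = np.array([int(b) for c in message for b in "{:08b}".format(ord(c))], dtype=np.uint8)

    # Keep RGBA images as RGBA, convert everything else to RGB
    data = np.array(img.convert("RGBA" if img.mode == "RGBA" else "RGB"))
    height, width = data.shape[:2]

    # Flatten the colour channels only (a copy for RGBA, a view for RGB)
    rgb = data[..., :3].reshape(-1)
    if bits.size > rgb.size:
        raise ValueError(f"message needs {bits.size} bits, image holds {rgb.size}")

    rgb[:bits.size] = (rgb[:bits.size] & 0xFE) | bits
    # Write the colour channels back; the alpha channel is not touched
    data[..., :3] = rgb.reshape(height, width, 3)
    return Image.fromarray(data)


def reveal_lsb(img: Image.Image, n_chars: int) -> str:
    """Read `n_chars` characters back from the R, G, B LSBs."""
    data = np.array(img)
    bits = data[..., :3].reshape(-1)[:n_chars * 8] & 1
    return "".join(chr(int(v)) for v in np.packbits(bits))


# Hypothetical half-transparent cover generated in memory
cover = Image.new("RGBA", (16, 16), (10, 20, 30, 128))
stego = hide_lsb(cover, "Hello World!")
print(stego.mode, reveal_lsb(stego, len("Hello World!")))
```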

To extract the message:

  • Extract and flatten the pixel array of the image with secret
  • Extract the last bits of the pixels
  • Pack the elements of the binary-valued array into an integer array
  • Convert each integer into a character until you hit a non-printable character

Here is the code:

from PIL import Image
import numpy as np

with Image.open("cover-secret.png") as img:
    width, height = img.size
    data = np.array(img)
    
data = np.reshape(data, width*height*3)

# extract lsb
data = data & 1 

# Packs binary-valued array into 8-bits array.
data = np.packbits(data)

# Read and convert integer to Unicode characters until hitting a non-printable character
for x in data:
    l = chr(x)
    if not l.isprintable():
        break
    print(l, end='')
Hello World!U%UµUj
  • In Python 2, the chr function returns extended ASCII characters (0 to 255); since Python 3, it returns Unicode characters (from 0 to 0x10FFFF). As with ord, stick to characters with a value lower than 256.
  • About three quarters of the values from 0 to 255 map to printable characters, so reading until you hit a non-printable character will almost certainly return extra characters. It is better to use a message delimiter or to encode the length of the message along with the message.
  • As for Python libraries, the Stegano library will do the job for you (see Using Stéganô as a Python module - LSB method). Note that internally, Stegano encodes the length of the message: it hides and reveals str(len(message)) + ":" + str(message).
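As an illustration of the length-prefix idea, here is a sketch that stores a 4-byte big-endian length before the message bytes. This framing is an assumption for the example (it is not Stegano’s format, which uses a textual length and a colon), and it encodes the message as UTF-8 bytes, so characters above 255 also work. A random array stands in for the flattened pixel array.

```python
import numpy as np


def frame_message(message: str) -> np.ndarray:
    """Bits of a 4-byte big-endian length header followed by the UTF-8 message."""
    payload = message.encode("utf-8")
    framed = len(payload).to_bytes(4, "big") + payload
    return np.unpackbits(np.frombuffer(framed, dtype=np.uint8))


def unframe_message(lsb_bits: np.ndarray) -> str:
    """Read the length header, then exactly that many message bytes."""
    length = int.from_bytes(np.packbits(lsb_bits[:32]).tobytes(), "big")
    payload = np.packbits(lsb_bits[32:32 + 8 * length]).tobytes()
    return payload.decode("utf-8")


# Hypothetical stand-in for a flattened pixel array
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=1000, dtype=np.uint8)

bits = frame_message("Hello World!")
stego = cover.copy()
stego[:bits.size] = (stego[:bits.size] & 0xFE) | bits

print(unframe_message(stego & 1))  # Hello World!
```

Reading stops exactly at the end of the message, whatever the remaining LSBs contain.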

Another technique works as follows:
Each letter is encoded in 8 bits and hidden in 3 pixels (hence 9 values): for the first 8 values, the value is made odd for a 1 bit and even for a 0 bit. The 9th value tells whether the message continues: if it is odd, this notifies the end of the message.
Since changing the parity of a value means changing its LSB, this is actually an LSB technique as well (see the description of the algorithm and the code at Geeks for Geeks - Image based Steganography using Python).
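Here is a sketch of that parity scheme on a flat array of channel values. It forces the parity by setting the LSB directly; it is not the Geeks for Geeks code, and hide_parity/reveal_parity are hypothetical names. A random array stands in for the flattened RGB values of a cover image.

```python
import numpy as np


def hide_parity(values: np.ndarray, message: str) -> np.ndarray:
    """Hide one character per 9 channel values (3 RGB pixels)."""
    if len(message) * 9 > values.size:
        raise ValueError("not enough pixels for this message")
    out = values.copy()
    for i, ch in enumerate(message):
        if ord(ch) > 255:
            raise ValueError("each character must fit in 8 bits")
        bits = [int(b) for b in "{:08b}".format(ord(ch))]
        # 9th value: even (0) means another character follows, odd (1) means stop
        bits.append(1 if i == len(message) - 1 else 0)
        block = out[9 * i:9 * i + 9]  # a view into `out`
        block[:] = (block & 0xFE) | np.array(bits, dtype=np.uint8)
    return out


def reveal_parity(values: np.ndarray) -> str:
    """Read characters 9 values at a time until a 9th value is odd."""
    chars = []
    for start in range(0, values.size - 8, 9):
        parities = values[start:start + 9] & 1
        chars.append(chr(int(np.packbits(parities[:8])[0])))
        if parities[8] == 1:
            break
    return "".join(chars)


# Hypothetical flattened RGB values of a cover image
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=300, dtype=np.uint8)
print(reveal_parity(hide_parity(cover, "Hello World!")))  # Hello World!
```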

Only the most naive steganography software would overwrite every least significant bit with hidden data. Almost all software uses some means of randomizing which bits are actually used, or includes an encryption option, which makes steganography detection difficult. See An Overview of Steganography for the Computer Forensics Examiner for more information on LSB techniques, and the references at the end of this article for a list of steganography software.
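To illustrate the randomization idea, here is a minimal sketch that scatters the message bits over pseudo-random positions derived from a shared integer key, instead of the first pixels. The key-as-seed scheme and the function names are assumptions for this example; NumPy’s generator is not a cryptographic primitive, so this is obfuscation rather than encryption.

```python
import numpy as np


def _positions(size: int, n_bits: int, key: int) -> np.ndarray:
    # The same key always yields the same permutation, so both sides agree
    return np.random.default_rng(key).permutation(size)[:n_bits]


def hide_random(flat: np.ndarray, bits: np.ndarray, key: int) -> np.ndarray:
    if bits.size > flat.size:
        raise ValueError("message too long for this cover")
    out = flat.copy()
    pos = _positions(flat.size, bits.size, key)
    out[pos] = (out[pos] & 0xFE) | bits
    return out


def reveal_random(flat: np.ndarray, n_bits: int, key: int) -> np.ndarray:
    return flat[_positions(flat.size, n_bits, key)] & 1


# Hypothetical flattened cover and message
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=10_000, dtype=np.uint8)
bits = np.unpackbits(np.frombuffer(b"Hello World!", dtype=np.uint8))

stego = hide_random(cover, bits, key=1234)
print(np.packbits(reveal_random(stego, bits.size, key=1234)).tobytes())  # b'Hello World!'
```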

Going deeper with LSB substitution

Since we can easily hide binary data with the LSB technique, the obvious next step is hiding a black-and-white image containing text in a cover image.

By representing an image as a series of 0s and 1s (black and white pixels, respectively), we can use the previous LSB substitution technique to hide the secret image in another image.

For example, the QR code encoding “Hello World!”:

will be hidden in

Using a gray-scale image as the cover/host image makes it possible to work with a single channel and to have a one-to-one mapping between the cover and secret image pixels (the secret and cover images must have the same size), which simplifies encoding and decoding with the LSB algorithm.

Using the 1 (1-bit pixels, black and white, stored with one pixel per byte) and L (8-bit pixels, grayscale) Pillow modes, the code to hide the QR code in the cover image is as simple as the previous code to hide text:

from PIL import Image
import numpy as np

# Convert cover image to gray-scale
cover = Image.open("cover.png").convert('L')

data_c = np.array(cover)

# Convert image to 1-bit pixel, black and white and resize to cover image
secret = Image.open("qr-secret.png").convert('1')
secret = secret.resize(cover.size)

data_s = np.array(secret, dtype=np.uint8)

# Rewrite LSB (0xFE clears the last bit and keeps the result in uint8)
res = (data_c & 0xFE) | data_s

new_img = Image.fromarray(res).convert("L")
new_img.save("cover-secret.png")
new_img.show()

An image with a secret image (QR) hidden in pixel LSBs:

To extract the secret image:

from PIL import Image
import numpy as np

secret = Image.open("cover-secret.png")
data_s = np.array(secret)
data_s = data_s & 1

# Scale the 0/1 bits to black (0) and white (255), keeping the uint8 dtype
new_img = Image.fromarray((data_s * 255).astype(np.uint8))
new_img.show()

Now, the hidden image is quite easy to detect if you apply some basic steganalysis techniques to the image, such as bit-plane extraction (see Bit planes). A bit plane is the set of bits corresponding to a given bit position in each of the binary numbers representing the image.
A gray-scale image has 8 bit planes (eight bits per pixel): the first plane for the most significant bit, the 8th for the least significant bit. The higher the number of the bit plane, the smaller its contribution to the final image.
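For example, the value 201 from the earlier pixels splits into its 8 bit planes as follows. Plane n (1 for the MSB, 8 for the LSB) contributes bit * 2**(8 - n) to the value, which is why the last planes matter least visually.

```python
value = 201  # 0b11001001

# Bit planes from the MSB (plane 1) down to the LSB (plane 8)
planes = [(value >> k) & 1 for k in range(7, -1, -1)]
print(planes)  # [1, 1, 0, 0, 1, 0, 0, 1]

# Plane n contributes bit * 2**(8 - n) to the pixel value
contributions = [bit * 2 ** (8 - n) for n, bit in enumerate(planes, start=1)]
print(contributions)  # [128, 64, 0, 0, 8, 0, 0, 1]
print(sum(contributions))  # 201
```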

Here is the function to extract bitplanes from a gray-scale image:

from PIL import Image
import numpy as np

def bitplanes(im):
    im = Image.open(im).convert("L")
    data = np.array(im)
    out = []
    # create an image for each k bit plane
    for k in range(7, -1, -1):
        # extract kth bit (from 0 to 7)
        res = data // 2**k & 1
        out.append(res*255)
    # stack generated images
    b = np.hstack(out)
    return Image.fromarray(b)

bitplanes("cover-secret.png").show()

The bitplanes function reveals the QR code hidden in the last plane.

Let’s make the hidden image less obvious by adding a simple layer of randomness. A simple technique is to xor the secret image bits with some of the bits of the cover image; with a gray-scale cover, we’ll use its 2nd LSB.

Let’s xor the message bits with the cover image’s 2nd LSB:

from PIL import Image
import numpy as np

# Convert cover image to gray-scale, as in the previous example
cover = Image.open("cover.png").convert("L")
data_c = np.array(cover)

# Convert image to full black and white and resize to cover image
secret = Image.open("qr-secret.png").convert('1')
secret = secret.resize(cover.size)
data_s = np.array(secret, dtype=np.uint8)

# Extract 2nd LSB (7th bit from left) from cover image
# The binary operation performed is : pixel >> 8-k & 1 (with k the kth bit to extract starting from 1)
# same as pixel // 2**(8-k) & 1
bit = np.bitwise_and(np.right_shift(data_c, 1), 1)

# Rewrite cover LSB with the secret bit xored with the cover's 2nd LSB
# (0xFE clears the last bit and keeps the result in uint8)
res = (data_c & 0xFE) | (bit ^ data_s)

new_img = Image.fromarray(res).convert("L")
new_img.save("cover-secret.png")
new_img.show()

Let’s check again the bitplanes:

bitplanes("cover-secret.png").show()

To extract the hidden image, invert the xor operation (xor is its own inverse) and extract the last bit:

from PIL import Image
import numpy as np

secret = Image.open("cover-secret.png")
data_s = np.array(secret)

bit = np.bitwise_and(np.right_shift(data_s, 1), 1)
data_s = bit ^ data_s & 1

new_img = Image.fromarray((data_s * 255).astype(np.uint8))
new_img.show()
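The xor hide and reveal steps can be checked end to end without any image file. In this sketch, random arrays stand in for the gray-scale cover and the binary QR code (both are assumptions for the example); the check relies on the fact that only the LSB is rewritten, so the 2nd LSB used as the xor key is still intact when extracting.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-ins for the gray-scale cover and the 0/1 secret image
data_c = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
data_s = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)

# Hide: LSB <- secret bit xor the cover's 2nd LSB
bit = (data_c >> 1) & 1
res = (data_c & 0xFE) | (bit ^ data_s)

# Reveal: the 2nd LSB is unchanged, so xor it with the LSB again
recovered = ((res >> 1) & 1) ^ (res & 1)

print(np.array_equal(recovered, data_s))  # True
```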

The next article will be dedicated to steganalysis with python.

References:

Steganography
An Overview of Steganography for the Computer Forensics Examiner
Cheatsheet - Steganography 101
EXIF Tags
piexif supported tags
GCK’S FILE SIGNATURES TABLE

Tools:

Stegano, a Python library implementing various LSB techniques
Steganography Toolkit, a (big) docker image with various steganography tools
Jeffrey’s Image Metadata Viewer, a nice online tool to display image metadata
Exiftool, a command line tool to view and manipulate image metadata.
Steganography tools, a wiki page on Steganography tools