Introduction to Medical Image Processing and Analysis — Fundamentals of Digital Images
How are images formed? What are the core principles that govern image formation? This post is the first of several. Through them, I hope to achieve two things: share my notes (I will come back to them during exams) and my thought process about image processing and analysis (how I understand the concepts and how anyone with zero knowledge can get acquainted with the field).
This blog and a few subsequent ones will be an introduction to imaging and image processing and will cover:
- Medical Imaging Modalities
- Visual perception (of the human eye)
- Digital Images (Formation, Acquisition, Representation)
- Processing
- Storage
- Compression and file formats
Medical Imaging Modalities
Imaging modality is a term that characterizes the method or means through which images are created. The characterization includes the energy source (sound, electromagnetic radiation, or electrons) and the kind of sensor used to generate the image. Let's consider some modalities in the medical imaging field.
- MRI — Magnetic Resonance Imaging. With MRI, there is a giant magnet (with a strong magnetic field between 0.2 and 7.0 Tesla**), and the body is positioned in the field. Body tissues are full of protons, and the magnetic field forces these protons to align. In the aligned state, radio waves are sent through the body from designated coils in the MRI system, and these waves knock the protons out of alignment. When the supply of waves ceases, the protons re-align; the changes in the alignment of the protons give off waves that are collected by a receiver. These waves carry information about the location of the protons and the nature of the tissues within that region.
**Tesla is the unit of measurement for the intensity of a magnetic field; in plain terms, it quantifies the strength of the magnetic pull.
- In nuclear medicine (e.g. PET — Positron Emission Tomography), the patient is usually injected with a radioactive tracer. The tracer emits radiation (in PET, positrons that annihilate with nearby electrons to produce gamma rays), and gamma cameras positioned around the patient detect this radiation to form the image.
- X-rays and CT scans — X-ray beams are directed at the body, and depending on the density of the tissues, the photons are either absorbed, attenuated (scattered, with their energies reduced), or passed through to reach the film (receiver). The variations in how the photons are 'treated' by the tissues create the image.
CT — computed tomography — works on the same phenomenon as X-rays, except that images are taken from many angles and processed to produce a 3D image instead of the 2D image of a plain X-ray.
- Ultrasound images — a transducer (a device that transforms energy from one form to another) 'shoots' sound waves of a certain frequency (above 20 kHz; typically 1 to 20 MHz in medical imaging) toward a target, and the echoes of the waves are collected by a receiver. The changes in the sound and its pitch are recorded and processed to form an image.
There are more modalities like UV imaging, radar, and electron imaging (using electron microscopy).
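Returning to CT for a moment, the idea of "taking images from many angles" can be made concrete with a toy sketch (this is only an illustration of projections, not a real reconstruction algorithm; the tiny 3x3 "body" and its density values are made up):

```python
# Toy 2D "body": each number is a tissue density.
body = [
    [0, 1, 0],
    [2, 3, 2],
    [0, 1, 0],
]

# Projection at 0 degrees: each X-ray beam travels along a row,
# and the detector records the total density it passed through.
proj_0 = [sum(row) for row in body]         # [1, 7, 1]

# Projection at 90 degrees: each beam travels down a column.
proj_90 = [sum(col) for col in zip(*body)]  # [2, 5, 2]

print(proj_0, proj_90)
```

A CT scanner collects many such projections around the body and processes them together to recover the 2D (and, slice by slice, 3D) density map.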
Do you ever wonder how your selfies are formed beyond clicking the capture button? Why an X-ray is black and white? Or, more relatably, how you can see and tell objects apart? How are images formed in the eye? The formation and processing of images on film are similar to human visual perception, so it's worth knowing about the mechanisms of the human eye.
Human Visual Perception
The human eye is an organ with a rich distribution of blood vessels and tissues for vision. To maintain the initial trajectory of this blog, we will discuss a few crucial parts of the human eye.
The eye comprises a lens, fluid mediums (the aqueous and vitreous humor, which maintain pressure and provide nutrition), a retina, and blood vessels. The organ is intricately designed to regulate light, process images, and transmit them to the brain for interpretation.
The retina is made up of photoreceptors called rods (not sensitive to color) and cones (sensitive to color). When these photoreceptors are stimulated, the process of seeing is initiated.
Basic image formation process on the eye:
Light rays from an object converge on the lens, and the lens refracts (bends) them onto the retina. On the retina the image is inverted; the photoreceptors transform the light energy into electrical impulses, and the brain decodes the impulses for the human to become aware of what the object is. Awareness of the specifics of the object is called perception.
In a camera, the lens has a fixed focal length so focusing at different distances would require adjusting the distance between the lens and the imaging plane ( the film or chip). In the human eye, the distance between the lens and the imaging film (retina) is fixed so focusing on objects at varying distances is achieved by varying the shape of the lens. Fibers in the eye's ciliary body either flatten or thicken the lens to focus on near or far objects.
The Nature of White Light
In his classic white light experiment, Isaac Newton observed the nature of white light through a glass prism. When white light is incident on a glass prism at an angle, a continuous band of colors is formed on the other side, and each band blends smoothly into the next.
The color of an object is the nature of the white light reflected by the object. A white object would reflect a balance of all the wavelengths of light.
Light that is void of color is called achromatic (monochromatic). The only attribute of achromatic light is its intensity, perceived as ranging from black through gray and ultimately to white. This range of values of monochromatic light is called the gray scale.
Chromatic light spans the visible band of the electromagnetic spectrum, roughly 400 to 700 nm (all the colors present within the continuous band). Its attributes include radiance, luminance, and brightness.
Brightness Adaptation and Accommodation of the Eye
The amount of light an object is exposed to (its illumination) affects how the eye views it. The eye's ability to differentiate between intensity levels is crucial when presenting image processing results.
Digital images are displayed as a set of discrete intensity levels of light. The human eye can adapt to intensities over a range on the order of 10¹⁰, from low-light conditions to very bright ones. However, the adaptation is not simultaneous: the eye cannot operate over this entire range at once, so it adapts its current sensitivity to a small portion of the range at a time.
In technical terms, the subjective brightness perceived by the eye is a logarithmic function of the light intensity incident on the eye.
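A rough sketch of this logarithmic relationship (assuming the classic Weber–Fechner form; the threshold and scaling constant here are illustrative assumptions, not measured values): equal multiplicative steps in physical intensity produce equal additive steps in perceived brightness.

```python
import math

def perceived_brightness(intensity, threshold=1.0, k=1.0):
    """Weber-Fechner sketch: perceived brightness grows with the
    logarithm of physical intensity relative to a threshold."""
    return k * math.log10(intensity / threshold)

# Multiplying intensity by 10 adds a constant step in perception:
print(perceived_brightness(10))    # 1.0
print(perceived_brightness(100))   # 2.0
print(perceived_brightness(1000))  # 3.0
```

This is why a light source 10 times stronger does not look 10 times brighter to us.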
Now let’s consider how images are sensed and acquired in cameras.
Image Sensing and Acquisition
Image formation involves three main parties: an energy source (the sun, X-rays, etc.), a body or object under scrutiny that either transmits or reflects the energy, and a sensor (film, a chip, or the retina of the eye).
Images are acquired by illuminating a body, collecting feedback on how the body reacted to the energy (absorption or reflection), and digitizing the feedback.
The ‘collection’ is done by a sensor, which works based on these principles:
- The incoming light energy is transformed into electrical energy (a voltage). The voltage depends on both the input energy's intensity (how strong the light rays are) and the characteristic response of the sensor material to that energy, i.e., the measurable changes in the sensor material when the energy strikes it.
- The output voltage is the response of the sensor. This analog response is then digitized to produce the quantity representing the image.
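The two steps above can be sketched in a few lines (a minimal toy model: the linear sensor response, its gain, the 5 V ceiling, and the 8-bit depth are all illustrative assumptions, not values from any real sensor):

```python
def sense(intensity, gain=0.5, v_max=5.0):
    """Hypothetical sensor: the output voltage is the input intensity
    scaled by the material's characteristic response (gain), clipped
    to the sensor's maximum output voltage."""
    return min(intensity * gain, v_max)

def digitize(voltage, v_max=5.0, bits=8):
    """Uniformly quantize an analog voltage into 2**bits levels."""
    levels = 2 ** bits
    return min(int(voltage / v_max * levels), levels - 1)

voltage = sense(6.0)       # 3.0 volts of analog response
pixel = digitize(voltage)  # a digital gray level in 0..255
print(pixel)               # 153
```

Every pixel in a digital image is, in essence, the result of this sense-then-digitize chain.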
Three sensor arrangements affect image formation: single, linear, and array sensing.
Single Sensing Elements
A common example of a single sensing element is the photodiode; such sensors are used in high-precision scanning tasks.
To generate a 2D image using a single sensing element, there has to be relative displacement (movement) between the sensor and the object in both the x and y directions. A classic arrangement, the film drum scanner, works this way:
- A film negative** is mounted on a drum. The mechanical rotation of the drum provides displacement in one direction. The drum also has an internal light source. **Film negatives show the brightest parts of the captured scene as dark and the darkest parts as bright; this happens because the film's light-sensitive chemicals darken upon exposure to light.
- The sensor is mounted so that it moves perpendicular (at a 90-degree angle) to the drum's rotation, providing displacement in the second direction.
- As light passes through the film, its intensity is modulated by the density of the film before being captured by the sensor.
- The modulation of the light intensity results in variations in the intensity of the image: the patterns found in the image correspond to differences in the intensity of the light that finally reached the sensor.
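The scanning procedure above can be simulated with two nested loops (a toy sketch: the made-up `film_density` pattern stands in for the film, the outer loop for the sensor's translation, and the inner loop for one drum rotation):

```python
def film_density(x, y):
    """Toy stand-in for the film: density varies across the drum."""
    return (x + y) % 4  # arbitrary repeating pattern

def scan(width, height):
    """Build a 2D image one sample at a time, as a single sensing
    element must."""
    image = []
    for y in range(height):      # sensor steps along the drum axis
        row = []
        for x in range(width):   # drum rotates past the sensor
            row.append(film_density(x, y))
        image.append(row)
    return image

img = scan(4, 3)
print(img)  # [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1]]
```

Because every pixel requires its own mechanical step, the loop structure itself hints at the main drawback discussed next: speed.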
The disadvantage of single-sensing elements is that they are slow and not easily portable.
This post is meant to be an introduction to image processing, particularly the fundamentals of image formation. We discussed some imaging modalities, the human eye and visual perception, image sensing, and how single-sensing-element devices work. Be on the lookout for the next post, which will cover the other sensor types, image representation, and storage.
Resources:
- Rafael C. Gonzalez and Richard E. Woods — Digital Image Processing (4th Edition)
- Image Processing and Analysis Course — University of Cassino