
Using Advanced 3D Texturing Hardware to Convert Planar YUV to RGB

Published in Dreamcast · 14 Nov 2020

by Jason Dorie (jdorie at ea.com) and Mike Melanson (mike at multimedia.cx)
version 1.0: August 20, 2003

Abstract

It is possible to use the texturing and blending capabilities of modern 3D graphics hardware to render planar YUV video data, even when the hardware does not natively support planar YUV formats.

Contents

  • Introduction
  • Conversion Technique
  • Conversion Examples
  • Outranging
  • Sega Dreamcast Implementation
  • References
  • Changelog

Introduction

Many modern video codecs encode data from, and decode data to, a planar YUV colorspace. Many modern video chipsets can directly display planar YUV data. Other chipsets cannot, or the details needed to perform this type of output are unknown. Traditionally, it has been necessary to convert such video data to an RGB colorspace (or to a packed YUV colorspace, if the hardware supports it) before displaying it.

However, many modern video chipsets are very capable at rendering textures onto 3D polygons. This capability can be leveraged to render planar YUV video data without first converting it to a different format in software, a conversion that would otherwise add to the cost of playback.

This document assumes a familiarity with RGB and YUV colorspaces. For more information, see the references at the end of this document.

Conversion Technique

This discussion uses these basic YUV -> RGB conversion formulas:

    R = Y + C1 * (V - 128)
    G = Y - C2 * (U - 128) - C3 * (V - 128)
    B = Y + C4 * (U - 128)

where C1..C4 are constants whose exact values are open to some interpretation depending on which source you read. In this discussion, the constants have these values (though the source code example may use different values):

    C1 = 1.403
    C2 = 0.344
    C3 = 0.714
    C4 = 1.770

The general concept is to load the Y, U, and V planes into texture memory and blend them as paletted textures. By using each Y, U, or V sample as a palette index, computations such as C1 * (V - 128) are implicitly carried out by a table lookup. The simple addition and subtraction are performed by blending the textures. Further, the hardware can scale all 3 planes to fill the entire screen before the blending takes place, optionally with bilinear filtering; this also stretches the smaller U and V planes of subsampled (for example, 4:2:0) video to the size of the Y plane.

A more precise flow of actions is:

  • decode the Y, U, and V planes and transfer them into texture memory
  • render the Y plane using palette 0 (base Y values)
  • blend the U plane using palette 1 (additive U values)
  • blend the U plane using palette 2 (subtractive U values)
  • blend the V plane using palette 3 (additive V values)
  • blend the V plane using palette 4 (subtractive V values)

Separate additive and subtractive passes are necessary for the U and V planes because blending hardware typically performs unsigned arithmetic and saturates (clamps) values at 0 and 255. A palette entry cannot hold a negative contribution, so each signed term is split into a positive part, which is added, and a negative part, whose magnitude is subtracted.

Palette 0 contains RGB triplets in which all 3 components are equal to the Y value. For example, a Y value of 249 corresponds to a palette 0 entry of (249, 249, 249).

Palette 1 contains RGB triplets that modify the base Y values by the U terms of the formulas wherever those terms are positive. For the B calculation, C4 * (U - 128) is positive when U > 128. For the G calculation, -C2 * (U - 128) is positive when U < 128. For example, when U = 28, the U term of the B calculation is negative, so it does not affect this pass. The U term of the G calculation is:

    -0.344 * (28 - 128) = -0.344 * -100 = 34.4, which rounds to 34

The R calculation is unaffected by this pass. Palette 1, entry 28 therefore contains the triplet (0, 34, 0).

Palette 2 contains the subtractive pass for the U plane values. It is similar to palette 1, except that the B term is nonzero for U < 128 and the G term is nonzero for U > 128; each entry holds the magnitude of the negative term, which the blending pass subtracts.

Palette 3 contains the additive pass for the V plane values. It is computed like palette 1, except that the V terms modify R (and G) instead of B: R gains C1 * (V - 128) for V > 128, and G gains -C3 * (V - 128) for V < 128.

Palette 4 contains the subtractive pass for the V plane values. It is computed like palette 2, except that the V terms modify R (and G) instead of B.

Conversion Examples

As an example, consider the YUV triplet (255, 128, 128). This is bright white, whose RGB components are all at or close to their maximum value (255). According to the formulas presented above:

    R = 255 + 1.403 * (128 - 128) = 255
    G = 255 - 0.344 * (128 - 128) - 0.714 * (128 - 128) = 255
    B = 255 + 1.770 * (128 - 128) = 255

According to the palette definitions presented above:

    Y = 255, palette 0 entry 255 = (255, 255, 255)
    U = 128, palette 1 entry 128 = (  0,   0,   0) +
    U = 128, palette 2 entry 128 = (  0,   0,   0) -
    V = 128, palette 3 entry 128 = (  0,   0,   0) +
    V = 128, palette 4 entry 128 = (  0,   0,   0) -
                                   ---------------
               final RGB triplet = (255, 255, 255)

As a slightly more interesting example, consider the YUV triplet (129, 91, 24), a shade of green with some blue mixed in:

    R = 129 + 1.403 * (24 - 128) = 0 (saturated)
    G = 129 - 0.344 * (91 - 128) - 0.714 * (24 - 128) = 216
    B = 129 + 1.770 * (91 - 128) = 64

According to the palette definitions:

    Y = 129, palette 0 entry 129 = (129, 129, 129)
    U =  91, palette 1 entry  91 = (  0,  13,   0) +
    U =  91, palette 2 entry  91 = (  0,   0,  65) -
    V =  24, palette 3 entry  24 = (  0,  74,   0) +
    V =  24, palette 4 entry  24 = (146,   0,   0) -
                                   ---------------
               final RGB triplet = (  0, 216,  64)
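The passes can be checked on a desktop machine by modelling them in software. The sketch below is an illustration rather than the Dreamcast code: it builds the five palettes from the constants C1..C4, rounding each product to the nearest integer, and emulates saturating blend hardware by clamping every channel to 0..255 after each pass, in the order given in the list of actions. It assumes the green equation G = Y - 0.344 * (U - 128) - 0.714 * (V - 128), the usual BT.601/JFIF assignment that the listing's palettes also follow (0.338 for U, 0.698 for V). `build_palettes`, `blend` and `yuv_to_rgb` are hypothetical names.

```c
/* Constants from the text; assumes C2 scales U and C3 scales V in G. */
#define C1 1.403
#define C2 0.344
#define C3 0.714
#define C4 1.770

typedef struct { int r, g, b; } rgb_t;

/* Palettes: 0 = base Y, 1 = +U, 2 = -U, 3 = +V, 4 = -V.
 * Entries of palettes 2 and 4 hold magnitudes that are subtracted. */
static rgb_t pal[5][256];

/* Round a non-negative value to the nearest integer. */
static int rnd(double x) { return (int)(x + 0.5); }

/* Clamp to 0..255, as saturating blend hardware would. */
static int sat(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }

static void build_palettes(void) {
    for (int i = 0; i < 256; i++) {
        int d = i - 128;  /* signed chroma offset */
        pal[0][i] = (rgb_t){ i, i, i };
        /* U terms: G gets -C2*d, B gets +C4*d */
        pal[1][i] = (rgb_t){ 0, d < 0 ? rnd(-C2 * d) : 0, d > 0 ? rnd(C4 * d) : 0 };
        pal[2][i] = (rgb_t){ 0, d > 0 ? rnd(C2 * d) : 0, d < 0 ? rnd(-C4 * d) : 0 };
        /* V terms: R gets +C1*d, G gets -C3*d */
        pal[3][i] = (rgb_t){ d > 0 ? rnd(C1 * d) : 0, d < 0 ? rnd(-C3 * d) : 0, 0 };
        pal[4][i] = (rgb_t){ d < 0 ? rnd(-C1 * d) : 0, d > 0 ? rnd(C3 * d) : 0, 0 };
    }
}

/* One blending pass: add (sign = +1) or subtract (sign = -1) a palette
 * entry from the framebuffer pixel, saturating each channel. */
static rgb_t blend(rgb_t dst, rgb_t src, int sign) {
    rgb_t out = { sat(dst.r + sign * src.r), sat(dst.g + sign * src.g),
                  sat(dst.b + sign * src.b) };
    return out;
}

/* Run the five passes in order: Y, +U, -U, +V, -V. */
static rgb_t yuv_to_rgb(int y, int u, int v) {
    rgb_t c = pal[0][y];          /* render Y */
    c = blend(c, pal[1][u], +1);  /* additive U */
    c = blend(c, pal[2][u], -1);  /* subtractive U */
    c = blend(c, pal[3][v], +1);  /* additive V */
    c = blend(c, pal[4][v], -1);  /* subtractive V */
    return c;
}
```

After `build_palettes()`, `yuv_to_rgb(255, 128, 128)` returns (255, 255, 255) and `yuv_to_rgb(129, 91, 24)` returns (0, 216, 64).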

Outranging

Some inaccuracy is theoretically possible when a component runs close to 0 or 255, because the hardware clamps the result of each addition or subtraction. For example, an additive U pass of 30 added to a base Y value of 255 still yields 255 with hardware saturation. A subsequent subtractive pass with a value of 50 then brings the final sample down to 205, rather than the 235 it would be without saturation. In practice, this does not seem to produce obvious artifacts.
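The order dependence described above can be reproduced with plain integer arithmetic. In this sketch, `sat_u8`, `blend_clamped` and `blend_exact` are illustrative helpers: the second clamps after every pass, as the blending hardware does, while the third clamps only the exact sum.

```c
/* Clamp an intermediate result to the 0..255 range of an 8-bit channel. */
static int sat_u8(int x) { return x < 0 ? 0 : (x > 255 ? 255 : x); }

/* Base value, then an additive pass, then a subtractive pass,
 * clamping after each pass as the blending hardware does. */
static int blend_clamped(int base, int add, int sub) {
    int c = sat_u8(base + add);  /* additive pass saturates at 255 */
    return sat_u8(c - sub);      /* subtractive pass saturates at 0 */
}

/* The same contributions summed exactly and clamped once at the end. */
static int blend_exact(int base, int add, int sub) {
    return sat_u8(base + add - sub);
}
```

With the numbers from the example, `blend_clamped(255, 30, 50)` gives 205 while `blend_exact(255, 30, 50)` gives 235.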

Sega Dreamcast Implementation

The Sega Dreamcast video game console has a PowerVR graphics chip (PVR) that renders 3D objects and paints textures onto them. Textures can be specified in a variety of formats, such as RGB565, ARGB1555, ARGB4444, YUY2 (packed YUV 4:2:2), and even a vector-quantized (VQ) compressed format. Textures can also use 4- or 8-bit palettized formats. The PVR hardware has a single 1024-entry color table, which can be divided into 64 16-color palettes or 4 256-color palettes.

The limitation of 4 256-color palettes immediately poses a problem, since the conversion approach as described requires 5 palettes. One workaround is to use the 4 available palettes for the base Y plane, the additive and subtractive U planes, and the additive V plane. The subtractive V plane is then converted on the CPU into an RGB565 texture using a precalculated lookup table, and that texture is blended in the final pass.
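The precalculated table for the subtractive V pass can be built much like the listing's `SubVTable`. This sketch uses 1.403 (V term in R) and 0.714 (V term in G), whereas the listing uses 1.370 and 0.698; the names `pack_rg565`, `build_sub_v_table` and `v_plane_to_sub_v` are hypothetical. Blue is never touched by the V passes, so only red and green are packed, and the 5- and 6-bit fields drop the low 3 bits of red and the low 2 bits of green.

```c
#include <stdint.h>

#define V_TO_R 1.403  /* assumed V coefficient in the R equation */
#define V_TO_G 0.714  /* assumed V coefficient in the G equation */

/* Pack 8-bit red and green into an RGB565 texel; blue stays 0. */
static uint16_t pack_rg565(unsigned r, unsigned g) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5));
}

static uint16_t sub_v_table[256];

/* Subtractive V magnitudes: R loses V_TO_R * (128 - V) when V < 128,
 * G loses V_TO_G * (V - 128) when V > 128 (rounded to nearest). */
static void build_sub_v_table(void) {
    for (int v = 0; v < 256; v++) {
        unsigned r = v < 128 ? (unsigned)((128 - v) * V_TO_R + 0.5) : 0u;
        unsigned g = v > 128 ? (unsigned)((v - 128) * V_TO_G + 0.5) : 0u;
        sub_v_table[v] = pack_rg565(r, g);
    }
}

/* CPU pass: turn n V samples into RGB565 texels for the final blend. */
static void v_plane_to_sub_v(const uint8_t *v, uint16_t *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = sub_v_table[v[i]];
}
```

For V = 24 the red magnitude is 146, which the 5-bit field stores as 18 << 11 (0x9000), so the subtractive V pass is somewhat coarser than the palettized passes.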

Another challenge in using this approach on the Dreamcast's PVR is that palettized textures must be "twiddled" (sometimes referred to as "swizzled"). Twiddling rearranges the texture samples into a special order that optimizes memory access for operations such as bilinear filtering. This means that the Y, U, and V planes have to be twiddled before they can be used as textures. Twiddling the subtractive V texture is optional, as RGB565 textures do not have to be twiddled.
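Twiddling is commonly described as a Morton (Z-order) layout in which the bits of a texel's x and y coordinates are interleaved. The sketch below computes the twiddled offset for a square, power-of-two texture up to 1024 texels on a side. It assumes that each y bit occupies the lower position of its bit pair and each x bit the higher one, which is how KOS's `pvr_txr_load_ex` appears to arrange texels; that bit order, and the handling of non-square textures (not shown), should be checked against the KOS source before relying on it. `spread_bits`, `twiddle_index` and `twiddle_plane` are illustrative names.

```c
#include <stdint.h>

/* Spread the low 10 bits of n so that bit k moves to bit 2k. */
static uint32_t spread_bits(uint32_t n) {
    uint32_t out = 0;
    for (int k = 0; k < 10; k++)
        out |= ((n >> k) & 1u) << (2 * k);
    return out;
}

/* Twiddled offset of texel (x, y): y bits in even positions,
 * x bits in odd positions (assumed ordering). */
static uint32_t twiddle_index(uint32_t x, uint32_t y) {
    return spread_bits(y) | (spread_bits(x) << 1);
}

/* Copy a square, power-of-two 8-bit plane into twiddled order. */
static void twiddle_plane(const uint8_t *src, uint8_t *dst, int size) {
    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            dst[twiddle_index((uint32_t)x, (uint32_t)y)] = src[y * size + x];
}
```

Under this ordering, a 2x2 block is stored column by column: (0,0), (0,1), (1,0), (1,1).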

A useful optimization would be to avoid converting all the planes at once after a YUV image is completely decoded. Instead, after each slice is finished (in MPEG-like data, a slice tends to be an image row 16 pixels high), twiddle that slice's data into the 3 palettized textures and convert it into the subtractive V texture. This helps to make the most of the 16-kilobyte data cache of the Dreamcast's CPU.
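Because a texel's twiddled offset depends only on its own coordinates, a plane does not have to be complete before it is twiddled. The sketch below assumes a Morton-order layout with y bits in the even positions and x bits in the odd positions, and a square, power-of-two texture; it writes one decoded slice of rows into a twiddled buffer in system RAM. `twiddle_slice` is a hypothetical helper. With 4:2:0 data, a 16-row luma slice corresponds to an 8-row slice of each chroma plane.

```c
#include <stdint.h>

/* Interleave coordinate bits: y in even positions, x in odd positions
 * (assumed ordering; supports sides up to 1024). */
static uint32_t twiddle_index(uint32_t x, uint32_t y) {
    uint32_t out = 0;
    for (int k = 0; k < 10; k++) {
        out |= ((y >> k) & 1u) << (2 * k);
        out |= ((x >> k) & 1u) << (2 * k + 1);
    }
    return out;
}

/* Twiddle rows [first_row, first_row + rows) of a square 8-bit plane
 * of side `size` into `tex`, e.g. right after that slice is decoded
 * and while its rows are still likely to be in the data cache. */
static void twiddle_slice(const uint8_t *plane, uint8_t *tex, int size,
                          int first_row, int rows) {
    for (int y = first_row; y < first_row + rows && y < size; y++)
        for (int x = 0; x < size; x++)
            tex[twiddle_index((uint32_t)x, (uint32_t)y)] = plane[y * size + x];
}
```

Calling `twiddle_slice` once per decoded slice fills the same buffer that a single whole-plane pass would produce.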

Of course, the Dreamcast also has DMA facilities to transfer data from system RAM to video RAM. Once a frame has been decoded and twiddled, DMA can transfer it to video RAM while the CPU decodes and twiddles the next frame.

See the source file at:

http://www.multimedia.cx/dc-yuv2rgb.c

for a simple, somewhat unoptimized, proof of concept of the conversion technique.

References

Changelog

version 1.0: August 20, 2003
- initial release

dc-yuv2rgb.c

/*
 * dc-yuv2rgb.c
 * by Mike Melanson (mike at multimedia.cx)
 * No license (or warranty). Use this program however you see fit.
 *
 * This program demonstrates how to use 3D texturing/blending hardware to
 * convert YUV data to RGB as outlined in this paper:
 *
 *   http://www.multimedia.cx/yuv-3d-rgb.txt
 *
 * The program is written for the Sega Dreamcast video game console
 * with its PowerVR graphics chip. It is designed to be
 * run with a standard KOS development setup available from:
 *
 *   http://cadcdev.sourceforge.net/
 *
 * It was developed with KOS v1.2.0 but probably operates fine under
 * earlier versions as well.
 *
 * Function: This program simply uses the YUV -> RGB conversion technique
 * to display a 512x256-pixel, quad-color texture on the screen until the
 * start button is pressed on the Dreamcast controller.
 */


#include <stdio.h>
#include <kos.h>

/* RGB565 texels for the subtractive V pass (there is no room for a 5th palette) */
static unsigned short SubVTable[256];

void init_palettes(void) {

    int i;
    unsigned char r, g, b;

    pvr_set_pal_format(PVR_PAL_ARGB8888);

    /* palette #0 (entries 0-255) is the base Y values */
    for (i = 0; i < 256; i++)
        pvr_set_pal_entry(i, 0xFF000000 | (i << 16) | (i << 8) | (i << 0));

    /* palette #1 (entries 256-511) is the additive U values */
    /* red will not be altered by this pass */
    r = 0;
    for (i = 0; i < 256; i++) {
        /* only U values < 128 will affect the green plane on this pass */
        if (i < 128)
            g = (128 - i) * 0.338;
        else
            g = 0;

        /* only U values > 128 will affect the blue plane on this pass */
        if (i > 128)
            b = (i - 128) * 1.732;
        else
            b = 0;

        pvr_set_pal_entry(i + 256, 0xFF000000 | (r << 16) | (g << 8) | (b << 0));
    }

    /* palette #2 (entries 512-767) is the subtractive U values */
    /* red will not be altered by this pass */
    r = 0;
    for (i = 0; i < 256; i++) {
        /* only U values > 128 will affect the green plane on this pass */
        if (i > 128)
            g = (i - 128) * 0.338;
        else
            g = 0;

        /* only U values < 128 will affect the blue plane on this pass */
        if (i < 128)
            b = (128 - i) * 1.732;
        else
            b = 0;

        pvr_set_pal_entry(i + 512, 0xFF000000 | (r << 16) | (g << 8) | (b << 0));
    }

    /* palette #3 (entries 768-1023) is the additive V values */
    /* blue will not be altered by this pass */
    b = 0;
    for (i = 0; i < 256; i++) {
        /* only V values < 128 will affect the green plane on this pass */
        if (i < 128)
            g = (128 - i) * 0.698;
        else
            g = 0;

        /* only V values > 128 will affect the red plane on this pass */
        if (i > 128)
            r = (i - 128) * 1.370;
        else
            r = 0;

        pvr_set_pal_entry(i + 768, 0xFF000000 | (r << 16) | (g << 8) | (b << 0));
    }

    /* palette #4 (not an official palette) is the subtractive V values,
     * stored in SubVTable as RGB565 texels */
    /* blue will not be altered by this pass */
    b = 0;
    for (i = 0; i < 256; i++) {
        /* only V values > 128 will affect the green plane on this pass */
        if (i > 128)
            g = (i - 128) * 0.698;
        else
            g = 0;

        /* only V values < 128 will affect the red plane on this pass */
        if (i < 128)
            r = (128 - i) * 1.370;
        else
            r = 0;

        /* top 5 bits of red, top 6 bits of green; blue is always 0 */
        SubVTable[i] = ((r >> 3) << 11) | ((g >> 2) << 5);
    }
}

void submit_tr_poly(int x1, int y1, int x2, int y2, int width, int height,
        int palette, pvr_ptr_t texture_ptr, int texture_format, int poly_type,
        int pass_type) {

    pvr_poly_cxt_t cxt;
    pvr_poly_hdr_t hdr;
    pvr_vertex_t vert;

    pvr_poly_cxt_txr(&cxt, poly_type, texture_format, width, height,
        texture_ptr, PVR_FILTER_BILINEAR);
    cxt.txr.format |= PVR_TXRFMT_8BPP_PAL(palette);
    if (poly_type != PVR_LIST_OP_POLY) {
        /* translucent passes blend with the framebuffer: pass_type 0
         * selects the additive mode, any other value the mode used
         * for the subtractive passes */
        if (pass_type == 0) {
            cxt.blend.src = PVR_BLEND_ONE;
            cxt.blend.dst = PVR_BLEND_ONE;
        } else {
            cxt.blend.src = PVR_BLEND_INVDESTCOLOR;
            cxt.blend.dst = PVR_BLEND_INVDESTCOLOR;
        }
    }
    pvr_poly_compile(&hdr, &cxt);
    pvr_prim(&hdr, sizeof(hdr));

    /* submit a 4-vertex strip that maps the whole texture onto the
     * screen rectangle (x1, y1)-(x2, y2) */
    vert.argb = PVR_PACK_COLOR(1.0f, 1.0f, 1.0f, 1.0f);
    vert.oargb = 0;
    vert.flags = PVR_CMD_VERTEX;

    vert.x = x1;
    vert.y = y1;
    vert.z = 1;
    vert.u = 0.0;
    vert.v = 0.0;
    pvr_prim(&vert, sizeof(vert));

    vert.x = x2 - 1;
    vert.y = y1;
    vert.z = 1;
    vert.u = 1.0;
    vert.v = 0.0;
    pvr_prim(&vert, sizeof(vert));

    vert.x = x1;
    vert.y = y2 - 1;
    vert.z = 1;
    vert.u = 0.0;
    vert.v = 1.0;
    pvr_prim(&vert, sizeof(vert));

    vert.x = x2 - 1;
    vert.y = y2 - 1;
    vert.z = 1;
    vert.u = 1.0;
    vert.v = 1.0;
    vert.flags = PVR_CMD_VERTEX_EOL;
    pvr_prim(&vert, sizeof(vert));
}

#define WIDTH 512
#define HEIGHT 256

int main() {

    unsigned char y_plane[WIDTH * HEIGHT];
    unsigned char u_plane[WIDTH * HEIGHT / 4];
    unsigned char v_plane[WIDTH * HEIGHT / 4];
    unsigned short sub_v_plane[WIDTH * HEIGHT / 4];

    pvr_ptr_t y_texture;
    pvr_ptr_t u_texture;
    pvr_ptr_t v_texture;
    pvr_ptr_t sub_v_texture;

    int i, x, y;
    cont_cond_t cont;

    vid_set_mode(DM_640x480_NTSC_IL, PM_RGB565);

    /* init pvr subsystem */
    pvr_init_defaults();

    /* fill up the palettes */
    init_palettes();

    /* allocate the textures */
    y_texture = pvr_mem_malloc(WIDTH * HEIGHT);
    u_texture = pvr_mem_malloc(WIDTH * HEIGHT / 4);
    v_texture = pvr_mem_malloc(WIDTH * HEIGHT / 4);
    sub_v_texture = pvr_mem_malloc(WIDTH * HEIGHT / 2);

    if (!y_texture || !u_texture || !v_texture || !sub_v_texture) {
        printf ("*** could not allocate textures\n");
        return 0;
    }

    /* contrive a quad-colored YUV texture:
     *   white  red
     *   green  blue
     *
     * white = (255, 128, 128)
     * red   = (65, 90, 240)
     * green = (129, 91, 24)
     * blue  = (25, 240, 110)
     */

    for (y = 0; y < HEIGHT / 2; y++) {
        for (x = 0; x < WIDTH / 2; x++) {
            y_plane[y * WIDTH + x] = 0xFF;
            y_plane[y * WIDTH + (WIDTH / 2) + x] = 65;
            y_plane[(y + (HEIGHT / 2)) * WIDTH + x] = 129;
            y_plane[(y + (HEIGHT / 2)) * WIDTH + (WIDTH / 2) + x] = 25;
        }
    }

    for (y = 0; y < HEIGHT / 4; y++) {
        for (x = 0; x < WIDTH / 4; x++) {
            u_plane[y * (WIDTH / 2) + x] = 0x80;
            u_plane[y * (WIDTH / 2) + (WIDTH / 4) + x] = 90;
            u_plane[(y + (HEIGHT / 4)) * (WIDTH / 2) + x] = 91;
            u_plane[(y + (HEIGHT / 4)) * (WIDTH / 2) + (WIDTH / 4) + x] = 240;

            v_plane[y * (WIDTH / 2) + x] = 0x80;
            v_plane[y * (WIDTH / 2) + (WIDTH / 4) + x] = 240;
            v_plane[(y + (HEIGHT / 4)) * (WIDTH / 2) + x] = 24;
            v_plane[(y + (HEIGHT / 4)) * (WIDTH / 2) + (WIDTH / 4) + x] = 110;
        }
    }

    /* build the subtractive V plane */
    for (i = 0; i < (WIDTH * HEIGHT) / 4; i++) {
        sub_v_plane[i] = SubVTable[v_plane[i]];
    }

    /* twiddle and transfer the planes into texture memory */
    pvr_txr_load_ex(y_plane, y_texture, WIDTH, HEIGHT, PVR_TXRLOAD_8BPP);
    pvr_txr_load_ex(u_plane, u_texture, WIDTH / 2, HEIGHT / 2, PVR_TXRLOAD_8BPP);
    pvr_txr_load_ex(v_plane, v_texture, WIDTH / 2, HEIGHT / 2, PVR_TXRLOAD_8BPP);
    pvr_txr_load_ex(sub_v_plane, sub_v_texture, WIDTH / 2, HEIGHT / 2, PVR_TXRLOAD_16BPP);

    /* do the sequence twice because of the double-buffering */
    for (i = 0; i < 2; i++) {

        /* plot the textures and do the math */

        /* prep the PVR hardware */
        pvr_wait_ready();
        pvr_scene_begin();

        /* base Y plane */
        pvr_list_begin(PVR_LIST_OP_POLY);
        submit_tr_poly(0, 0, WIDTH, HEIGHT, WIDTH, HEIGHT, 0, y_texture,
            PVR_TXRFMT_PAL8BPP, PVR_LIST_OP_POLY, 0);
        pvr_list_finish();

        pvr_list_begin(PVR_LIST_TR_POLY);

        /* additive U pass */
        submit_tr_poly(0, 0, WIDTH, HEIGHT, WIDTH / 2, HEIGHT / 2, 1, u_texture,
            PVR_TXRFMT_PAL8BPP, PVR_LIST_TR_POLY, 0);

        /* subtractive U pass */
        submit_tr_poly(0, 0, WIDTH, HEIGHT, WIDTH / 2, HEIGHT / 2, 2, u_texture,
            PVR_TXRFMT_PAL8BPP, PVR_LIST_TR_POLY, 1);

        /* additive V pass */
        submit_tr_poly(0, 0, WIDTH, HEIGHT, WIDTH / 2, HEIGHT / 2, 3, v_texture,
            PVR_TXRFMT_PAL8BPP, PVR_LIST_TR_POLY, 0);

        /* subtractive V pass with 16-bit texture */
        submit_tr_poly(0, 0, WIDTH, HEIGHT, WIDTH / 2, HEIGHT / 2, 0, sub_v_texture,
            PVR_TXRFMT_RGB565, PVR_LIST_TR_POLY, 1);

        pvr_list_finish();
        pvr_scene_finish();
    }

    printf (" press start to exit...\n");
    do {
        if (cont_get_cond(maple_first_controller(), &cont))
            printf ("Error getting controller status\n");
        cont.buttons = ~cont.buttons;
        vid_waitvbl();
    } while (!(cont.buttons & CONT_START));

    return 0;
}
