SPO600 SIMD Lab 5

Purpose of this Lab
In this lab, you will investigate the use of SIMD instructions in software, using auto-vectorization, inline assembler, and C intrinsics.

https://wiki.cdot.senecacollege.ca/wiki/SPO600_SIMD_Lab

What is SIMD?

SIMD, an acronym for Single Instruction, Multiple Data, “refers to a class of instructions which perform the same operation on several separate pieces of data in parallel” (Tyler).

The purpose of this lab is to learn to use SIMD capabilities in three different ways:

  • Auto-Vectorization: adding compiler options so that the compiler vectorizes loops, automatically generating SIMD code
  • Inline Assembler: embedding architecture-specific assembly language in C programs to include SIMD instructions explicitly
  • C Intrinsics: using intrinsics, function-like compiler extensions that provide access to SIMD instructions

Instructions

Part 0: Setup

Unpack the archive into a lab5 directory in the home directory:

[yzhu132@aarchie ~]$ mkdir lab5
[yzhu132@aarchie ~]$ tar -zxvf /public/spo600-simd-lab.tgz -C ~/lab5/
spo600/
spo600/simd_lab/
spo600/simd_lab/vol1.c
spo600/simd_lab/vol_intrinsics.c
spo600/simd_lab/vol_inline.c
spo600/simd_lab/vol.h
spo600/simd_lab/Makefile
spo600/simd_lab/add.c

Part 1: Auto-Vectorization

Modify the Makefile so that vol1.c is compiled with the option -fopt-info-vec-all. I added an AUTOVECTOROPTS variable and an auto_vector target for this:

[yzhu132@aarchie ~]$ cd lab5/spo600/simd_lab/
[yzhu132@aarchie simd_lab]$ ls -l
total 24
-rw-r--r--. 1 yzhu132 yzhu132  351 Oct 11 13:09 add.c
-rw-r--r--. 1 yzhu132 yzhu132  393 Oct  3 13:19 Makefile
-rw-------. 1 yzhu132 yzhu132 1007 Oct  2 12:57 vol1.c
-rw-r--r--. 1 yzhu132 yzhu132   24 Oct  2 09:33 vol.h
-rw-r--r--. 1 yzhu132 yzhu132 2225 Oct  2 09:30 vol_inline.c
-rw-r--r--. 1 yzhu132 yzhu132 1577 Oct  2 09:20 vol_intrinsics.c
[yzhu132@aarchie simd_lab]$ nano Makefile 
BINARIES = vol_inline vol_intrinsics add vol1
CCOPTS = -g -O3 
AUTOVECTOROPTS = -fopt-info-vec-all
CC=gcc

all:            ${BINARIES}

vol_inline:     vol_inline.c vol.h
                ${CC} ${CCOPTS} vol_inline.c -o vol_inline

vol_intrinsics: vol_intrinsics.c vol.h
                ${CC} ${CCOPTS} vol_intrinsics.c -o vol_intrinsics

vol1:           vol1.c vol.h
                ${CC} ${CCOPTS} vol1.c -o vol1

add:            add.c
                ${CC} ${CCOPTS} add.c -o add

clean:  
                rm ${BINARIES} || true

auto_vector:    vol1.c vol.h
                ${CC} ${CCOPTS} ${AUTOVECTOROPTS} vol1.c -o vol1

Now compile vol1.c and review the compiler output. The following command saves the output to a text file:

[yzhu132@aarchie simd_lab]$ make auto_vector &> auto_out.txt
[yzhu132@aarchie simd_lab]$ less auto_out.txt 

Search for lines that contain “vectorized” by entering this in less:

/vectorized

This finds the following blocks of lines:

...
vol1.c:32:2: note: loop vectorized
vol1.c:32:2: note: === vec_transform_loop ===
vol1.c:32:2: note: ------>vectorizing phi: x_52 = PHI <x_35(10), 0(12)>
vol1.c:32:2: note: ------>vectorizing phi: .MEM_56 = PHI <.MEM_34(10), .MEM_31(12)>
vol1.c:32:2: note: ------>vectorizing phi: ivtmp_75 = PHI <ivtmp_74(10), 5000000(12)>
vol1.c:32:2: note: ------>vectorizing statement: # DEBUG x => x_52
vol1.c:32:2: note: ------>vectorizing statement: # DEBUG BEGIN_STMT
...
...
vol1.c:38:2: note: not vectorized: not enough data-refs in basic block.
vol1.c:38:2: note: ===vect_slp_analyze_bb===
vol1.c:38:2: note: ===vect_slp_analyze_bb===
vol1.c:43:2: note: === vect_analyze_data_refs ===
vol1.c:43:2: note: not vectorized: not enough data-refs in basic block.
vol1.c:43:2: note: === vect_analyze_data_refs ===
vol1.c:43:2: note: not vectorized: not enough data-refs in basic block.
...

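Only one of the two loops was vectorized: the loop at line 32.

To vectorize the other loop, I’ll have to remove the modulus operation on line 39 of vol1.c. Because the running total is reduced modulo 1000 on every iteration, each iteration depends on the reduced result of the previous one, so the loop is no longer a plain sum that the compiler can split into independent partial sums. (Removing the modulus also changes what the loop computes; the goal here is only to see it vectorized.)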

[yzhu132@aarchie simd_lab]$ vi vol1.c 

Before:

...
        // Sum up the data
        for (x = 0; x < SAMPLES; x++) {
                ttl = (ttl+data[x])%1000;
        }
...

After:

...
        // Sum up the data
        for (x = 0; x < SAMPLES; x++) {
                ttl = (ttl+data[x]);
        }
...

Next, remove the previously built vol1 binary and rebuild it:

[yzhu132@aarchie simd_lab]$ rm vol1  
[yzhu132@aarchie simd_lab]$ make auto_vector &> auto_out_vectorized.txt
[yzhu132@aarchie simd_lab]$ less auto_out_vectorized.txt

Searching for “vectorized” again shows that the loop at line 38 is now vectorized:

...
vol1.c:38:2: note: loop vectorized
vol1.c:38:2: note: === vec_transform_loop ===
vol1.c:38:2: note: ------>vectorizing phi: x_52 = PHI <x_36(9), 0(12)>
vol1.c:38:2: note: ------>vectorizing phi: ttl_53 = PHI <ttl_35(9), 0(12)>
vol1.c:38:2: note: multiple-types.
vol1.c:38:2: note: transform phi.
...
