Purpose of this Lab
https://wiki.cdot.senecacollege.ca/wiki/SPO600_SIMD_Lab
In this lab, you will investigate the use of SIMD instructions in software, using auto-vectorization, inline assembler, and C intrinsics.
What is SIMD?
SIMD is an acronym for Single Instruction, Multiple Data. It “refers to a class of instructions which perform the same operation on several separate pieces of data in parallel” (Tyler).
This lab uses SIMD capabilities in three different ways:
- Auto-Vectorization: adding compiler options to vectorize loops, automatically generating SIMD code
- Inline Assembler: adding architecture-specific assembly language embedded in C programs, explicitly including SIMD instructions
- C Intrinsics: using function-like extensions to the C language, grouped into sets of intrinsics which provide access to SIMD instructions
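To make the first approach concrete, here is a minimal sketch of the kind of loop an auto-vectorizer can usually handle. The function add_arrays and its parameters are hypothetical and are not part of the lab files; the point is only that every iteration is independent, so the compiler is free to process several elements with one SIMD instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the lab archive.
 * Adds two arrays element by element. No iteration depends on
 * another, and `restrict` promises the arrays do not overlap,
 * which is the shape of loop a compiler can typically vectorize
 * at -O3. */
void add_arrays(int32_t *restrict out, const int32_t *restrict a,
                const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Whether a given compiler actually emits SIMD instructions for this loop depends on the target and the options used, which is what the compiler diagnostics in Part 1 are for.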
Instructions
Part 0: Setup
Unpack the archive into a new lab5 directory under the home directory:
[yzhu132@aarchie ~]$ mkdir lab5
[yzhu132@aarchie ~]$ tar -zxvf /public/spo600-simd-lab.tgz -C ~/lab5/
spo600/
spo600/simd_lab/
spo600/simd_lab/vol1.c
spo600/simd_lab/vol_intrinsics.c
spo600/simd_lab/vol_inline.c
spo600/simd_lab/vol.h
spo600/simd_lab/Makefile
spo600/simd_lab/add.c
Part 1: Auto-Vectorization
Modify the Makefile so that vol1.c is compiled with the option -fopt-info-vec-all:
[yzhu132@aarchie ~]$ cd lab5/spo600/simd_lab/
[yzhu132@aarchie simd_lab]$ ls -l
total 24
-rw-r--r--. 1 yzhu132 yzhu132 351 Oct 11 13:09 add.c
-rw-r--r--. 1 yzhu132 yzhu132 393 Oct 3 13:19 Makefile
-rw-------. 1 yzhu132 yzhu132 1007 Oct 2 12:57 vol1.c
-rw-r--r--. 1 yzhu132 yzhu132 24 Oct 2 09:33 vol.h
-rw-r--r--. 1 yzhu132 yzhu132 2225 Oct 2 09:30 vol_inline.c
-rw-r--r--. 1 yzhu132 yzhu132 1577 Oct 2 09:20 vol_intrinsics.c
[yzhu132@aarchie simd_lab]$ nano Makefile
BINARIES = vol_inline vol_intrinsics add vol1
CCOPTS = -g -O3
AUTOVECTOROPTS = -fopt-info-vec-all
CC=gcc
all: ${BINARIES}
vol_inline: vol_inline.c vol.h
	${CC} ${CCOPTS} vol_inline.c -o vol_inline
vol_intrinsics: vol_intrinsics.c vol.h
	${CC} ${CCOPTS} vol_intrinsics.c -o vol_intrinsics
vol1: vol1.c vol.h
	${CC} ${CCOPTS} vol1.c -o vol1
add: add.c
	${CC} ${CCOPTS} add.c -o add
clean:
	rm ${BINARIES} || true
auto_vector: vol1.c vol.h
	${CC} ${CCOPTS} ${AUTOVECTOROPTS} vol1.c -o vol1
Now compile vol1.c and review the compiler output. The following command saves the output (both stdout and stderr, via &>) to a text file:
[yzhu132@aarchie simd_lab]$ make auto_vector &> auto_out.txt
[yzhu132@aarchie simd_lab]$ less auto_out.txt
Search for lines which contain “vectorized” by typing this in less:
/vectorized
The search finds the following blocks of lines:
...
vol1.c:32:2: note: loop vectorized
vol1.c:32:2: note: === vec_transform_loop ===
vol1.c:32:2: note: ------>vectorizing phi: x_52 = PHI <x_35(10), 0(12)>
vol1.c:32:2: note: ------>vectorizing phi: .MEM_56 = PHI <.MEM_34(10), .MEM_31(12)>
vol1.c:32:2: note: ------>vectorizing phi: ivtmp_75 = PHI <ivtmp_74(10), 5000000(12)>
vol1.c:32:2: note: ------>vectorizing statement: # DEBUG x => x_52
vol1.c:32:2: note: ------>vectorizing statement: # DEBUG BEGIN_STMT
...
...
vol1.c:38:2: note: not vectorized: not enough data-refs in basic block.
vol1.c:38:2: note: ===vect_slp_analyze_bb===
vol1.c:38:2: note: ===vect_slp_analyze_bb===
vol1.c:43:2: note: === vect_analyze_data_refs ===
vol1.c:43:2: note: not vectorized: not enough data-refs in basic block.
vol1.c:43:2: note: === vect_analyze_data_refs ===
vol1.c:43:2: note: not vectorized: not enough data-refs in basic block.
...
It looks like only one of the two loops was vectorized: the loop at line 32.
To vectorize the other loop, at line 38, I’ll have to remove the modulus operation on line 39 of vol1.c.
[yzhu132@aarchie simd_lab]$ vi vol1.c
Before:
...
// Sum up the data
for (x = 0; x < SAMPLES; x++) {
    ttl = (ttl+data[x])%1000;
}
...
After:
...
// Sum up the data
for (x = 0; x < SAMPLES; x++) {
    ttl = (ttl+data[x]);
}
...
Time to remove the previous vol1 binary and rebuild it.
[yzhu132@aarchie simd_lab]$ rm vol1
[yzhu132@aarchie simd_lab]$ make auto_vector &> auto_out_vectorized.txt
[yzhu132@aarchie simd_lab]$ less auto_out_vectorized.txt
Searching for “vectorized” again shows that the loop at line 38 is now vectorized:
...
vol1.c:38:2: note: loop vectorized
vol1.c:38:2: note: === vec_transform_loop ===
vol1.c:38:2: note: ------>vectorizing phi: x_52 = PHI <x_36(9), 0(12)>
vol1.c:38:2: note: ------>vectorizing phi: ttl_53 = PHI <ttl_35(9), 0(12)>
vol1.c:38:2: note: multiple-types.
vol1.c:38:2: note: transform phi.
...
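One caveat: removing the modulus also changes the value that ends up in ttl. The per-iteration modulus makes every step depend on the reduced result of the previous one, which is likely why GCC did not treat the loop as a simple sum reduction. A minimal sketch of one alternative is to accumulate a plain sum and apply the modulus once after the loop. The function names, the int16_t element type, and the 64-bit accumulator are assumptions for illustration and are not taken from vol1.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Original shape: the modulus is applied on every iteration, so each
 * step needs the already-reduced result of the previous step. */
int sum_mod_each_step(const int16_t *data, size_t n)
{
    int ttl = 0;
    for (size_t x = 0; x < n; x++) {
        ttl = (ttl + data[x]) % 1000;
    }
    return ttl;
}

/* Alternative sketch: a plain sum reduction, with the modulus applied
 * once after the loop. The 64-bit accumulator is an assumption chosen
 * so the running total does not overflow for large sample counts. */
int sum_mod_at_end(const int16_t *data, size_t n)
{
    int64_t ttl = 0;
    for (size_t x = 0; x < n; x++) {
        ttl += data[x];
    }
    return (int)(ttl % 1000);
}
```

For non-negative samples the two functions return the same value. With negative samples they can differ, because the result of % in C takes the sign of the dividend: the samples 600, 600, -300 give -100 step by step but 900 when the modulus is applied at the end. Whether GCC vectorizes this exact variant would still need to be checked with -fopt-info-vec-all.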