Showing posts from April, 2018

SPO 600 Project - Stage 3

In this post, I will discuss my findings during Stage 3 of the project. For my previous stages, click here and here. Before I start, I think I should mention how much the performance improved in percentage terms. Here are the results:

[GeoffWu@aarchie tester]$ ./tester
Run #1
original version took 33339704 nanoseconds
modified version took 33237179 nanoseconds
hash match: y
difference between two versions: 102525 nanoseconds
percentage difference: 0.307516
Run #2
original version took 33302201 nanoseconds
modified version took 33231595 nanoseconds
hash match: y
difference between two versions: 70606 nanoseconds
percentage difference: 0.212016
Run #3
original version took 33366103 nanoseconds
modified version took 33270939 nanoseconds
hash match: y
difference between two versions: 95164 nanoseconds
percentage difference: 0.285212
Run #4
original version took 33349119 nanoseconds
modified version took 33346432 nanoseconds
hash match: y
difference between two version
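The tester's source isn't shown in this excerpt, but the printed percentages are consistent with dividing the time saved by the original run time. Here is a minimal sketch of that calculation; the helper name percent_diff is hypothetical, and the figures are taken from Run #1 above:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper - the real tester's code isn't shown above.
 * Computes the improvement of the modified version over the original,
 * as a percentage of the original run time. */
static double percent_diff(uint64_t original_ns, uint64_t modified_ns)
{
    return (double)(original_ns - modified_ns) / (double)original_ns * 100.0;
}

int main(void)
{
    /* Run #1 figures from the tester output above:
     * (33339704 - 33237179) / 33339704 * 100 = 0.307516 */
    printf("percentage difference: %f\n",
           percent_diff(33339704, 33237179));
    return 0;
}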

SPO 600 - Lab 6

In this lab, I explore the use of inline assembler in open source software. For more information, click here.

Part A

For comparison, I tested the performance of vol_simd (the version with inline assembly and SIMD) and vol3 (the version with no inline assembly, which I built in Lab 5) on aarch64 (the inline assembly version is written in aarch64 assembly), and I noticed vol_simd is a bit faster than its pure-C counterpart.

Inline assembly and SIMD:
[GeoffWu@aarchie spo600_20181_inline_assembler_lab]$ time ./vol_simd
Generating sample data.
Scaling samples.
Summing samples.
Result: -454
Time spent: 0.000670

real    0m0.027s
user    0m0.027s
sys    0m0.000s

Pure C:
[GeoffWu@aarchie spo600_20181_vol_skel]$ time ./vol1
Result: -142
Time spent: 0.001029

real    0m0.028s
user    0m0.028s
sys    0m0.000s

I adjusted the number of samples to 500000000, and vol_simd is still faster.

Inline assembly and SIMD:
[GeoffWu@aarchie spo600_20181
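This excerpt cuts off before showing the vol_simd source, so the following is a minimal sketch of the technique it names: broadcast a Q15 fixed-point volume factor across a SIMD register with dup, then scale eight int16_t samples at a time with sqdmulh via aarch64 inline assembly. The sample count, the 0.75 volume factor, the mod-1000 result line, and the variable names are my assumptions, not the lab's exact code:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define SAMPLES 800000   /* assumed count; must be a multiple of 8 here */

int main(void)
{
    int16_t *in = malloc(SAMPLES * sizeof(int16_t));
    int16_t *out = malloc(SAMPLES * sizeof(int16_t));
    int16_t *in_cursor = in, *out_cursor = out;
    /* 0.75 volume as a Q15 factor; sqdmulh computes (2*a*b)>>16
     * per lane, which approximates a*0.75 */
    int16_t vol = (int16_t)(0.75 * 32767.0);
    int ttl = 0;

    printf("Generating sample data.\n");
    for (int i = 0; i < SAMPLES; i++)
        in[i] = (int16_t)(rand() % 65536 - 32768);

    printf("Scaling samples.\n");
    /* broadcast the volume factor into all 8 lanes of v1 (set once,
     * then reused by the loop below) */
    __asm__("dup v1.8h, %w0" : : "r"(vol) : "v1");

    while (in_cursor < in + SAMPLES) {
        __asm__(
            "ldr q0, [%[in]], #16\n\t"         /* load 8 samples, post-increment */
            "sqdmulh v0.8h, v0.8h, v1.8h\n\t"  /* scale each lane by vol */
            "str q0, [%[out]], #16\n\t"        /* store 8 scaled samples */
            : [in] "+r"(in_cursor), [out] "+r"(out_cursor)
            :
            : "v0", "memory");
    }

    printf("Summing samples.\n");
    /* mod-1000 checksum of the output, mimicking the Result line above */
    for (int i = 0; i < SAMPLES; i++)
        ttl = (ttl + out[i]) % 1000;
    printf("Result: %d\n", ttl);

    free(in);
    free(out);
    return 0;
}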

SPO 600 Project - Stage 2

In this post, I will discuss my findings during Stage 2 of the project. For more details, click here.

Before I talk about how I optimized the MurmurHash2 algorithm, I would like to answer a question that I forgot to address in my Stage 1 post: where is that algorithm used? After doing some research, I found out that Tengine, just like nginx, uses MurmurHash2 in the ngx_http_split_clients_module, which is often used for A/B testing.

Alright then, let's talk about how I optimized the algorithm. First of all, I think I should mention the optimization strategies I eliminated:

In-line assembler: Since Tengine is meant to support a variety of platforms and CPU architectures, I honestly don't think it is a good option - unless Jack Ma would give me a job afterwards :)

Altered build options: At first I considered this option, since it is the easiest one, and planned to change the compile options to -O3/-Ofast, but after checking the cc (which listed all
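The excerpt cuts off before showing the algorithm itself, so for context here is a sketch of the 32-bit MurmurHash2 reference implementation (Austin Appleby's public-domain version), the function this optimization work targets. Tengine's copy may differ in details - nginx's variant, for instance, reads the input byte by byte and uses a fixed zero seed - and the memcpy-based unaligned read here is my adaptation:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* 32-bit MurmurHash2: mix each 4-byte block into the running hash
 * with the magic constant m, then handle the tail and finalize. */
uint32_t murmur_hash2(const void *key, int len, uint32_t seed)
{
    const uint32_t m = 0x5bd1e995;
    const int r = 24;
    uint32_t h = seed ^ (uint32_t)len;
    const unsigned char *data = (const unsigned char *)key;

    while (len >= 4) {
        uint32_t k;
        memcpy(&k, data, 4);   /* safe unaligned read of one block */
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
        data += 4;
        len -= 4;
    }

    switch (len) {             /* mix in the remaining 0-3 bytes */
    case 3: h ^= data[2] << 16; /* fall through */
    case 2: h ^= data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
    }

    h ^= h >> 13;              /* final avalanche */
    h *= m;
    h ^= h >> 15;
    return h;
}

int main(void)
{
    const char *s = "hello";
    printf("%08x\n", murmur_hash2(s, (int)strlen(s), 0));
    return 0;
}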