How to build super scalable & optimized systems with assembly language: A case study driven analysis

by Goutham Krishna on Mon Aug 3

In this article, we’ll see how to build super scalable and optimized systems with assembly language, with the help of a case study. I’ve always seen programming as a two-step process. The first step is to get the code you wrote to compile and run correctly: no bugs, no errors, and everything works like a charm. The second step is even more important. If you want your program to be scalable, you need to optimize it so that execution time and resource consumption are as low as possible. Today, we have a wide variety of programming languages, each with its own merits and demerits.

Should we start programming in assembly language?

I started my career with JavaScript (Node.js, to be precise), and I had some C programming experience from college. Although JavaScript is still widely used, I personally feel that it shouldn’t be the first language a beginner learns. Scripting languages let you code almost any way you want because they impose little structure. You can also declare variables on the fly, without even knowing their types. All these features are convenient when you are a newbie programmer, but once you start building high-performing, scalable architectures, these languages are not up to the mark. The next two languages I learned were Go and Rust, two powerful statically typed languages. Both were awesome: they eliminated the majority of my runtime errors and gave me a different perspective on programming. I experimented with these languages to see how far I could optimize a program. Despite trying out several programming languages, one question stayed on my mind. Can I optimize my code further?

A couple of months ago, I developed an interest in retro computing. Two projects that really inspired me were the Apollo Guidance Computer (the computer used in the moon landing) and the Nintendo NES. The average smartphone is a million times more capable than the hardware that helped humanity land on the moon. It was shocking to me that the memory of an NES could not even hold a single JPEG file showing one screen of Super Mario. The amount of innovative coding that went into the NES is worth appreciating. The software for both machines was written in assembly language, and I wondered whether it was still worth trying to write my own program in assembly.

The Experiment

The lowest-level language I knew was C++, which is hard to beat when it comes to performance. So I decided to write the program in C++, optimize some parts in assembly, and compare the processing times. When benchmarking, I usually like to use the Fibonacci series, primarily because it can be written with either iteration or recursion and so covers both kinds of benchmark. This time, however, I wanted to solve a modern problem rather than a classic, well-known algorithm.

After a little research for my assembly program, I found a tutorial in which the author manipulated an image. I found this extremely intriguing, so I tried to pull off the same thing with my own twist.

The project’s input is the path of an image and a brightness factor, and the output is a new image with increased brightness. Each image is converted into a matrix with values ranging from 0 to 255, and the factor is added to every cell in the matrix, with results kept inside that range. This is a basic scalar matrix addition, but the scale is relatively large.

Testing Environment

Assembly language varies with the assembler, the architecture, the operating system, and so on. I chose NASM, a cross-platform assembler. Its syntax is not as powerful as that of MASM (the Microsoft assembler), but it runs on the system where the benchmark was conducted, which is described below. For the complete code, you can check out my repo image-manipulation.

OS: macOS Mojave
Memory: 16 GB 1600 MHz DDR3
Processor: 2.5 GHz Intel Core i7

The C++ compiler used was G++, and the assembler was NASM targeting x86-64. For the C++ part, I used the OpenCV library for image manipulation. The code is given below:

#include "opencv2/imgcodecs.hpp"
#include "opencv2/highgui.hpp"
#include <ctime>
#include <iostream>
using std::cin;
using std::cout;
using std::endl;
using namespace cv;

int main( int argc, char** argv )
{
    CommandLineParser parser( argc, argv, "{@input | lena.jpg | input image}" );
    Mat image = imread( samples::findFile( parser.get<String>( "@input" ) ) );
    clock_t time_req;
    time_req = clock(); // start of the timed section
    if( image.empty() )
    {
        cout << "Could not open or find the image!\n" << endl;
        cout << "Usage: " << argv[0] << " <Input image>" << endl;
        return -1;
    }
    Mat new_image = Mat::zeros( image.size(), image.type() );
    double alpha = 1.0; // contrast gain, left unchanged
    int beta = 45;      // brightness value added to every channel
    for( int y = 0; y < image.rows; y++ ) {
        for( int x = 0; x < image.cols; x++ ) {
            for( int c = 0; c < image.channels(); c++ ) {
                // saturate_cast keeps the result inside 0..255
                new_image.at<Vec3b>(y,x)[c] =
                    saturate_cast<uchar>( alpha*image.at<Vec3b>(y,x)[c] + beta );
            }
        }
    }
    // Raw pixel buffers, as they would be handed to the assembly routine
    uchar* image_data = image.data;
    uchar* new_image_data = new_image.data;
    cout << "Running Time" << clock() - time_req << endl; // elapsed clock() ticks
    // cout << "The image is" << image_data <<endl;

    imwrite( "orginal.jpeg", image );
    imwrite( "newImage1.jpeg", new_image );

    waitKey();
    return 0;
}

This is the complete C++ program that performs the task described above. The most time-consuming part of the code is:

for( int y = 0; y < image.rows; y++ ) {
    for( int x = 0; x < image.cols; x++ ) {
        for( int c = 0; c < image.channels(); c++ ) {
            new_image.at<Vec3b>(y,x)[c] =
                saturate_cast<uchar>( alpha*image.at<Vec3b>(y,x)[c] + beta );
        }
    }
}

These three nested loops visit every channel of every pixel, so the work grows with rows × columns × channels. To optimize the code, I moved this part to assembly, while the image loading stayed in C++.

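section .text
global __start

; RDI 1st argument
; RSI 2nd argument
; RDX 3rd argument
; RCX 4th argument

; uchar* new_image, RDI
; uchar* old_image, RSI
; short brit, RDX (brightness change, assumed to lie in -255..255)
; Size_<int> size, RCX (used by the loops as the number of bytes to process)

default rel
__start:
    test rcx, rcx           ; nothing to do for an empty image
    jz Done
    cmp dx, 0               ; brit is 16 bits wide, so test only DX
    jl ReduceBright

    mov r11d, 0ffh          ; ceiling used when an addition overflows

mainLoop:
    movzx eax, byte [rsi]   ; load one channel value
    add al, dl              ; add brightness, CF is set if the sum exceeds 255
    cmovc eax, r11d         ; saturate to 255 on overflow
    mov [rdi], al           ; store one byte

    inc rsi
    inc rdi
    dec rcx
    jnz mainLoop

Done:
    ret

ReduceBright:
    xor r11d, r11d          ; floor used when a subtraction underflows
    neg dx                  ; DL now holds the amount to subtract

MainLoopSubtract:
    movzx eax, byte [rsi]   ; load one channel value
    sub al, dl              ; CF is set if the result drops below 0
    cmovc eax, r11d         ; saturate to 0 on underflow
    mov [rdi], al           ; store one byte
    inc rsi
    inc rdi
    dec rcx
    jnz MainLoopSubtract
    ret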

The arguments passed to the function arrive in the registers listed in the comments, following the System V AMD64 calling convention:

; RDI 1st argument
; RSI 2nd argument
; RDX 3rd argument
; RCX 4th argument

The Result

The result was astounding. The assembly code was nearly 17 times faster than the C++ version (4378 versus 261 clock ticks). The C++ code can be optimized further to make it faster, but a speedup of this size is a major improvement.

Running time (clock ticks)
C++: 4378
Assembly: 261

My objective for this exercise was to check how far code can be optimized. So, the million-dollar question here is ‘Should we start programming in assembly language?’

The short answer is NO. It isn’t a good idea, even though it comes with the power of direct access to CPU registers. With great power comes great responsibility, and assembly code is hard to port and hard to maintain. However, rewriting some parts of a program in assembly is definitely a technique that developers should add to their arsenal for building highly scalable, high-performing applications. Most compilers convert code into assembly more efficiently than one would expect, so with proper optimization it is possible to reach considerable performance without leaving the high-level language, although it does take a good amount of effort.

Author

Goutham Krishna

Goutham is a technologist and developer who specializes in Blockchain technologies and Cryptocurrencies. Though he’s worked within numerous privacy and security sectors, Goutham's recent emphasis has been on solutions built on Ethereum, Tezos, smart contracts, and smart signatures, in particular, decentralized self-sovereign identity.

All Rights Reserved. Accubits INC 2020