Data Structures — Simplified and Classified


Photo by Christina Morillo via Pexels (CC0)

In order to pass programming interviews you need to solve algorithms and to solve algorithms you need data structures. Now the problem is when you open a data structures and algorithms textbook, it’s filled with complex math and for those of you who are not a big fan of the subject, it will turn into a nightmare. However, once you understand data structures, they’re not as difficult as they seem. This article will simplify and summarize these most essential data structures that you will understand and will be able to use easily.

What is a data structure?

A data structure is a particular way of organizing data in a computer so that it can be used effectively:

“In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.” — Wikipedia

So, we have to store the data in some kind of a data structure and choosing the right structure is crucial.

It’s important to note that no data structure is good or bad, rather each one of them has its own pros and cons. Many ways are used to measure how good a data structure is. These are also known as the Big O notations. This is a measure of how well an operation is scalable.

Now, let’s discuss the different types of data structures.

Linked List

The atomic unit of the Linked List is called the “node”. This node contains a value and a pointer. The value is simply a number, say 12, while the pointer connects the value to the next node in the chain. Hence the linked part of the Linked List. The first node in the list is known as the head, and the last node with no pointer is known as the tail.

Linked list
Image credit: Wikipedia

Advantage: Great for adding new nodes and deleting nodes. This is done by changing where the next pointer points.

Disadvantage: Not too good at retrieving nodes despite knowing the index or searching since each node is only aware of the node next to it.

Array

Arrays are quite common in all programming languages. It is a continuous block of cells in a computer memory.

The following example represents an integer array that has 12 elements. The index of the Array starts with 0, so the array having 12 elements has indexes from 0 to 11.

Data Structure Array
Image credit: BeginnersBook

Advantage: Arrays are great at retrieving items but only when arrays are small.

Disadvantage: As the arrays keep growing in size and we start running into other items within the memory. As a result, adding is inefficient because we might have to move our array to a new place within the memory so that it fits.

Luckily, this happens in high level languages like JavaScript and Python. In lower level languages, however, you have to declare the size of your array in advance.

Hash Map

The third type of data structure, which is also a crucial one, is the hash map or a hash table. This is similar to an object in JavaScript and a dictionary in Python language. In this type of data structure, you give the hash map a word or a key and it retrieves the definition or the value for you. Under the hood it works a lot like an array. The key gets run through the function called the “hashing function” that spills out the memory location for you.

The way it’s different is that these memory locations need not be next to each other, rather can be anywhere. Therefore, there’s no issue with the increase in size. However, there is a different problem — depending on the hashing algorithm you use, two keys could hash to the same memory location. This is what’s known as a “collision” and there are multiple ways to resolve them but again, it’s all happening under the hood.

Data Structure Hash Map
Image credit: Wikipedia

Advantage: As we have learned, hash maps are great for adding and retrieving.

Disadvantage: May cause collision of keys.

Stack and Queue

These two structures are very similar to each other and both are built on top of arrays with a few additional features.

The stack is a last in first out data structure. This can be analogous to a pile of tray stacked on top of each other where the last tray you place on top, is the first one you need to take off. When we add an item to the top, it’s called pushing while when we take out an item from the top it’s called popping. Every language keeps track of the functions that have been called with something called as the call stack. Stacks are quite important for an algorithm called Depth-first Search (DFS).

Data Structure Stack
Image credit: Wikipedia

The queue on the other hand is a first in first out structure, similar to a queue or line at the bus stop, where the last person to join the line enters the bus last. Adding an item to the end is called enqueuing and removing it from the front is called dequeuing. Queues are used in an important algorithm called Breadth-first Search (BFS).

Data Structure Queue
Image credit: Wikipedia

Advantage: Both structures are very efficient in adding and removing items.

Disadvantage: They both have quite a limited number of use cases compared to other data structures.

Graphs and Trees

This topic is quite elaborative and there is a dedicated section in computer languages called the “graph theory”. A graph is similar to a Linked List where we have nodes pointing towards other nodes, except in this case the pointers are called “edges”. Edges can also have weights or numbers assigned to them.

There’s a special type of hierarchical graph called the “tree” in which the data expands out in one direction. These can be used to refer to a lot of things like the “family tree” or used to represent networks. Unlike Array, Linked List, Stack and Queue, which are linear data structures, trees are hierarchical data structures.

Data Structure Graphs and Trees
Image credit: Wikipedia

Advantage: Trees provide an efficient insertion and searching and trees are very flexible data, allowing to move subtrees around with minimum effort.

Disadvantage: The disadvantage is that it takes O(log n) time to modify the list and to retrieve.

In conclusion, these data structures will give you a great foundation to start solving algorithms.

I hope that now you have more knowledge about data structures and you can work with your programming languages with a better comprehension of the feasible structures that are under the surface of abstraction.