Collections Java Interview Questions – Set 04

What do you know about the big-O notation and can you give some examples with respect to different data structures

The Big-O notation simply describes how well an algorithm scales or performs in the worst case scenario as the number of elements in a data structure increases.  The Big-O notation can also be used to describe other behavior such as memory consumption. At times you may need to choose a slower algorithm because it also consumes less memory. Big-o notation can give a good indication about performance for large amounts of data, but the only real way to know for sure is to have a performance benchmark with large data sets to take into account things that are not considered in Big-O notation like paging as virtual memory usage grows, etc. Although benchmarks are better, they aren’t feasible during the design process, so Big-O complexity analysis is the choice.

The algorithms used by various data structures for different operations like search, insert and delete fall into the following performance groups like constant-time O(1), linear O(n), logarithmic O (log n), exponential O (c to the power n), polynomial O(n to the power c), quadratic O (n to the power 2) and factorial O (n!) where n is the number of elements in the data structure. It is generally a tradeoff between performance and memory usage. Here are some examples.

Example 1: Finding an element in a HashMap is usually a constant-time, which is  O(1) . This is a constant time because a hashing function is used to find an element, and computing a hash value does not depend on the number of elements in the HashMap.

Example 2:  Linear search of an array, list, and LinkedList is linear, which is O(n). This is linear because you will have to search the entire list. This means that if a list is twice as big, searching it will take twice as long.

Example 3: An algorithm that needs to compare every element in an array to sort the array has polynomial complexity, which is O (n2). A nested for loop is O (n2).  An example is shown under sorting algorithms.

Example 4: Binary search of a sorted array or ArrayList is logarithmic, which is O(log n). Searching an element in a LinkedList normally requires O(n). This is one of the disadvantages of LinkedList over the other data structures like an ArrayList or array offering a O (log n) performance, which offers better performance than O(n)  as the number of elements increases. A logarithmic running times mean, if 10 items take at most x amount of time, 100 items will take say at most 2x amount of time, and 10,000 items will take at most  4x. If you plot this on a graph, the time decreases as  n (i.e. number of items) increases.

What can you tell about the performance of a HashMap compared to a TreeMap? Which one would you prefer?

A balanced tree does have O (log n) performance. The TreeMap class in Java maintains key/value objects in a sorted order by using a red-black tree. A red-black tree is a balanced binary tree. Keeping the binary tree balanced ensures the fast insertion, removal, and look-up time of O (log n). This is not as fast as a HashMap, which is O(1) ,  but the TreeMaphas the advantage of that the keys are in sorted order which opens up a lot of other capabilities.

Which one to choose?

The decision as to using an unordered collection like a HashSet  or HasMap versus   using a sorted data structure like aTreeSet  or TreeMap depends mainly on the usage pattern, and to some extent on the data size and the environment you run it on.   The practical reason for keeping the elements in sorted order is for frequent and faster retrieval of sorted data if the inserts and updates are frequent. If the need for a sorted result is infrequent like prior to producing a report or running a batch process, then maintaining an unordered collection and sorting them only when it is really required with Collections.sort(…) could sometimes be more efficient than maintaining the ordered elements. This is only an opinion, and no one can offer you a correct answer. Even the complexity theories like Big-O notation like O(n) assume possibly large values of n.  In practice, a O(n) algorithm can be much faster than a O(log n) algorithm, provided the data set that is handled is sufficiently small. One algorithm might perform better on an AMD processor than on an Intel. If your system is set up to swap, disk performance need to be considered. The only way to confirm the efficient usage is to test and measure both performance and memory usage with the right data size. Measure both the approaches on your chosen hardware to determine, which is more appropriate.

What is Java Collections API

Java Collections framework API is a unified architecture for representing and manipulating collections. The API contains Interfaces, Implementations & Algorithm to help java programmer in everyday programming. In nutshell, this API does 6 things at high level

  • Reduces programming efforts. – Increases program speed and quality.
  • Allows interoperability among unrelated APIs.
  • Reduces effort to learn and to use new APIs.
  • Reduces effort to design new APIs.
  • Encourages & Fosters software reuse.

To be specific, There are six collection java interfaces. The most basic interface is Collection. Three interfaces extend Collection: Set, List, and SortedSet. The other two collection interfaces, Map and SortedMap, do not extend Collection, as they represent mappings rather than true collections.

What is ArrayList In java

ArrayList is a part of the Collection Framework. We can store any type of objects, and we can deal with only objects. It is growable

Performance of Map interface implementations

Hashtable

An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. Note that the hash table is open: in the case of a “hash collision”, a single bucket stores multiple entries, which must be searched sequentially. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. The initial capacity and load factor parameters are merely hints to the implementation. The exact details as to when and whether the rehash method is invoked are implementation-dependent.

HashMap

This implementation provides constant-time [ Big O Notation is O(1) ] performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the “capacity” of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

TreeMap

The TreeMap implementation provides guaranteed log(n) [ Big O Notation is O(log N) ] time cost for the containsKey, get, put and remove operations.

LinkedHashMap

A linked hash map has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashMap. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashMap, as iteration times for this class are unaffected by capacity.

What is java.util.concurrent BlockingQueue? How it can be used

Java has implementation of BlockingQueue available since Java 1.5. Blocking Queue interface extends collection interface, which provides you power of collections inside a queue. Blocking Queue is a type of Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element. A typical usage example would be based on a producer-consumer scenario. Note that a BlockingQueue can safely be used with multiple producers and multiple consumers. An ArrayBlockingQueue is a implementation of blocking queue with an array used to store the queued objects. The head of the queue is that element that has been on the queue the longest time. The tail of the queue is that element that has been on the queue the shortest time. New elements are inserted at the tail of the queue, and the queue retrieval operations obtain elements at the head of the queue. ArrayBlockingQueue requires you to specify the capacity of queue at the object construction time itself. Once created, the capacity cannot be increased. This is a classic “bounded buffer” (fixed size buffer), in which a fixed-sized array holds elements inserted by producers and extracted by consumers. Attempts to put an element to a full queue will result in the put operation blocking; attempts to retrieve an element from an empty queue will be blocked.

Why Java Vector class is considered obsolete or unofficially deprecated? or Why should I always use ArrayList over Vector

You should use ArrayList over Vector because you should default to non-synchronized access. Vector synchronizes each individual method. That’s almost never what you want to do. Generally you want to synchronize a whole sequence of operations. Synchronizing individual operations is both less safe (if you iterate over a Vector, for instance, you still need to take out a lock to avoid anyone else changing the collection at the same time) but also slower (why take out a lock repeatedly when once will be enough)? Of course, it also has the overhead of locking even when you don’t need to. It’s a very flawed approach to have synchronized access as default. You can always decorate a collection using Collections.synchronizedList – the fact that Vector combines both the “resized array” collection implementation with the “synchronize every operation” bit is another example of poor design; the decoration approach gives cleaner separation of concerns. Vector also has a few legacy methods around enumeration and element retrieval which are different than the List interface, and developers (especially those who learned Java before 1.2) can tend to use them if they are in the code. Although Enumerations are faster, they don’t check if the collection was modified during iteration, which can cause issues, and given that Vector might be chosen for its syncronization – with the attendant access from multiple threads, this makes it a particularly pernicious problem. Usage of these methods also couples a lot of code to Vector, such that it won’t be easy to replace it with a different List implementation. Despite all above reasons Sun may never officially deprecate Vector class.

What is fail-fast property

At high level – Fail-fast is a property of a system or software with respect to its response to failures. A fail-fast system is designed to immediately report any failure or condition that is likely to lead to failure. Fail-fast systems are usually designed to stop normal operation rather than attempt to continue a possibly-flawed process. When a problem occurs, a fail-fast system fails immediately and visibly. Failing fast is a non-intuitive technique: “failing immediately and visibly” sounds like it would make your software more fragile, but it actually makes it more robust. Bugs are easier to find and fix, so fewer go into production. In Java, Fail-fast term can be related to context of iterators. If an iterator has been created on a collection object and some other thread tries to modify the collection object “structurally”, a concurrent modification exception will be thrown. It is possible for other threads though to invoke “set” method since it doesn’t modify the collection “structurally”. However, if prior to calling “set”, the collection has been modified structurally, “IllegalArgumentException” will be thrown.

Why doesn’t Collection extend Cloneable and Serializable

From Sun FAQ Page: Many Collection implementations (including all of the ones provided by the JDK) will have a public clone method, but it would be mistake to require it of all Collections. For example, what does it mean to clone a Collection that’s backed by a terabyte SQL database? Should the method call cause the company to requisition a new disk farm? Similar arguments hold for serializable. If the client doesn’t know the actual type of a Collection, it’s much more flexible and less error prone to have the client decide what type of Collection is desired, create an empty Collection of this type, and use the addAll method to copy the elements of the original collection into the new one. Note on Some Important Terms

  • Synchronized means only one thread can modify a hash table at one point of time. Basically, it means that any thread before performing an update on a hashtable will have to acquire a lock on the object while others will wait for lock to be released.

Fail-fast is relevant from the context of iterators. If an iterator has been created on a collection object and some other thread tries to modify the collection object “structurally”, a concurrent modification exception will be thrown. It is possible for other threads though to invoke “set” method since it doesn’t modify the collection “structurally”. However, if prior to calling “set”, the collection has been modified structurally, “IllegalArgumentException” will be thrown

How to Serialize a collection in java? How to serialize a ArrayList, Hashmap or Hashset object in Java

All standard implementations of collections List, Set and Map interface already implement java.io.Serializable. All the commonly used collection classes like java.util.ArrayList, java.util.Vector, java.util.Hashmap, java.util.Hashtable, java.util.HashSet, java.util.TreeSet do implement Serializable. This means you do not really need to write anything specific to serialize collection objects. However you should keep following things in mind before you serialize a collection object – Make sure all the objects added in collection are Serializable. – Serializing the collection can be costly therefore make sure you serialize only required data isntead of serializing the whole collection. – In case you are using a custom implementation of Collection interface then you may need to implement serialization for it.

What is an enumeration

An enumeration is an interface containing methods for accessing the underlying data structure from which the enumeration is obtained. It is a construct which collection classes return when you request a collection of all the objects stored in the collection. It allows sequential access to all the elements stored in the collection.