Database / Apache Cassandra Interview Questions

1. What is Apache Cassandra?
2. Key features of Cassandra.
3. Compare Cassandra Vs Relational Databases.
4. How does Cassandra store data?
5. What are SSTables in Cassandra?
6. What is CommitLog in Cassandra?
7. What are Memtables in Cassandra?
8. What is the NoSQL database?
9. Advantages of NoSQL Databases.
10. What is CQL?
11. What are the main components of Cassandra?
12. What is a Node in Cassandra?
13. What is the Data Center and Cluster in Cassandra?
14. What is meant by Cassandra rack?
15. Difference between Memtable and SSTable.
16. Explain the concept of Bloom Filter in Cassandra.
17. What is Cqlsh in Cassandra?
18. What is source command in Cassandra?
19. What is the purpose of using thrift in Cassandra?
20. What is replication factor in Cassandra?
21. Explain Cassandra Data Model.
22. What is Super Column in Cassandra?
23. Does Cassandra have joins?
24. What are keyspaces in Cassandra?
25. What are Cassandra collections?
26. What does "frozen" mean in Cassandra?
27. Which operations are not allowed on frozen collections in Cassandra?


1. What is Apache Cassandra?

Cassandra is a free, open-source, distributed NoSQL database management system designed to handle large amounts of data. It provides high availability with no single point of failure.

Cassandra is written in Java. It was originally developed at Facebook, supports flexible schemas, and is highly scalable for big data.

Cassandra has its own Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra, as an alternative to the traditional Structured Query Language (SQL).

2. Key features of Cassandra.

Open-source availability.
Distributed footprint.
Scalability.
Cassandra Query Language.
Fault tolerance.
Flexible schema.
Tunable consistency.
Fast writes.
Peer-to-peer architecture.

3. Compare Cassandra Vs Relational Databases.

Cassandra	RDBMS
Data may be unstructured.	Only structured data.
Flexible schema.	Fixed schema.
Data is written in many locations.	Data is mostly written in one location.
In Cassandra, a table is a list of "nested key-value pairs". (Row x Column Key x Column value)	In RDBMS, a table is an array of arrays. (Row x Column)
Keyspace is the outermost container which contains data corresponding to an application.	Database is the outermost container which contains data corresponding to an application.

4. How does Cassandra store data?

Every write in Cassandra is appended to the commit log and stored temporarily in an in-memory memtable. Memtables are periodically flushed and written to disk as SSTables.

The write path consists of these steps:

Logging data in the commit log.
Writing data to the memtable.
Flushing data from the memtable.
Storing data on disk in SSTables.
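
The steps above can be sketched as a toy model in Python. This is a simplified illustration, not Cassandra's real implementation: the class name ToyNode, the flush_threshold parameter, and the logical clock are all assumptions made for the example.

```python
import itertools


class ToyNode:
    """Toy model of one node's write path (hypothetical, for illustration)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []          # append-only record of every mutation
        self.memtable = {}            # in-memory buffer: key -> (value, timestamp)
        self.sstables = []            # flushed, immutable snapshots on "disk"
        self.flush_threshold = flush_threshold  # assumed flush trigger
        self._clock = itertools.count(1)        # logical timestamps

    def write(self, key, value):
        ts = next(self._clock)
        self.commit_log.append((key, value, ts))  # step 1: log the mutation
        self.memtable[key] = (value, ts)          # step 2: buffer it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Steps 3 and 4: sort by key and freeze into an immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replaying


node = ToyNode(flush_threshold=2)
node.write("user:1", "alice")
node.write("user:2", "bob")    # reaches the threshold and triggers a flush
node.write("user:3", "carol")  # lands in a fresh memtable
```

After these three writes the toy node holds one SSTable with the first two rows, while the third row exists only in the memtable and the commit log.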

5. What are SSTables in Cassandra?

SSTables are the immutable data files that Cassandra uses for persisting data on disk. As SSTables are flushed to disk from memtables or are streamed from other nodes, Cassandra triggers compactions which combine multiple SSTables into one. Once the new SSTable has been written, the old SSTables can be removed.
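
As a rough sketch of what compaction does, the Python function below merges several toy SSTables into one and keeps the newest value for each key. The tuple layout (key, value, timestamp) is an assumption for the example, and details such as tombstones and compaction strategies are left out.

```python
def compact(sstables):
    """Merge several toy SSTables into one, keeping the newest value per key.

    Each toy SSTable is a tuple of (key, value, timestamp) entries.
    """
    newest = {}
    for sstable in sstables:
        for key, value, ts in sstable:
            if key not in newest or ts > newest[key][1]:
                newest[key] = (value, ts)
    # Emit a new immutable, key-sorted SSTable; the inputs are left untouched.
    return tuple((k, v, ts) for k, (v, ts) in sorted(newest.items()))


old_1 = (("a", "1", 10), ("b", "2", 11))
old_2 = (("a", "9", 20), ("c", "3", 21))
merged = compact([old_1, old_2])
# Only after `merged` is fully written would the old SSTables be removed.
```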

6. What is CommitLog in Cassandra?

The commit log is an append-only log of all mutations local to a Cassandra node. Any data written to Cassandra is first written to the commit log before being written to a memtable. This provides durability in the case of an unexpected shutdown. On startup, any mutations in the commit log are applied to memtables.
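
The replay on startup can be pictured with a short Python sketch. The replay function and the list-of-tuples log format are hypothetical simplifications, not Cassandra's on-disk format.

```python
def replay(commit_log):
    """Rebuild a memtable from an append-only commit log after a restart."""
    memtable = {}
    for key, value in commit_log:  # apply mutations in the order they were logged
        memtable[key] = value      # later mutations overwrite earlier ones
    return memtable


# The log survives on disk; the in-memory memtable is lost in the crash.
commit_log = [("user:1", "alice"), ("user:2", "bob"), ("user:1", "alicia")]
memtable_after_restart = replay(commit_log)
```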

7. What are Memtables in Cassandra?

Memtables are in-memory structures where Cassandra buffers writes. In general, there is one active memtable per table. Eventually, memtables are flushed onto disk and become immutable SSTables.

8. What is the NoSQL database?

NoSQL, also referred to as "not only SQL" or "non-SQL", is an approach to database design that enables the storage and querying of data outside the traditional structures found in relational databases. It can still store the kind of data found in relational database management systems (RDBMS); it just stores it differently. The decision to use a relational or a non-relational database is largely contextual and depends on the use case.

Instead of the typical tabular structure of a relational database, NoSQL databases house data within a single data structure, such as a JSON document.

9. Advantages of NoSQL Databases.

NoSQL databases can:

Handle large volumes of data at high speed with a scale-out architecture.
Store unstructured, semi-structured, or structured data.
Enable easy updates to schemas and fields.
Be developer-friendly.
Take full advantage of the cloud to deliver zero downtime.

10. What is CQL?

CQL (Cassandra Query Language) is a query interface that is intentionally similar to SQL. It gives users who are comfortable with relational databases a familiar language, which lowers the barrier to entry to Apache Cassandra.

11. What are the main components of Cassandra?

The components of Cassandra are:

Node
Data center
Commit log
Cluster
Memtable
SSTable
Bloom filter

12. What is a Node in Cassandra?

A node represents a single instance of Cassandra. Nodes communicate with one another through gossip, a peer-to-peer communication protocol. Since Cassandra is a distributed database, it can (and usually does) have multiple nodes.

A node is where the data is stored.

13. What is the Data Center and Cluster in Cassandra?

A Cassandra datacenter is a group of related nodes configured together within a cluster for replication purposes. A datacenter is a logical set of racks and should contain at least one rack.

A cluster is a component that contains one or more datacenters.

14. What is meant by Cassandra rack?

A rack is a collection of servers. A Cassandra rack is a logical grouping of nodes within the ring.

15. Difference between Memtable and SSTable.

A memtable is an in-memory structure that temporarily accumulates writes; it is not the permanent store. An SSTable is the on-disk file that a memtable's contents are flushed to. Data stored in an SSTable is persistent and cannot be changed.

16. Explain the concept of Bloom Filter in Cassandra.

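A Bloom filter is an off-heap data structure (held in native memory rather than on the Java heap) associated with each SSTable. Before performing any disk I/O, Cassandra consults it to check whether the SSTable might contain the requested data. The filter is probabilistic: it can return false positives but never false negatives, so a negative answer lets Cassandra skip that SSTable entirely.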
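
A minimal Bloom filter can be sketched in Python as follows. The sizes (num_bits, num_hashes) are arbitrary assumptions, and salted SHA-256 is used here only for convenience; it is not the hash function Cassandra uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for key in ("user:1", "user:2"):
    bf.add(key)
```

A read for a key where might_contain returns False can skip that SSTable without touching the disk; a True result still requires reading the SSTable to confirm.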

17. What is Cqlsh in Cassandra?

Cqlsh (Cassandra Query Language Shell) is the interactive command-line terminal for CQL. It is a Python-based shell that runs on Linux or Windows and supports shell commands such as ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE, and many others. With cqlsh, users can define a schema, insert data, and execute queries.

18. What is source command in Cassandra?

The SOURCE command is used in cqlsh to execute a file containing CQL statements.

SOURCE '~/data/insert_data.cql'

19. What is the purpose of using thrift in Cassandra?

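Thrift is a legacy RPC framework, combined with a code-generation tool, that Cassandra originally exposed as its client API. Its purpose was to make the database accessible from many programming languages. It has since been superseded by CQL, and Thrift support was removed in Cassandra 4.0.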

20. What is replication factor in Cassandra?

Replication factor (RF) is the number of nodes in the cluster that hold a copy of the same data. For example, with RF=3, three nodes in the ring hold copies of the same data.
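
One simple way to picture replica placement is to hash the key onto a ring and walk clockwise until RF nodes have been collected. The Python sketch below assumes that scheme; the node names, token values, and SHA-256 hash are made up for the example, and real placement depends on the configured partitioner and replication strategy.

```python
import bisect
import hashlib


def token_for(key, ring_size=2**16):
    # Stand-in hash for the example; not Cassandra's partitioner.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % ring_size


def replicas_for(key, ring, rf):
    """Return the rf nodes holding `key`; ring is (token, node) sorted by token."""
    if rf > len(ring):
        raise ValueError("replication factor exceeds number of nodes")
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    # Walk clockwise from the owning node, wrapping around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


ring = [(0, "node-a"), (16384, "node-b"), (32768, "node-c"), (49152, "node-d")]
owners = replicas_for("user:42", ring, rf=3)
```

With four nodes and RF=3, every key lands on three distinct, adjacent nodes, so any single node can fail without losing data.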

21. Explain Cassandra Data Model.

The Cassandra data model consists of four main components:

Cluster: Made up of multiple nodes and keyspaces.
Keyspace: A namespace that groups multiple column families, typically one per application.
Column: Consists of a column name, value, and timestamp.
Column Family: A set of rows, each identified by a row key and holding multiple columns.
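
The nesting of these components can be illustrated with plain Python dictionaries. This is only a mental model; the keyspace, column family, and column names are hypothetical.

```python
# keyspace -> column family -> row key -> column name -> (value, timestamp)
cluster = {
    "shop": {                                   # keyspace
        "users": {                              # column family
            "user:1": {                         # row key
                "name": ("alice", 1700000000),
                "email": ("alice@example.com", 1700000000),
            },
            "user:2": {
                "name": ("bob", 1700000050),    # rows need not share columns
            },
        },
    },
}


def read_column(cluster, keyspace, family, row_key, column):
    """Look up one column value by walking the nested key-value pairs."""
    value, _timestamp = cluster[keyspace][family][row_key][column]
    return value


name = read_column(cluster, "shop", "users", "user:1", "name")
```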

22. What is Super Column in Cassandra?

A super column is a special kind of column, so it is also a key-value pair, but its value is a map of sub-columns.

Generally, column families are stored on disk in individual files. To optimize performance, it is therefore important to keep columns that you are likely to query together in the same column family, and a super column can be helpful here. Structurally, a super column is a name paired with a map from sub-column names to sub-columns.
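
The shape of a super column can be sketched as a nested Python dictionary. The row contents below are hypothetical and only show the extra level of nesting.

```python
# super column name -> sub-column name -> value
user_row = {
    "address": {                # super column
        "street": "1 Main St",  # sub-column
        "city": "Springfield",
    },
    "contact": {                # another super column
        "email": "alice@example.com",
    },
}


def read_sub_column(row, super_column, sub_column):
    """Read one sub-column out of a super column."""
    return row[super_column][sub_column]


city = read_sub_column(user_row, "address", "city")
```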

23. Does Cassandra have joins?

No, Apache Cassandra does not natively support joins in the way relational databases do. Instead, you'll need to either perform the join logic in your application or denormalize your data by creating multiple tables that represent the join results.

The alternatives to native joins are denormalization and client-side joins.

Denormalization is a common approach in Cassandra: you replicate data across multiple tables to avoid the need for joins. This allows for faster reads by pre-joining the data.

With client-side joins, you perform the join logic in your application by querying the required data from different tables and then combining it in your code. This approach can be less efficient for complex queries and large datasets.
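
A client-side join might look like the following Python sketch. The rows are hard-coded stand-ins for the results of two separate queries, and the table and column names are hypothetical.

```python
# Rows as the application might receive them from two separate tables.
users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 40.0},
    {"order_id": 12, "user_id": 2, "total": 15.0},
]


def client_side_join(users, orders):
    """Hash join in application code: index one side, then probe with the other."""
    users_by_id = {u["user_id"]: u for u in users}
    return [
        {
            "order_id": o["order_id"],
            "name": users_by_id[o["user_id"]]["name"],
            "total": o["total"],
        }
        for o in orders
        if o["user_id"] in users_by_id
    ]


joined = client_side_join(users, orders)
```

The denormalized alternative would store the user's name on each order row at write time, so a single query returns the same result without any join in the application.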

24. What are keyspaces in Cassandra?

The Cassandra keyspace is a namespace that defines how data is replicated on nodes. Typically, a cluster has one keyspace per application. Replication is controlled on a per-keyspace basis, so data that has different replication requirements typically resides in different keyspaces.

The maximum length of a keyspace name is 48 characters.
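
As a small illustration, the Python helper below checks the name length and builds a CREATE KEYSPACE statement as a string. The helper name, the keyspace name "shop", and the choice of SimpleStrategy are assumptions for the example; the string is only built, not sent to a cluster.

```python
import re

MAX_KEYSPACE_NAME_LENGTH = 48  # limit stated in the text above


def create_keyspace_cql(name, replication_factor):
    """Build a CREATE KEYSPACE statement after validating the name."""
    if len(name) > MAX_KEYSPACE_NAME_LENGTH:
        raise ValueError(
            f"keyspace name longer than {MAX_KEYSPACE_NAME_LENGTH} characters"
        )
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError("keyspace name must be alphanumeric or underscore")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {int(replication_factor)}}};"
    )


statement = create_keyspace_cql("shop", 3)
```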

25. What are Cassandra collections?

In Apache Cassandra, collections are a way to store and group data within a table column, offering three main types: sets (unordered, unique values), lists (ordered, potentially non-unique values), and maps (key-value pairs).

Sets store a collection of unique values of the same data type in an unordered manner.

Lists store a collection of values of the same data type in a specific order, allowing for duplicate values.

Maps store a collection of key-value pairs, where each key is unique and both keys and values have associated data types.

26. What does "frozen" mean in Cassandra?

A frozen value serializes multiple components into a single value. Non-frozen types allow updates to individual fields. Cassandra treats the value of a frozen type as a blob. The entire value must be overwritten.

27. Which operations are not allowed on frozen collections in Cassandra?

A column whose type is a frozen collection (set, map, or list) can only have its value replaced as a whole. In other words, we can't add, update, or delete individual elements from the collection as we can in non-frozen collection types.
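
Python's own set and frozenset offer a loose analogy for this behaviour. The sketch below is only an analogy, not how Cassandra stores collections, and the column names are hypothetical.

```python
row = {
    "tags": {"new", "sale"},                    # analogous to set<text>
    "frozen_tags": frozenset({"new", "sale"}),  # analogous to frozen<set<text>>
}

# Non-frozen: a single element can be added in place.
row["tags"].add("clearance")

# Frozen: there is no in-place add, so the attempt fails.
try:
    row["frozen_tags"].add("clearance")
    in_place_update_allowed = True
except AttributeError:
    in_place_update_allowed = False

# The only way to change a frozen value is to replace it as a whole.
row["frozen_tags"] = frozenset(row["frozen_tags"] | {"clearance"})
```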


About Javapedia.net: Javapedia.net is for Java and J2EE developers, technologists, and college students preparing for interviews. The site also includes many practical examples. It is developed using J2EE technologies by Steve Antony, a senior developer/lead at a logistics company.
	contact: javatutorials2016[at]gmail[dot]com
	Copyright © 2020, javapedia.net, all rights reserved. privacy policy.
