Database / MongoDB Interview questions
MongoDB is a document-based NoSQL database that provides high performance, high availability, and easy scalability.
MongoDB is a document-oriented database that stores data as BSON documents. These documents are organized into collections.
A NoSQL database provides a mechanism for storing and retrieving data that is modeled in a format other than the tabular relations used in relational databases such as MySQL and Oracle.
The types of NoSQL database are:
- Document-based
- Key-value
- Graph
- Column-oriented
The main features of MongoDB are:
- Document-oriented storage
- High performance
- High availability
- Easy scalability
- Rich query language
MongoDB, Cassandra, CouchDB, Hypertable, Redis, Riak, Neo4j, HBase, Couchbase, MemcacheDB, RavenDB and Voldemort are a few examples of NoSQL databases.
The core components in the MongoDB package are:
- mongod, the core database process.
- mongos, the controller and query router for sharded clusters.
- mongo, the interactive MongoDB shell (replaced by mongosh in recent releases).
MongoDB stores BSON (Binary JSON) objects in collections. The combination of the database name and the collection name, written as database.collection, is called a namespace.
SQL Terms/Concepts | MongoDB Terms/Concepts |
database | database |
table | collection |
row | document or BSON document |
column | field |
index | index |
table joins | $lookup, embedded documents |
primary key (any unique column or column combination) | primary key (automatically set to the _id field) |
aggregation (for example, GROUP BY) | aggregation pipeline |
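To make the join and aggregation rows concrete, here is a small sketch of an aggregation pipeline written as plain JavaScript data. The `orders` and `customers` collections and all field names are hypothetical; the stage operators (`$match`, `$group`, `$lookup`) are standard.

```javascript
// Roughly: SELECT customerId, SUM(amount) AS total FROM orders
//          WHERE status = 'paid' GROUP BY customerId
// Collection and field names are hypothetical.
const totalsByCustomer = [
  { $match: { status: 'paid' } },
  { $group: { _id: '$customerId', total: { $sum: '$amount' } } },
  // Left outer join against a hypothetical "customers" collection.
  {
    $lookup: {
      from: 'customers',
      localField: '_id',
      foreignField: '_id',
      as: 'customer',
    },
  },
];

// With a connected driver handle this would be run as:
//   db.collection('orders').aggregate(totalsByCustomer).toArray()
console.log(JSON.stringify(totalsByCustomer, null, 2));
```

The pipeline is only data until it is passed to `aggregate()`, which makes it easy to build and inspect stage by stage.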
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
MongoDB supports horizontal scaling through sharding.
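As a rough illustration of the idea, the sketch below routes documents to shards by comparing a shard key against range boundaries. It is a toy model, not MongoDB's actual chunk or balancer logic; the shard names, the `userId` key, and the boundaries are made up for the example.

```javascript
// Toy range-based router: each chunk owns shard-key values in [min, max).
// Shard names and boundaries are hypothetical.
const chunks = [
  { min: -Infinity, max: 1000, shard: 'shardA' },
  { min: 1000, max: 5000, shard: 'shardB' },
  { min: 5000, max: Infinity, shard: 'shardC' },
];

function routeToShard(doc, shardKey = 'userId') {
  const value = doc[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  if (!chunk) throw new Error(`no chunk covers ${shardKey}=${value}`);
  return chunk.shard;
}

console.log(routeToShard({ userId: 42 }));   // shardA
console.log(routeToShard({ userId: 1000 })); // shardB
console.log(routeToShard({ userId: 9999 })); // shardC
```

In a real cluster this routing is done by mongos using metadata from the config servers; the application only sees a single logical collection.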
No. MongoDB doesn't enforce primary key-foreign key relationships.
Every MongoDB document has an _id field that uniquely identifies it.
A comparable relationship can be modeled by embedding one document inside another. For example, a department document can have its employee document(s) embedded. Alternatively, a document can store another document's _id as a reference.
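The two modeling options can be sketched with plain JavaScript objects. The department and employee fields are hypothetical and chosen only for illustration.

```javascript
// Embedding: employees live inside the department document.
const departmentEmbedded = {
  _id: 'dept-eng',
  name: 'Engineering',
  employees: [
    { name: 'Asha', title: 'Developer' },
    { name: 'Ben', title: 'Tester' },
  ],
};

// Referencing: each employee stores the department's _id, like a foreign key,
// but MongoDB does not enforce that the referenced document exists.
const department = { _id: 'dept-eng', name: 'Engineering' };
const employees = [
  { _id: 1, name: 'Asha', departmentId: 'dept-eng' },
  { _id: 2, name: 'Ben', departmentId: 'dept-eng' },
];

// Application-side "join" over the referenced form.
const staff = employees.filter((e) => e.departmentId === department._id);
console.log(staff.map((e) => e.name)); // [ 'Asha', 'Ben' ]
```

Embedding suits data that is read together with its parent; referencing suits data that is shared or grows without bound.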
Replication is the process of synchronizing data across multiple servers. It provides redundancy and increases data availability by keeping copies of the data on different database servers. It ensures high availability by protecting the database from the loss of a single server.
32-bit MongoDB processes are limited to about 2 GB of data because the storage engine of that era (MMAPv1) used memory-mapped files for performance. Recent MongoDB releases no longer provide 32-bit builds.
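Yes, in current versions. Operations on a single document have always been atomic. Multi-document ACID transactions are supported on replica sets since MongoDB 4.0 and on sharded clusters since MongoDB 4.2; earlier versions did not support them.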
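For versions that support multi-document transactions (4.0 and later on replica sets), a minimal sketch with the Node.js driver might look like the following. The `bank` database, the `accounts` collection, and the `transfer` helper are hypothetical; `client` is assumed to be an already connected MongoClient.

```javascript
// Sketch: move an amount between two hypothetical "accounts" documents
// atomically. `client` is assumed to be a connected MongoClient.
async function transfer(client, fromId, toId, amount) {
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const accounts = client.db('bank').collection('accounts');
      // Passing { session } ties each operation to the transaction.
      await accounts.updateOne({ _id: fromId }, { $inc: { balance: -amount } }, { session });
      await accounts.updateOne({ _id: toId }, { $inc: { balance: amount } }, { session });
    });
  } finally {
    // Always release the session, whether the transaction committed or not.
    await session.endSession();
  }
}
```

If either update fails, `withTransaction` aborts the transaction so neither balance changes.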
A replica set is a group of mongod instances that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.
db.createCollection(name, options) is used to create a collection in MongoDB.
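As an example of the `options` argument, the sketch below creates a capped collection through the Node.js driver. The `app_logs` name, the limits, and the `createLogCollection` helper are hypothetical; `db` is assumed to be a connected Db handle.

```javascript
// `db` is assumed to be a connected Db handle from the Node.js driver.
// The collection name and limits are hypothetical.
const cappedOptions = {
  capped: true,      // fixed-size collection that overwrites its oldest documents
  size: 1024 * 1024, // maximum size in bytes (required when capped is true)
  max: 1000,         // optional cap on the number of documents
};

async function createLogCollection(db) {
  return db.createCollection('app_logs', cappedOptions);
}
```

Collections are also created implicitly on first insert, so an explicit call is mainly needed when options such as these are required.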
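Journaling is write-ahead logging that makes writes durable: after a crash, MongoDB replays the journal to recover writes that had not yet reached the data files. It is not a backup mechanism.
The primary member of a replica set accepts all write operations from clients.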
The database profiler collects performance data about operations run against the database. Depending on the profiling level, it records either slow operations only or every operation.
The secondaries replicate the primary's oplog and apply the operations to their own data sets so that the secondaries' data sets reflect the primary's data set.
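The replay idea can be sketched with a simplified, in-memory model. The entry shape below is only loosely modeled on oplog entries (`op` of `i`, `u`, or `d`); real entries carry more fields and a different update format, and `applyOplog` is a hypothetical helper, not part of MongoDB.

```javascript
// Simplified oplog entries: op is 'i' (insert), 'u' (update) or 'd' (delete).
// Real entries carry more fields (timestamps, namespaces, and so on).
const oplog = [
  { op: 'i', o: { _id: 1, name: 'Asha', dept: 'eng' } },
  { op: 'i', o: { _id: 2, name: 'Ben', dept: 'qa' } },
  { op: 'u', o2: { _id: 2 }, o: { $set: { dept: 'eng' } } },
  { op: 'd', o: { _id: 1 } },
];

// Apply entries in order to a Map keyed by _id, standing in for a secondary.
function applyOplog(entries, dataSet = new Map()) {
  for (const entry of entries) {
    if (entry.op === 'i') {
      dataSet.set(entry.o._id, { ...entry.o });
    } else if (entry.op === 'u') {
      const current = dataSet.get(entry.o2._id);
      dataSet.set(entry.o2._id, { ...current, ...entry.o.$set });
    } else if (entry.op === 'd') {
      dataSet.delete(entry.o._id);
    }
  }
  return dataSet;
}

const secondary = applyOplog(oplog);
console.log([...secondary.values()]); // [ { _id: 2, name: 'Ben', dept: 'eng' } ]
```

Because the entries are applied in the same order as on the primary, the secondary converges to the same data set.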
Vertical scaling adds more CPU and storage resources to increase capacity.
Horizontal scaling divides the data set and distributes the data over multiple servers, or shards.
MongoDB pushes data to disk lazily. It records each update in the journal right away, but writing the data to the data files happens lazily.
ObjectId is a 12-byte BSON type. It is composed of:
- a 4-byte timestamp, representing seconds since the Unix epoch,
- a 5-byte random value (older versions used a 3-byte machine identifier and a 2-byte process id),
- a 3-byte incrementing counter.
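The byte layout can be checked by splitting an ObjectId's 24-character hex string. This sketch treats bytes 5 to 9 as one opaque 5-byte value (a random value in current versions; older versions split it into machine and process identifiers). `parseObjectId` is a hypothetical helper, not a driver function.

```javascript
// Split a 24-character hex ObjectId string into its three parts:
// 4-byte timestamp, 5-byte middle value, 3-byte counter.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected a 24-character hex string');
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    createdAt: new Date(seconds * 1000),
    random: hex.slice(8, 18),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parsed = parseObjectId('507f1f77bcf86cd799439011');
console.log(parsed.createdAt.toISOString()); // 2012-10-17T21:13:27.000Z
console.log(parsed.counter);                 // 4427793
```

Because the timestamp comes first, sorting by _id roughly orders documents by creation time.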
GridFS is used for storing and retrieving large binary files such as audio, image, and video files, including files larger than the 16 MB BSON document size limit.
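GridFS works by splitting a file into chunks stored as separate documents. Assuming the default chunk size of 255 KiB, the number of chunks for a file can be estimated as below; `chunkCount` is a hypothetical helper for illustration.

```javascript
// GridFS splits a file into chunks; the default chunk size is 255 KiB.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

function chunkCount(fileSizeBytes, chunkSizeBytes = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileSizeBytes / chunkSizeBytes);
}

console.log(chunkCount(20 * 1024 * 1024)); // 81
```

A 20 MiB file therefore ends up as 81 chunk documents plus one metadata document describing the file.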
The mongodump command is used to create a backup of a database.
The mongorestore command is used to restore a backup created by mongodump.
Use the db command in the shell to display the current database.
You may also use the db.getName() command.
The watch() method opens a change stream cursor on the collection.
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them.
const collection = db.collection('employee');
const changeStream = collection.watch();
changeStream.on('change', next => {
  // process next document
});
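A change stream can also take an aggregation pipeline to filter events. The sketch below delivers only inserts; `watchInserts` and `onInsert` are hypothetical names, and `collection` is assumed to be a Collection handle on a replica set or sharded cluster, which change streams require.

```javascript
// `collection` is assumed to be a Collection handle from the Node.js driver.
// Only insert events are delivered; the handler name is hypothetical.
function watchInserts(collection, onInsert) {
  const pipeline = [{ $match: { operationType: 'insert' } }];
  const changeStream = collection.watch(pipeline);
  changeStream.on('change', (event) => onInsert(event.fullDocument));
  return changeStream; // call changeStream.close() when done
}
```

Filtering on the server keeps unrelated update and delete events from ever reaching the application.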
A 2dsphere index supports queries that calculate geometries on an earth-like sphere. It supports all MongoDB geospatial queries: queries for inclusion, intersection, and proximity.
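A proximity query against a 2dsphere-indexed field can be sketched as a plain query document. The `location` field, the `places` collection, and the `nearQuery` helper are hypothetical; `$near`, `$geometry`, and `$maxDistance` are standard operators.

```javascript
// Build a $near query for documents within `maxMeters` of a point.
// GeoJSON coordinates are [longitude, latitude]. The `location` field
// name is hypothetical and must have a 2dsphere index, e.g.
//   db.collection('places').createIndex({ location: '2dsphere' })
function nearQuery(longitude, latitude, maxMeters) {
  return {
    location: {
      $near: {
        $geometry: { type: 'Point', coordinates: [longitude, latitude] },
        $maxDistance: maxMeters,
      },
    },
  };
}

console.log(JSON.stringify(nearQuery(-73.9857, 40.7484, 500)));
```

The resulting object would be passed to `find()`; results come back ordered from nearest to farthest.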
 | MongoDB | Cassandra |
Data Availability | A MongoDB replica set has a single primary and multiple secondaries. If the primary goes down, one of the secondaries is elected to take over its role. Although automatic failover does ensure recovery, it may take up to a minute for a secondary to become the primary. During this time, the replica set cannot accept writes. | Instead of having one primary node, Cassandra lets every node in a cluster accept writes. With no single point of failure, the loss of one node does not by itself cause downtime. The redundant model supports high availability. |
Scalability | Within a replica set, only the primary accepts writes, while the secondaries can serve reads. A single replica set is therefore limited in write scalability, and scaling writes requires sharding. | Because every node can accept writes, Cassandra can coordinate numerous writes at the same time. Therefore, adding nodes to a cluster increases write throughput (scalability). |
Data Model | MongoDB's data model is categorized as object and document-oriented. This means it can represent any kind of object structure, with properties that can be nested to multiple levels. | Cassandra uses a more traditional model, with a table structure of rows and columns. Still, it is more flexible than relational databases since each row is not required to have the same columns. Upon creation, these columns are assigned one of the available Cassandra data types, so the model relies more on data structure. |
Query Language | MongoDB queries are expressed as JSON-like documents rather than in a SQL-like language. | Unlike MongoDB, Cassandra has its own SQL-like query language called CQL (Cassandra Query Language). Its syntax is similar to SQL but still has some limitations. |