## Entry points

- MongoDB is a document database [[Database#Document database]]
  - [What is a document database?](https://www.mongodb.com/resources/basics/databases/document-databases) in the MongoDB documentation
- [MongoDB website](https://www.mongodb.com/)
- [MongoDB University](https://learn.mongodb.com)
- [MongoDB Atlas](https://www.mongodb.com/atlas): Cloud-based service
- Entry points in the excellent MongoDB documentation
  - [Data modeling](https://www.mongodb.com/docs/manual/data-modeling/)
  - [Documents](https://www.mongodb.com/docs/manual/core/document/)
  - [Welcome to MongoDB Shell (`mongosh`)](https://www.mongodb.com/docs/mongodb-shell/)
  - [Query and Projection Operators](https://www.mongodb.com/docs/manual/reference/operator/query/)
- Aggregation explained
  - [Aggregation](https://www.mongodb.com/resources/products/capabilities/aggregation)
  - [The MongoDB aggregation pipeline](https://www.mongodb.com/resources/products/capabilities/aggregation-pipeline)

## Structure of a MongoDB instance

Each MongoDB instance can hold multiple databases. Each database can hold multiple collections. Each collection contains documents that hold similar information (for example, personal information about customers). A document inside a collection holds the respective data in a structured format.

```
- MongoDB instance
  - Database 1
    - Collection 1
      - Document 1
      - Document 2
      - ...
      - Document n
    - Collection 2
    - ...
    - Collection n
  - Database 2
    - Collection
    - ...
  - Database n
```

## Document content

- The data inside a document is structured as "key-value pairs" (sometimes also called "field-value" or "name-value" pairs)
- MongoDB lets you read and write data in JSON format, but it actually stores the data in [BSON](https://bsonspec.org) format (MongoDB created its specification)
- BSON has some advantages over JSON
  - It is designed for efficient storage and traversal
  - It supports additional data types that JSON lacks (for example dates and binary data)
- BSON is not human-readable
- Each document must contain an `_id` field.
  Its value is immutable. If no value is provided, MongoDB generates an `ObjectId` for it (which has a timestamp embedded).

## Data modeling

Option 1: Putting several types of data into the documents of one collection ([[#Embedded documents]]).

- Good: Quick and simple retrieval of data
- Bad: Risk of duplicating data points

Option 2: Splitting up data types into dedicated collections and referencing them ([[#References]]).

- Good: Less storage consumption (no duplication of data)
- Bad: Slower data retrieval (having to look up data in each collection, then connect it)

### Embedded documents

This means storing each data point together with its related data in one document ("[[Database#Denormalization|denormalized]] design"), which results in one document per data point in a single collection. Consider this when working with **one-to-one** and **one-to-many** relationships. The related inlined data is also called a sub-document.

> For many use cases in MongoDB, the denormalized data model where related data is stored within a single document is optimal.
>
> -- From [Database References](https://www.mongodb.com/docs/manual/reference/database-references/#document-references) in the MongoDB documentation

### References

This means storing data points and related data in different collections and referencing the related data in the document where necessary ("[[Database#Normalization|normalized]] design"). Consider this when working with many-to-many relationships.

The MongoDB documentation differentiates between "manual references" and DBRefs.

### Naming of databases, collections and fields

MongoDB has a few [restrictions when it comes to naming of databases](https://www.mongodb.com/docs/manual/reference/limits/#naming-restrictions), collections and fields. Everything else seems subject to personal preference or convention; it just should be consistent.
Good practices seem to be:

- Database name: Only letters, [[Case styles|flatcase]]
- Collection name: Only letters, plural, [[Case styles|flatcase]]
- Field names: [[Case styles|camelCase]], except when the field refers to a document id of a different collection (when using [[#References]]), where the field name would be collection name + `_id` (for example `customers_id`)

## Indexing

### Index types

#### Single field index

This means that the [[Database#Index|index]] references one field of a document.

If the indexed field holds an array, MongoDB handles that automatically by creating an index entry for each array element. This is then called a "single field multikey index".

#### Compound index

This means that the [[Database#Index|index]] references multiple fields of a document. MongoDB allows at most one of the indexed fields to hold an array in any given document.

A compound index not only improves performance for queries that filter on all indexed fields, but also for queries that filter only on the first indexed field (a prefix of the index). In other words: There is no need to have a dedicated index only for the first indexed field.

For example, imagine an index that includes an `age` field as the first field and `grade` as the second field. Then imagine two types of queries:

- A `find()` query that only filters for `age` is also optimized by the compound index
- A `find()` query that only filters for `grade` is **not** optimized by the compound index

### Index properties

- [Partial index](https://www.mongodb.com/docs/manual/core/index-partial)
- [Sparse index](https://www.mongodb.com/docs/manual/core/index-sparse)
- [TTL index](https://www.mongodb.com/docs/manual/core/index-ttl)
- [Unique index](https://www.mongodb.com/docs/manual/indexes/#unique-indexes)

## MongoDB Shell

Connect to the MongoDB instance, then start the REPL with `mongosh`. The prompt shows the currently selected database.
```
show dbs                    # Show databases on the instance
use the-db-name             # Switch to database
db                          # Show name of currently selected database
show collections            # Show collections in the current database
db.collectionName.find()    # Matches all documents in specified collection
it                          # Iterates the cursor to the next batch of results
```

All following code examples assume a collection name of `customers`. This is to reflect the good practice of naming a collection with the plural of the content.

### Methods

#### Mind the arguments

The different methods support different parameters, which sometimes confused me in the beginning when it came to getting all the braces and brackets right.

#### `find()` and `findOne()`

[`find()`](https://www.mongodb.com/docs/manual/reference/method/db.collection.find/) returns multiple documents, `findOne()` returns a single document (the first that matches the query). Both methods take three optional arguments: Query, projection and options.

```mongodb
db.customers.find(
  {},
  {},
  {}
)
```

Return all documents of the collection:

```mongodb
db.customers.find()
```

Pass a query to search for documents that match the values of the specified fields:

```mongodb
db.customers.find({
  fieldName: "value",
  otherFieldName: "value"
})
```

Pass a query that looks for a value in an [[#Embedded documents|embedded document]]. Note the quotes and dot notation.

```mongodb
db.customers.find({
  "parentField.childField": "value"
})
```

Match only the documents whose array field holds exactly the specified values (not more, not less, and in the same order). For more options on how to query data in arrays, see the [[#`$all`]] and [[#`$elemMatch`]] operators.

```mongodb
db.customers.find({
  fieldName: [ "value1", "value2" ]
})
```

Call `find()` with the second parameter (projection) to only return specified fields. Pass `1` to include the specified field, pass `0` to exclude the specified field.

Note: Combining inclusion and exclusion is only possible with the `_id` field.
Mixing `1` and `0` values for other fields results in a `MongoServerError`.

```mongodb
db.customers.find(
  { fieldName: "value" },
  {
    desiredFieldName: 1,
    _id: 0
  }
)
```

Chain `sort()` to sort the result. Pass `1` for ascending order and `-1` for descending order.

Note: If the result contains multiple documents with the same value for the specified field, their relative order can change from query to query.

```mongodb
db.customers.find().sort({
  fieldName: 1,
  otherFieldName: 1
})
```

Other useful chainable methods:

- `.count()`: Returns number of matching documents
- `.limit()`: Set maximum number of results
- `.pretty()`: Prettify output
- [[#`explain()`]]: Get performance metrics for a query

#### `insertOne()` and `insertMany()`

Insert one document into a collection (it creates the specified collection if it does not exist).

```mongodb
db.customers.insertOne({
  fieldName: "value",
  otherField: "value"
})
```

Insert multiple documents into a collection (and create the specified collection if non-existent). Note that the first argument is an array and not an object as with other commands.

```mongodb
db.customers.insertMany(
  [document1, document2, ...]
)
```

#### `updateOne()` and `updateMany()`

These update one or many documents. Mind the arguments. Both methods take three: Filter (see [[#`find()` and `findOne()`]]), update (document) and options, each of which is an object. The first two arguments are mandatory.

```mongodb
db.customers.updateOne(
  {},
  {},
  {}
)
```

Update an existing document (see also [`$set` operator docs](https://www.mongodb.com/docs/manual/reference/operator/update/set/#mongodb-update-up.-set/)). The first argument specifies a filter for finding the desired document. Note that if multiple documents match the filter criteria, the command updates only the **first** matching document.

```mongodb
db.customers.updateOne(
  { fieldName: "value" },
  { $set: { fieldToChange: "new-value" } }
)
```

Update an element **inside an embedded document**.
Note the quotes and the dot notation.

```mongodb
db.customers.updateOne(
  { fieldName: "value" },
  { $set: { "someField.childField": "new-value" } }
)
```

Update an existing element **inside an array**. Note the quotes and the dot notation to specify the element's index inside the array.

```mongodb
db.customers.updateOne(
  { fieldName: "value" },
  { $set: { "arrayToChange.0": "new-value" } }
)
```

Append a new element to an array. If the array field does not exist, the command creates it and adds the value. `$push` can append to several array fields at once; use the `$each` modifier to append multiple values to one array.

```mongodb
db.customers.updateOne(
  { fieldName: "value" },
  { $push: { arrayToAddSomethingTo: "new-value" } }
)
```

**[[Database#Upsert|Upsert]]** data. Note the third argument that contains an `upsert` field. This query uses the `updateOne()` method, but `updateMany()` would work as well.

```mongodb
db.customers.updateOne(
  { fieldName: "value" },
  { $set: { fieldToChange: "new-value" } },
  { upsert: true }
)
```

#### `findAndModify()`

Find and return a single document and optionally change it. **Note that this method takes named parameters within an object, so watch the curly braces.**

Take this example call that only uses a few of the numerous available parameters. It...

- ... searches for the first document that matches `query`
- ... updates the fields specified in `update` (note the `$set` operator; without an update operator, `update` would replace the whole document)
- ... returns the modified document, because of `new: true` (by default, the method returns the document as it was before the change)
- ... adds a new document if `query` did not yield a result, because of `upsert: true`

```mongodb
db.customers.findAndModify({
  query: { fieldToFind: "value" },
  update: { $set: { fieldToUpdate: "value" } },
  new: true,
  upsert: true
})
```

What is the difference compared to [[#`updateOne()` and `updateMany()`|`updateMany()`]]? `findAndModify()` changes a single document and returns it (before or after the change), so one can inspect it and apply further conditional modifications. `updateMany()` does not return the documents, only a summary of the operation, which makes it more efficient.
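The return-value behavior of `findAndModify()` can be pictured with a small in-memory model. The following is a plain JavaScript sketch, not MongoDB code: `findAndModifySketch` and its options `set` and `returnNew` are hypothetical names, and the model assumes flat documents and exact-match queries. It only illustrates that the method hands back the old version of the document by default and the new version when asked to (as `new: true` does).

```javascript
// In-memory model of the return-value behavior only; this is NOT MongoDB code.
// `findAndModifySketch`, `set` and `returnNew` are hypothetical names.
function findAndModifySketch(docs, { query, set, returnNew = false }) {
  // Find the first document whose fields equal every value in the query.
  const doc = docs.find((d) =>
    Object.entries(query).every(([key, value]) => d[key] === value)
  );
  if (doc === undefined) return null;

  const before = { ...doc }; // shallow snapshot taken before the change
  Object.assign(doc, set);   // apply the change in place (top-level fields only)
  return returnNew ? { ...doc } : before;
}

const customers = [{ _id: 1, name: "Ada", tier: "basic" }];

// Default: the caller receives the document as it was before the update.
const oldVersion = findAndModifySketch(customers, {
  query: { name: "Ada" },
  set: { tier: "gold" },
});

// With returnNew: the caller receives the document after the update.
const newVersion = findAndModifySketch(customers, {
  query: { name: "Ada" },
  set: { tier: "platinum" },
  returnNew: true,
});

console.log(oldVersion.tier); // "basic"
console.log(newVersion.tier); // "platinum"
```

In both calls the stored document changes; only the version handed back to the caller differs.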
#### `deleteOne()` and `deleteMany()`

Both methods take two arguments, the first being a filter (mandatory), the second being options (optional).

```mongodb
db.customers.deleteOne(
  {},
  {}
)
```

Delete the first document in the collection.

```mongodb
db.customers.deleteOne({})
```

Delete all documents in the collection. The empty filter is required.

```mongodb
db.customers.deleteMany({})
```

#### `replaceOne()`

Takes three arguments: A filter (mandatory), the document to insert (mandatory), options (optional).

Note the difference to [[#`updateOne()` and `updateMany()`|`updateOne()`]]: `replaceOne()` replaces the found document completely, so if the original document contains more fields, the command will remove them.

```mongodb
db.customers.replaceOne(
  {},
  {},
  {}
)
```

Replace the first matching document with the specified document.

```mongodb
db.customers.replaceOne(
  { content: "value" },
  {
    content: "New value",
    addedField: "Another value"
  }
)
```

#### `createIndex()`

Create an index for the specified field. Setting `1` sorts the index by value in ascending order, `-1` sorts the index in descending order. Pass more fields to get a [[#Compound index]].

```mongodb
db.customers.createIndex({ fieldToIndex: 1 })
```

#### `getIndexes()`

List indexes of a collection.

```mongodb
db.customers.getIndexes()
```

#### `dropIndex()`

Remove the index with the specified name (as listed by `getIndexes()`).

```mongodb
db.customers.dropIndex("field_name_1")
```

#### `explain()`

Get execution statistics for a query. This is especially useful for comparing performance before and after having [[#`createIndex()`|created an index]].

```mongodb
db.customers
  .find({ fieldName: "value" })
  .explain("executionStats")
```

### Operators

#### `$gt` and `$lt`

Match documents with a value greater than the one specified in the query (`$lt` matches smaller values).

```mongodb
db.customers.find({
  fieldName: { $gt: value }
})
```

Return documents whose `activeYears` array contains a value >= 2010 **and** a value <= 2015. Note that the two conditions may be satisfied by different array elements; use [[#`$elemMatch`]] to require a single element to satisfy both.
```mongodb
db.customers.find({
  activeYears: { $gte: 2010, $lte: 2015 }
})
```

Operators like `$gt` also work on strings. They compare lexicographically. This example finds all students that have at least one grade after the letter "C" in an array `grades`, where each array element is an object/document.

```mongodb
db.students.find({
  grades: {
    $elemMatch: { grade: { $gt: "C" } }
  }
})
```

#### `$all`

Return documents that contain all of the specified values **inside a plain array field, in any order**.

```mongodb
db.customers.find({
  fieldName: { $all: ["value 1", "value 2"] }
})
```

#### `$elemMatch`

Return documents where at least one element of an **array of objects/documents** matches all of the specified conditions. In this example, `fieldName` holds such an array of objects.

```mongodb
db.customers.find({
  fieldName: {
    $elemMatch: {
      subField: "value",
      otherSub: "other-value"
    }
  }
})
```

#### `$exists`

Only return documents where the respective field exists.

## Aggregation

Analytics operations in MongoDB are called **aggregation**.

Aggregation pipeline: A collection's data passes from the start to the end of the pipeline. The pipeline comprises several stages. Each stage performs a specific operation on the data, such as filtering, sorting, adding new elements or writing the output to a new collection.

Use the `aggregate()` method to specify an aggregation pipeline. It takes an array of stages as first argument. The second argument is optional and thus not present in the following example.

```mongodb
db.customers.aggregate([
  { $match: { fieldName: "value" } }
])
```

See also the [`$match` operator](https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/) in the MongoDB documentation.
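To picture how documents flow from stage to stage, here is a minimal in-memory sketch in plain JavaScript. It is a mental model only and says nothing about how MongoDB executes pipelines. All helper names (`match`, `sortBy`, `limit`, `runPipeline`) and the sample data are hypothetical; the helpers loosely mirror what the `$match`, `$sort` and `$limit` stages do.

```javascript
// Plain JavaScript model of "data passes through stages"; NOT MongoDB code.
// All helper names and the sample documents are hypothetical.

// Keep only documents whose fields equal every value in `criteria`.
const match = (criteria) => (docs) =>
  docs.filter((d) =>
    Object.entries(criteria).every(([key, value]) => d[key] === value)
  );

// Sort a copy of the documents by one field; direction is 1 (ascending) or -1 (descending).
const sortBy = (field, direction) => (docs) =>
  [...docs].sort(
    (a, b) =>
      (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0) * direction
  );

// Pass on at most `n` documents.
const limit = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const customers = [
  { name: "Ada", city: "Berlin", orders: 7 },
  { name: "Bob", city: "Paris", orders: 3 },
  { name: "Cy", city: "Berlin", orders: 12 },
  { name: "Di", city: "Berlin", orders: 1 },
];

const result = runPipeline(customers, [
  match({ city: "Berlin" }), // 3 documents remain
  sortBy("orders", -1),      // highest order count first
  limit(2),                  // keep the top two
]);

console.log(result.map((d) => d.name)); // [ 'Cy', 'Ada' ]
```

Each stage only sees what the previous stage passed on, which is why filtering early in a pipeline reduces the work for later stages.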