When a website has a sound structure and a seamless design, its database is expected to handle data at the same level. It should also follow standard protocols to meet the challenges of handling that data. Databases, however, behave differently depending on the size of the dataset. So how do we find the appropriate method for handling a dataset based on its size, and what does MongoDB have to offer? This article should help you analyze the challenges of handling large and small datasets in MongoDB.
MongoDB provides two ways of performing aggregation:
- Aggregation pipeline
- Map-reduce function
What Is the Aggregation Pipeline, and How Does It Work?
The aggregation pipeline is a framework that transforms large numbers of documents into aggregated results by passing them through a multi-stage pipeline.
An aggregation operation is used to fetch computed results from a group of values.
It lets us run a variety of operations that return specific results from grouped data.
The framework is offered as an alternative to the other aggregation method, map-reduce.
It is generally treated as the preferable method because it avoids much of the built-in complexity.
The aggregate() Method:
The aggregation pipeline is exposed through the aggregate() method in MongoDB.
The basic syntax of the aggregate() method is as follows:
db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
Let us walk through a piece of our sample data to show how these operations and functions work, so that you can examine them more closely.
We will use the following documents:
{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b25"), "HospitalAlias" : "cb", "DateOfService" : "1/28/2017", "DoctorAlias" : "aful", "Charge" : "a", "Level" : 2 }
{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b26"), "HospitalAlias" : "cb", "DateOfService" : "1/29/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }
{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b27"), "HospitalAlias" : "cb", "DateOfService" : "1/30/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }
With the above collection, if you want to count how many people were admitted to the hospital on each date of service, you can use the following aggregate() call:
db.hospital_rounds.aggregate([{ $group: { _id: "$DateOfService", y: { $sum: 1 } } }])
Run against the full collection (the sample above is only a piece of it), the query returns results like the following:
{ "_id" : "3/8/2017", "y" : 3 }
{ "_id" : "3/7/2017", "y" : 4 }
{ "_id" : "10/6/2016", "y" : 83 }
{ "_id" : "10/7/2016", "y" : 93 }
{ "_id" : "10/2/2016", "y" : 69 }
{ "_id" : "10/4/2016", "y" : 88 }
Notice how the documents have been aggregated into one count per date of service.
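To make the grouping step concrete, here is a minimal plain-JavaScript sketch of what a $group stage with { $sum: 1 } does conceptually. It is not how MongoDB implements the stage; the helper name countBy and the trimmed-down documents are assumptions made for illustration only.

```javascript
// Sample documents from the article, reduced to the fields used here.
const rounds = [
  { DateOfService: "1/28/2017", DoctorAlias: "aful" },
  { DateOfService: "1/29/2017", DoctorAlias: "jliu" },
  { DateOfService: "1/30/2017", DoctorAlias: "jliu" },
];

// Hypothetical helper: mimics { $group: { _id: "$<field>", y: { $sum: 1 } } }
// by counting how many documents share each value of the given field.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Turn the Map into result documents shaped like the query output.
  return Array.from(counts, ([_id, y]) => ({ _id, y }));
}

const perDate = countBy(rounds, "DateOfService");
console.log(perDate);
```

With only the three sample documents, each date occurs once, so every group has a count of 1. The larger counts in the output above come from the full collection.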
The following image illustrates the idea.
Similarly, other operators are available to perform other operations.
We have listed a few of the major ones below.
Pipeline operators:
- $match – Filter documents
- $project – Reshape documents
- $group – Summarize documents
- $unwind – Expand documents
- $sort – Order documents
- $limit / $skip – Paginate documents
- $redact – Restrict documents
- $geoNear – Proximity sort documents
- $let / $map – Bind variables to sub-expressions (expression operators used inside stages)
- $out – Send results to a collection
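The operators above are chained as an array of stages, each one consuming the output of the previous stage. The following is a sketch only: it assumes the hospital_rounds collection and field names from the sample documents, and the output field name visits is made up for illustration. The typeof guard means the query runs only inside a MongoDB shell, where db is defined.

```javascript
// Count rounds per doctor at hospital "cb" and keep the five busiest doctors.
const topDoctorsPipeline = [
  { $match: { HospitalAlias: "cb" } },                      // filter documents
  { $group: { _id: "$DoctorAlias", visits: { $sum: 1 } } }, // summarize per doctor
  { $sort: { visits: -1 } },                                // highest count first
  { $limit: 5 },                                            // keep the top five
];

// Run the pipeline only when a shell-provided db object is available.
if (typeof db !== "undefined") {
  printjson(db.hospital_rounds.aggregate(topDoctorsPipeline).toArray());
}
```

Placing $match first is a common design choice, since it reduces the number of documents the later stages have to process.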
What Is Map Reduce?
The map-reduce function is used to condense large datasets into a handful of aggregated results.
The mapReduce() method is used to execute it.
Custom JavaScript functions give this approach its flexibility.
With this method, a JavaScript function (the optional finalize function) can evaluate and modify the final results to perform additional calculations.
The Map-Reduce Method:
Syntax:
db.COLLECTION_NAME.mapReduce(map, reduce, { query: {}, out: { inline: 1 } })
Consider the following data for the map-reduce method:
{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b25"), "HospitalAlias" : "cb", "DateOfService" : "1/28/2017", "DoctorAlias" : "aful", "Charge" : "a", "Level" : 2 }
{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b26"), "HospitalAlias" : "cb", "DateOfService" : "1/28/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }
{ "_id" : ObjectId("58be8ccb99c2b03ff17c4b27"), "HospitalAlias" : "cb", "DateOfService" : "1/30/2017", "DoctorAlias" : "jliu", "Charge" : "f", "Level" : 1 }
Applying the same concept here, you can fetch the number of people who visited the hospital on each date by executing the following query.
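var o = {};
// Field names are case-sensitive, so they must match the documents exactly.
o.map = function () { emit(this.DateOfService, 1); };
o.reduce = function (k, vals) { return Array.sum(vals); };
o.query = { HospitalAlias: "cb" };
db.COLLECTION_NAME.mapReduce(o.map, o.reduce, { query: o.query, out: { inline: 1 } })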
This query returns the number of visits for each date.
Output:
{ "_id" : "1/28/2017", "value" : 2 }
{ "_id" : "1/30/2017", "value" : 1 }
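To show how the map and reduce steps fit together, here is a rough in-memory sketch in plain JavaScript. It is an illustration under stated assumptions, not MongoDB's implementation: runMapReduce is a hypothetical helper, the query only supports simple equality, and emit is passed to the map function as an argument, whereas in MongoDB it is a global.

```javascript
// The three sample documents, reduced to the fields used here.
const visits = [
  { HospitalAlias: "cb", DateOfService: "1/28/2017" },
  { HospitalAlias: "cb", DateOfService: "1/28/2017" },
  { HospitalAlias: "cb", DateOfService: "1/30/2017" },
];

// Hypothetical stand-in for the server-side map-reduce loop.
function runMapReduce(docs, map, reduce, query) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs
    // Query phase: keep documents whose fields equal the query values.
    .filter((doc) => Object.keys(query).every((f) => doc[f] === query[f]))
    // Map phase: call map with `this` bound to the document.
    .forEach((doc) => map.call(doc, emit));
  // Reduce phase: combine the values of each key; a key with a single
  // value is passed through without calling reduce.
  return Array.from(emitted, ([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const mapFn = function (emit) { emit(this.DateOfService, 1); };
const reduceFn = (key, vals) => vals.reduce((a, b) => a + b, 0);
const totals = runMapReduce(visits, mapFn, reduceFn, { HospitalAlias: "cb" });
console.log(totals);
```

For the three sample documents, this yields the same per-date counts as the output shown above: 2 for 1/28/2017 and 1 for 1/30/2017.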
You can also take a look at the sample below.
Limitations
In both cases, results are subject to MongoDB's 16 MB document size limit.
We run into trouble once that limit is exceeded.
It is always good practice to keep old data aside.
Once you reach the limit, you can write the results out to a separate collection with the $out stage.
That leaves room for new data, and once the whole process is over you can always merge the datasets again.
For the aggregation pipeline there is one more option: allowDiskUse lets memory-intensive stages write temporary files to disk instead of failing when they exceed the in-memory limit.
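As a sketch of how the two options mentioned here might be combined in one call: the pipeline below writes its results to a separate collection with $out and passes allowDiskUse as an option to aggregate(), which permits stages to spill to temporary files on disk. The target collection name rounds_by_date is an assumption for illustration, and hospital_rounds is the collection used earlier in this article. The guard runs the query only inside a MongoDB shell.

```javascript
const archivePipeline = [
  { $group: { _id: "$DateOfService", y: { $sum: 1 } } },
  // $out must be the last stage; it replaces the contents of the
  // target collection if that collection already exists.
  { $out: "rounds_by_date" },
];

// allowDiskUse is an option to aggregate(), not a pipeline stage.
const archiveOptions = { allowDiskUse: true };

// Run the pipeline only when a shell-provided db object is available.
if (typeof db !== "undefined") {
  db.hospital_rounds.aggregate(archivePipeline, archiveOptions);
}
```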
Conclusion
Very complex queries are difficult to handle in the aggregation framework. It can be made to work, but it is not advisable for complex queries.
Map-reduce takes a considerable amount of time even on small datasets, and large datasets take about the same time to process.
So it is better to handle large datasets with the map-reduce function.
Thanks to its considerable flexibility, map-reduce can handle large datasets faster.
In parallel, you can use the aggregation pipeline for small datasets.
Choose between the two methods based on the size of your datasets.