tencent cloud

TencentDB for MongoDB

Index Design and Optimization

ダウンロード
フォーカスモード
フォントサイズ
最終更新日: 2026-06-24 10:34:01

Use Cases

In MongoDB, well-designed indexes are essential for ensuring query performance and system stability. In real-world production environments, improper use of indexes often leads to serious availability issues, such as:
Sharp drop in query performance: Core queries that miss indexes trigger collection scans, causing response times to rise from milliseconds to seconds and resulting in API timeouts.
Out-of-memory errors: In-memory sorting of tens of millions of records without an index exceeds the MongoDB default memory limit (32 MB before version 4.4, and 100 MB for version 4.4 and later), causing queries to abort directly.
Blocked read/write requests: Creating an index on a large collection in foreground mode during business peak hours locks the collection and forces business reads and writes to be interrupted.
Surge in write latency: A large number of redundant indexes on a single collection (for example, more than 20) require every write operation to synchronously maintain multiple index trees, severely consuming compute resources.
This document aims to provide standardized index design and operations specifications, ensuring that core queries accurately hit indexes and that index change operations in the production environment are safe and controllable.

Basic Principles for Index Design

Principle 1: ESR Principle (Composite Index Field Ordering)

Core Action: When building a compound index, always arrange fields in the order of Equality, Sort, and Range. This principle is based on the traversal mechanism of B-Tree indexes and aims to maximize index utilization and eliminate in-memory sorting.
Order
Type
Description
Query Example
E
Equality (Exact Match)
Leftmost position of the index; exact matches quickly narrow the scan range
{ status: "paid" }
S
Sort (Sorting)
Immediately follows E; leverages the ordered nature of the index to avoid in-memory sorting
.sort({ createTime: -1 })
R
Range (Range Query)
Rightmost position of the index; after a range scan the index order is broken, and subsequent fields degrade to row-by-row filtering
{ createTime: { $gte: ISODate("2024-01-01") } }
Business Application: Take an order query API as an example. The requirement is to "find a specific user's paid orders, sorted by amount in descending order, and only for the most recent month."
Problem: The index was designed as {userId: 1, status: 1, createTime: 1, amount: -1}, placing Range (createTime) before Sort (amount). The index diverges on the time field, so the subsequent sort by amount cannot use the index, triggering in-memory sorting. Query latency reached the second level with a risk of OOM.
Optimization: Adjust the index to {userId: 1, status: 1, amount: -1, createTime: 1}, strictly following the ESR order. After equality fields filter precisely, the natural order of the index is used directly to output sorted results, and finally the time field is used for boundary truncation. Query latency dropped from the second level to 10 ms, a performance improvement of over 100x.
ESR Index Design Example: The following example compares the correct ESR order with an incorrect order, demonstrating the impact of the Range field position on sort performance.
// Business query statement
db.t_orders.find({
userId: 10086, // E: Equality
status: "paid", // E: Equality
createTime: { $gte: ISODate("2024-02-01") } // R: Range
}).sort({ amount: -1 }) // S: Sorting

// Error: With Range before Sort, the index diverges on createTime and cannot support amount sorting, triggering in-memory sorting.
db.t_orders.createIndex(
{ userId: 1, status: 1, createTime: 1, amount: -1 },
{ background: true }
)

// Correct: ESR order, equality filtering → index sorting → range truncation
db.t_orders.createIndex(
{ userId: 1, status: 1, amount: -1, createTime: 1 },
{ background: true }
)

Principle 2: Leftmost Prefix Matching

Core Action: A query condition must explicitly include the leftmost field of the compound index; otherwise the index is ineffective. If your business scenario includes an independent query requirement that skips the leftmost field, you must create a separate index for that field.
The B-Tree structure of a compound index has a strict field dependency order: it first sorts globally by the first field; within ranges where the first field is identical, it then sorts locally by the second field. If a query lacks the "leftmost field" as the global search baseline, the database cannot use the index tree for path navigation, and the query is forced to degrade into a collection scan (COLLSCAN).
Assume the collection has a compound index { a: 1, b: 1, c: 1 }. The index utilization paths for different query conditions are shown in the following table:
Query Condition
Index Utilization
Execution Description
{ a: 1 }
Full utilization
Hits the leftmost prefix a.
{ a: 1, b: 1 }
Full utilization
Hits the leftmost prefix a, b.
{ a: 1, b: 1, c: 1 }
Full utilization
Fully matches all index fields a, b, c.
{ a: 1, c: 1 }
Partial utilization
Only the leftmost field a can be used for index branch positioning. The absence of the middle field b blocks index matching for field c.
{ b: 1 }
Unavailable
The absence of the leftmost field a causes the loss of the search baseline and triggers a full table scan.
{ b: 1, c: 1 }
Unavailable
The absence of the leftmost field a triggers a full table scan.
Business Application: Take a business system as an example. The original collection had a compound index {userId: 1, status: 1, createTime: 1}.
Problem: The operations backend issued an independent query db.orders.find({status: "pending"}). Because the leftmost prefix userId was missing, the compound index could not be used at all, triggering a collection scan. The API timeout rate during peak hours surged to 30%.
Optimization: After identifying the query requirement that skipped the leftmost field, a separate single-field index {status: 1} was created for the status field, restoring system availability.
Leftmost Prefix Matching Index Example: The following example demonstrates the scenario where a missing leftmost field causes the index to fail, along with the corresponding solution.
// A compound index already exists.
db.t_orders.createIndex(
{ userId: 1, status: 1, createTime: 1 },
{ background: true }
)

// Correct: Contains the leftmost field userId, and the index is fully hit.
db.t_orders.find({ userId: 10086, status: "paid" })

// Partial hit: Contains the leftmost field userId but skips status, using only userId for positioning.
db.t_orders.find({ userId: 10086, createTime: { $gte: ISODate("2024-01-01") } })

// Error: Missing the leftmost field userId, triggering a full table scan.
db.t_orders.find({ status: "pending" })

// Solution: Create a separate index for the independent query field.
db.t_orders.createIndex(
{ status: 1 },
{ background: true }
)

Principle 3: High-Cardinality Field First Rule

Core Action: When building a compound index with multiple equality match conditions, always place fields with high distinctiveness (high cardinality) first, and fields with low distinctiveness (low cardinality) later.
Placing high-cardinality (highly distinctive) fields first leverages their strong "selectivity" to rapidly narrow the search range at the early stage of the query, ensuring the database excludes the vast majority of irrelevant records during the B+ Tree index search phase, thereby reducing disk I/O throughput and the volume of data pages scanned.
According to the leftmost prefix matching rule, an index with high-cardinality fields first can support both "combined queries" and "single-field prefix queries", achieving efficient index reuse. This strategy reduces the creation of redundant indexes, ensuring query speed while effectively lowering storage consumption and the index maintenance overhead of frequent write operations (inserts, updates, and deletes).
Distinctiveness Level
Data Characteristic
Field Example
Index Placement Policy
High
Unique or rarely duplicated
User ID, Order Number
Should be placed as the leading field in composite indexes.
Medium
Contains some duplicates but has many categories.
City, Product Category
Effective. Usually placed in the middle of composite indexes and must be used in combination.
Low
Extremely small value domain with massive duplicates
Gender, Boolean Value
Poor. If it must be included in a composite index, it should be strictly placed at the very end.
Business Application: The core value of placing high-cardinality fields first is that when a high-cardinality field serves as the index prefix, it can be reused by more query patterns, covering more scenarios with fewer indexes.
Take an order system as an example. The original collection had a compound index {userId, status}. As business requirements changed, the following two query needs emerged:
Scenario A: db.orders.find({ userId: "U123" })
Result: Directly hits the leftmost prefix of the compound index. Because userId has high cardinality, retrieval performance is excellent.
Scenario B: db.orders.find({ status: "completed" })
Analysis: This query cannot use the existing index. However, because the status field itself has very low cardinality, even matching the leftmost status field through a {status, userId} index would yield poor performance.

Index Creation Specifications

Specification 1: Background Indexing Principle

Core Action: When creating an index in a production environment, you must explicitly specify the background: true parameter (note: for MongoDB versions earlier than 4.2).
When MongoDB performs the default foreground index creation, it acquires an exclusive lock at the collection or even the database level. During the build, all read and write requests on the target collection are completely blocked. By enabling the background: true mode, the database alternately processes the index build task and business read/write requests in the background, thereby eliminating global lock blocking and waiting, and ensuring high availability of the system.
Note:
As of MongoDB 4.2, all index builds use an optimized non-blocking mechanism by default. The background parameter is deprecated but can still be parsed for compatibility.
Business Application: An e-commerce DBA created a single-field index on an order table with tens of millions of records during daytime business hours.
Problem: When executing db.t_orders.createIndex({ customerId: 1 }), the background parameter was not specified. MongoDB 4.0 creates indexes in foreground mode by default, and acquiring the exclusive lock blocked all read and write operations on the entire order database for 40 minutes, directly causing a serious incident during a major promotion.
Optimization: After using the background: true background mode to create the index, the business operated normally during the build with only a slight performance drop. Multiple indexes can be combined into a single batch build operation to further reduce I/O overhead.
Background Index Creation Example: The following example demonstrates the blocking risk of foreground index creation and the correct usage of the background mode.
// Error: The background parameter is not specified, which may block all operations.
db.t_orders.createIndex({ customerId: 1 })

// Single creation: Enable the background mode to avoid blocking business operations.
db.t_orders.createIndex(
{ customerId: 1, createTime: -1 },
{ background: true }
)

Specification 2: Controlling the Number of Indexes

Core Action: Limit the index scale for a single collection: The recommended number is 10 or fewer, with an absolute hard limit of no more than 20.
When a collection contains too many indexes, the system will experience performance degradation in the following dimensions:
Resource Dimension
Performance Overhead Description
Write Latency (Write I/O)
Each time an Insert, Update, or Delete is executed, the database must synchronously traverse and update the B-Tree structures of all related indexes. The more indexes there are, the more the I/O overhead per write grows linearly.
Storage consumption (Disk)
Each index is an independent data tree and occupies a large amount of physical disk space.
Business Application: To satisfy various queries, an e-commerce order table accumulated as many as 25 indexes.
Problem: For every new order generated, the database needed to synchronously update 25 index trees, causing the total index size of the table (120 GB) to actually exceed the data itself (50 GB), and the write latency of a single order rose continuously from 5 ms to 50 ms.
Optimization: After investigation, 15 historical report indexes with 0 calls and prefix-duplicate indexes were deleted directly. The write latency immediately dropped back to the normal level of 5 ms, while approximately 70 GB of disk and memory space was freed.
Confirm Index Quantity Example: The following example demonstrates how to investigate and clean up redundant indexes.
// Step 1: View all current indexes of the collection.
db.t_orders.getIndexes()

// Step 2: Investigate the actual usage of each index using $indexStats.
db.t_orders.aggregate([{ $indexStats: {} }]).forEach(function(idx) {
print(
"Index Name:", idx.name,
"| Invocation Count:", idx.accesses.ops,
"| Last Used:", idx.accesses.since
)
})

// Step 3: View the storage ratio between indexes and data.
var stats = db.t_orders.stats()
print("Data Size:", (stats.size / 1024 / 1024).toFixed(2), "MB")
print("Index Size:", (stats.totalIndexSize / 1024 / 1024).toFixed(2), "MB")
print("Index/Data Ratio:", (stats.totalIndexSize / stats.size * 100).toFixed(1), "%")

// Step 4: Delete redundant indexes with an invocation count of 0.
// Before performing the operation, ensure that the index has no business dependencies.
db.t_orders.dropIndex("idx_legacy_report_1")
db.t_orders.dropIndex("idx_legacy_report_2")

// Note: Do not delete the default _id index.
// db.t_orders.dropIndex("_id") // This operation will cause an error because the _id index cannot be deleted.

Specification 3: Safe Cleanup of Redundant Indexes

Core Action: Establish a regular inspection mechanism, use the $indexStats pipeline operator to identify unused indexes with zero calls, and combine the Hidden Index feature to perform safe decommissioning.
$indexStats can accurately capture the real call frequency of an index. Combined with the hideIndex feature introduced in MongoDB 4.4 and later, you can apply "logical masking" to a target index (the optimizer no longer uses it, but it is still maintained and updated in the background), and then perform "physical deletion" after confirming there is no business impact, thereby achieving zero-risk resource reclamation.
Business Application: The order collection t_orders of a logistics platform accumulated 23 indexes after 2 years of operation, several of which were obsolete indexes left over from early requirement iterations and redundant indexes with overlapping functionality. Each write operation needed to synchronously update all indexes, causing write latency to gradually rise from the initial 5 ms to 15 ms, with index storage exceeding the data itself.
Problem: Through $indexStats aggregation analysis, it was found that 14 of the 23 indexes had 0 accesses in the last 90 days and were completely useless; in addition, several groups of indexes had prefix overlap (for example, {userId: 1} coexisting with {userId: 1, status: 1}), and low-selectivity indexes could be covered by compound indexes.
Optimization: Each of the 14 zero-access indexes was hidden one by one via hideIndex, and after observing for 7 days and confirming no business impact, they were physically deleted; prefix-overlapping indexes were merged, and the final number of indexes was reduced from 23 to 9. After cleanup, write latency dropped from 15 ms to 5 ms (an improvement of about 30%), and index storage shrank from 18 GB to 7 GB (a saving of about 40%).
Safe Cleanup Code Example: The following example demonstrates the complete cleanup process for redundant indexes—first identify zero-access indexes via $indexStats, then logically hide them with hideIndex and observe, and finally physically delete them after confirming no business impact.
// Step 1: Execute a statistical aggregation to identify unused indexes.
db.t_orders.aggregate([{ $indexStats: {} }])
/* Output Example Analysis:
{
"name": "idx_old_field",
"accesses": {
"ops": NumberLong(0), // Key metric: The invocation count is 0.
"since": ISODate("2024-01-01T00:00:00Z")
}
}
*/

// Step 2: Logical Hiding (Supports fast rollback, MongoDB 4.4+)
db.t_orders.hideIndex("idx_old_field")

// Observation Period: Continuously monitor for 1-7 days. If business is impacted, you can immediately execute db.t_orders.unhideIndex("idx_old_field") to restore it.

// Step 3: Physical Deletion (Execute after confirming no damage).
db.t_orders.dropIndex("idx_old_field")

Specifications for Using Special Indexes

Specification 4: TTL Indexes for Automatic Cleanup of Expired Data

Core Action: For data with an expiration time, such as logs and sessions, we recommend using TTL (Time-To-Live) indexes to automatically clean up expired data, avoiding manual massive batch deletions executed by scheduled scripts at the application layer.
When using TTL indexes, you must strictly adhere to the following mechanisms and limitations:
Feature / Limitation
Specification
Single-field constraint
A TTL index must be a single-field index. If the TTL field is placed in a compound index, the TTL automatic reclamation feature is directly invalidated.
Data Type
The data type of the target field must be the Date type (or an array containing the Date type); otherwise the engine will not perform cleanup.
Cleanup Delay
Deletion is not precise to the second upon expiration. By default, the TTLMonitor thread polls every 60 seconds, so the physical deletion of data has a minute-level delay.
Engine Parameter Control
If there are performance fluctuations in production, the DBA can further smooth I/O resource consumption by dynamically adjusting ttlMonitorSleepSecs (controls the polling sleep interval) and ttlDeleteBatch (controls the deletion batch size per round). For specific operations, see parameter settings.
Business Application: A logging platform issued the db.logs.remove({createTime: {$lt: a certain timestamp}}) command every day at 2 AM via a scheduled task to delete hundreds of millions of historical logs older than 30 days in bulk. The centralized massive deletion caused severe lock waiting, and the database CPU instantly spiked to 100%. At the same time, the huge amount of Oplog generated caused the cluster's primary-secondary replication lag to reach several minutes, seriously affecting other online business that depended on the cluster. After switching to TTL indexes, MongoDB background tasks automatically and smoothly evicted expired data, completely transparent to the business.
Code Example: Depending on business flexibility, there are usually two TTL index design patterns. Fixed-duration eviction is suitable for scenarios with a unified expiration policy, while precise timestamp eviction is suitable for scenarios that require dynamic control of the lifetime at the document granularity.
// Pattern 1: Fixed-duration expiration (e.g., based on the createTime field, expires after 30 days)
db.t_sessions.createIndex(
{ createTime: 1 },
{ expireAfterSeconds: 2592000, background: true }
)

// Pattern 2: Precise timestamp expiration (Recommended, where the business layer dynamically determines the lifetime)
// Set expireAfterSeconds to 0. The engine will perform cleanup when the expireAt timestamp is reached.
db.t_sessions.createIndex(
{ expireAt: 1 },
{ expireAfterSeconds: 0, background: true }
)

// When inserting at the business layer, directly specify the precise expiration time.
db.t_sessions.insertOne({
sessionId: "sess_001",
userId: "user_001",
expireAt: new Date(Date.now() + 3600000) // Dynamically specifies expiration after 1 hour
})

Specification 5: Efficient Storage and Querying of Time-Series Data with Time-Series Indexes

Core Action: For data continuously written over time, such as IoT sensors, system monitoring, and log collection, we recommend using the Time Series Collection provided by MongoDB 5.0 and later instead of an ordinary collection. By automatically bucketing by time and applying columnar aggregation of fields within buckets, you get a higher compression ratio and better range query efficiency, avoiding the need to manually maintain timestamp indexes and data archiving logic on an ordinary collection.
When using time series collections, you must strictly adhere to the following mechanisms and limitations:
Feature / Restriction
Specification
Required parameter timeField
When a time series collection is created, you must specify the timeField. The value of this field must be of the Date type and is used to timestamp each measurement. MongoDB automatically performs bucket storage based on this.
Recommended parameter metaField
The metaField is used to identify the data source (such as device ID and sensor number). MongoDB partitions data based on this. Select an identifier that rarely changes and avoid using the array type.
Granularity setting granularity
Select the granularity based on the time interval between adjacent measurements from the same data source: seconds (bucket span: 1 hour), minutes (bucket span: 24 hours), hours (bucket span: 30 days). Too coarse a granularity leads to excessive data per bucket and slower queries, while too fine a granularity causes a surge in the number of buckets and wasted storage.
Automatic indexing
MongoDB 6.0+ automatically creates secondary indexes on the timeField and metaField. For earlier versions, you must create them manually.
Unsupported operations
Time series collections do not support efficient execution of distinct() (use $group aggregation as an alternative). Versions earlier than 8.0 do not support zone sharding (Zone Sharding). Once defined, the metaField cannot be changed to another field.
Business Application: An IoT platform received tens of thousands of sensor data points per second. After storing them in an ordinary collection, the data volume exceeded 1 billion records in 3 months. Querying the temperature trend of the last 24 hours by time range required scanning tens of millions of documents, with P99 latency exceeding 5s and storage continuously expanding. After migrating to a time series collection, MongoDB automatically partitioned by metaField (device ID) and bucketed and compressed by timeField, reducing storage by about 70%, lowering the latency of the same query to under 200 ms, and eliminating the need to maintain additional data archiving scripts.
Code Example: Choosing the appropriate granularity parameter based on the data collection frequency is key to optimizing the performance of a time series collection.
// Recommendation: Create a time series collection (sensors report every 5 minutes)
db.createCollection("t_sensor_data", {
timeseries: {
timeField: "timestamp", // Required: Timestamp field
metaField: "sensorId", // Recommended: Data source identifier
granularity: "minutes" // Granularity matches the collection frequency (5 minutes → minutes)
}
})

// Recommendation: Create a time series collection (for system monitoring that collects data every second)
db.createCollection("t_metrics", {
timeseries: {
timeField: "ts",
metaField: "metadata", // metadata can be an object, such as { host: "web01", region: "bj" }
granularity: "seconds" // Granularity matches the collection frequency (second-level → seconds)
}
})

// Insert time series data
db.t_sensor_data.insertMany([
{ sensorId: "sensor_001", timestamp: new Date(), temperature: 23.5, humidity: 65 },
{ sensorId: "sensor_001", timestamp: new Date(), temperature: 23.6, humidity: 64 },
{ sensorId: "sensor_002", timestamp: new Date(), temperature: 18.2, humidity: 72 }
], { ordered: false }) // ordered: false improves batch write performance

// Error: Manually managing time series data on a regular collection
db.createCollection("t_sensor_data_old")
db.t_sensor_data_old.createIndex({ sensorId: 1, timestamp: -1 })
// Requires manual handling of data archiving, compression, and cleanup, resulting in high maintenance costs.

// Query data for a specific device from the last 24 hours (automatically using time series indexes)
db.t_sensor_data.find({
sensorId: "sensor_001",
timestamp: { $gte: new Date(Date.now() - 86400000) }
}).sort({ timestamp: -1 })

// Use $group aggregation instead of distinct() (time series collections do not support efficient distinct operations)
db.t_sensor_data.aggregate([
{ $match: { timestamp: { $gte: new Date(Date.now() - 86400000) } } },
{ $group: { _id: "$sensorId" } }
])

Sorting and Memory Limitations

Specification 6: Include Sort Fields in Indexes

Core Action: All fields used in sorting operations must be included in the index. In-memory sorting is prohibited.
MongoDB performs sorting in two ways—index sorting and in-memory sorting. When the sort field is covered by an index, the database returns results directly according to the ordered structure of the index, with no additional sorting computation; when the sort field is not indexed, the database must load all matching documents into memory to sort them (the SORT stage), consuming a large amount of memory resources.
Memory limit: When the data volume of in-memory sorting exceeds the version limit (32 MB for 4.2 and earlier, 100 MB for 4.4 and later), the query fails with an error directly.
Index design:In compound query scenarios, the sort field should be combined with the query condition fields into a compound index, and the direction of the sort field (ascending/descending) must match the index definition, ensuring the database uses the index to complete both filtering and sorting.
MongoDB Version
Default Memory Limit
Consequence of Exceeding Limit
4.2 and earlier
32MB
An error is reported, and the query fails.
4.4+
100MB
An error is reported, and the query fails.
Business Application: The "sort by transaction amount in descending order" report in the operations backend of an e-commerce platform had the underlying query db.t_orders.find({ status: "paid" }).sort({ amount: -1 }), involving tens of millions of order records.
Problem: After the order volume exceeded 8 million records, the sort data volume exceeded 32 MB (the MongoDB 4.2 limit), and the query failed with the error Sort operation used more than the maximum 33554432 bytes of RAM. The report function became unavailable, affecting operational decisions for 2 days.
Optimization: Through explain("executionStats"), it was confirmed that the execution plan had a SORT stage and that the sort field amount lacked an index, triggering in-memory sorting. A { status: 1, amount: -1 } compound index was created, with status used for filtering and amount leveraging the index order to complete sorting. The SORT stage was eliminated, and the response time dropped from a timeout to 200 ms.
Code Example: The following example demonstrates how to create a compound index for sort fields and verify, via explain("executionStats"), whether the SORT stage is eliminated in the execution plan.
// Error: The sort field is not indexed, which may trigger in-memory sorting.
db.t_orders.find({ status: "paid" }).sort({ createTime: -1 })

// Correct: The sort field is included in the index.
db.t_orders.createIndex({ status: 1, createTime: -1 }, { background: true })
db.t_orders.find({ status: "paid" }).sort({ createTime: -1 }) // Index sorting, no memory limit

// Verify whether index sorting is used.
db.t_orders.find({ status: "paid" }).sort({ createTime: -1 }).explain("executionStats")
// Check whether a SORT stage exists. The absence of a SORT stage indicates that index sorting is used.

Avoiding Inefficient Operators

Specification 7: Avoiding Operators That Cannot Utilize Indexes

Core Action: Avoid using inefficient operators such as $ne, $nin, unprefixed $regex, and $where.
Cost of Negation Operators: $ne and $nin are negation conditions. The database cannot use an index to directly locate the target documents; instead, it must scan all non-matching entries in the index and exclude them one by one, which essentially degrades into a full index scan or even a collection scan.
Cost of regular expressions and scripts:: A $regex without prefix anchoring cannot leverage the ordered nature of the index for range lookups, and the performance degrades more severely as the data volume grows.
Optimization Principle: Convert negation conditions to positive conditions ($ne → $in), change fuzzy matching to prefix anchoring, and rewrite $where into standard query operators.
Inefficient operators and optimization solutions: The following table lists common operators that cannot effectively use indexes and their corresponding optimization alternatives.
Operator
Issue
Optimization Solution
$ne
Requires scanning all non-matching values.
Use $in to list valid values.
$nin
Requires scanning all values not in the list.
Use $in for positive matching.
$not
Usually cannot utilize indexes.
Use positive conditions.
$regex (without prefix)
Prefix without anchoring cannot utilize indexes.
Use prefix anchoring /^prefix/.
$where
JavaScript execution is extremely slow.
Use standard query operators.
$exists: false
Requires scanning all documents.
Use sparse indexes or redesign.
Business Application: A membership management system queried the list of "non-VIP users" with the underlying query db.t_users.find({ level: { $ne: "VIP" } }), with a total of about 5 million users.
Problem: The $ne operator could not effectively use the index on the level field, and each query triggered a collection scan. The database CPU ran continuously at a high level, P99 latency exceeded 2s, and pages loaded slowly.
Optimization: The negation condition was changed to positive enumeration { level: { $in: ["Regular", "Silver", "Gold"] } }, and the query directly hit the index, dropping P99 latency from 2s to 40 ms.
Code Example: The following example compares the syntax differences between negation operators and positive operators, as well as prefix-anchoring optimization for regular expressions.
// Inefficient: $ne cannot be effectively utilized by indexes.
db.t_orders.find({ status: { $ne: "cancelled" } })

// Optimized: $in lists all valid statuses.
db.t_orders.find({ status: { $in: ["pending", "paid", "shipped", "completed"] } })

// Inefficient: Unprefixed regex, full table scan.
db.t_users.find({ name: /Zhang/ })

// Optimized: Prefix-anchored regex can utilize indexes.
db.t_users.find({ name: /^Zhang/ })

Index Design Checklist

Pre-Launch Checklist

Check Item
Verification Method
Passing Criteria
All queries hit indexes.
explain("executionStats")
The stage is IXSCAN, with no COLLSCAN.
No in-memory sorting
explain("executionStats")
No SORT Stage
Reasonable scan efficiency
totalDocsExamined / nReturned
A ratio close to 1
Compound indexes follow ESR.
Reviewing the order of index fields
Equality → Sort → Range
Controllable number of indexes
db.collection.getIndexes()
≤ 10
Creating indexes using background mode
Reviewing the index creation command
Including background: true

Periodic Inspection

Check Item
Verification Method
Recommended Action
Unused index cleanup
$indexStats
Delete indexes with ops: 0 after evaluation.
Index utilization
$indexStats
Consider deleting indexes with low utilization.
Index size
db.collection.stats().indexSizes
Analyze abnormally large indexes.

Index Optimization Recommendations

When the index strategy of a collection is difficult to judge manually, we recommend using the Index Recommendation feature of Database Smart DBA (DBbrain) to assist decision-making. Based on actual slow query logs and query patterns, DBbrain automatically analyzes and recommends the optimal index solution, helping identify missing and redundant indexes. For specific operations, see Index Recommendation.

FAQs

Q1: Why Are Queries Still Slow After Indexes Are Created?

1. Confirm whether the query hits the index.
db.t_orders.find({ status: "paid" }).explain("executionStats")
2. Check scan efficiency: Analyze the query execution plan using the explain() method, and check the index hit efficiency in the returned results.
// Key metric, close to 1: Index is efficient, scanning results in a hit.
totalDocsExamined / nReturned ≈ 1 // Ideal value
3. Check for in-memory sorting: In the returned results of explain("executionStats"), traverse the executionStages layer by layer to examine the stage field:
// Locate the sorting stage in the explain results.
executionStats.executionStages.stage
// Or nested within inputStage / inputStages.
stage Value
Description
Optimization Required or Not
SORT
In-memory sorting, not utilizing index orderliness
Optimization required
SORT_KEY_GENERATOR
Extracting the sort key, occurring in conjunction with SORT.
Optimization required
No `SORT` Stage
Sorting is completed by the inherent orderliness of the index.
Optimization not required
4. Check whether the order of index fields complies with the ESR principle.
// Does not comply with ESR: range field precedes sorting field.
db.t_orders.createIndex({ createTime: 1, status: 1 })
db.t_orders.find({ createTime: { $gte: ISODate("2024-01-01") }, status: "paid" }).sort({ amount: -1 })

// Complies with ESR: equality → sorting → range
db.t_orders.createIndex({ status: 1, amount: -1, createTime: 1 })


Q2: Which Is Better: Composite Indexes or Multiple Single-Field Indexes?

In most scenarios, a compound index outperforms multiple single-field indexes.
Comparison Dimension
Multiple Single-Field Indexes
Compound Index
Indexing scheme
{ status: 1 } + { userId: 1 } + { createTime: 1 }
{ status: 1, userId: 1, createTime: -1 }
Query process
Generally, only one field in a query uses the index.
A single B-Tree lookup directly locates the target.
Sort
The index cannot cover sorting, potentially triggering an in-memory SORT.
The index is inherently ordered, eliminating the need for in-memory sorting.
Covered query
Not achievable, requiring a table lookup to fetch the complete document.
If all query fields are included in the index, the results can be returned directly without requiring a table lookup.
Memory overhead
Higher overall memory overhead
No additional overhead.
// Multiple single-field indexes: MongoDB attempts index intersection, resulting in unstable efficiency.
db.t_orders.createIndex({ status: 1 })
db.t_orders.createIndex({ userId: 1 })
db.t_orders.createIndex({ createTime: 1 })

// Compound index: a single index covers both querying and sorting, adhering to the ESR principle.
db.t_orders.createIndex({ status: 1, userId: 1, createTime: -1 })

// Single-field indexes offer greater flexibility when query patterns are not fixed and field combinations vary:
// Scenario: For dynamic filtering in an operations backend, where users may query based on any combination of fields, creating a compound index for each combination is impractical. Single-field indexes + index intersection is the appropriate choice.
db.t_orders.find({ status: "paid" }) // Filters only by status
db.t_orders.find({ userId: "u_10001" }) // Filters only by user
db.t_orders.find({ createTime: { $gte: ISODate("...") } }) // Filters only by time
db.t_orders.find({ status: "paid", userId: "u_10001" }) // Filters by status + user

Q3: How to Safely Create Indexes for Large Collections?

1. Select an off-peak period: Perform index building during a service off-peak period to minimize the impact on online read and write operations.
2. Disable the Balancer in a sharded cluster: For MongoDB versions earlier than 4.4, disable the Balancer before indexes are built to prevent concurrent conflicts between data migration and index construction.
sh.stopBalancer()
3. Perform background index creation: add the { background: true } option (applicable only to MongoDB versions 4.2 and earlier; in versions 4.2+ index building does not block reads and writes by default).
db.orders.createIndex({ userId: 1, createTime: -1 }, { background: true })
4. Monitor index build progress and resources: Continuously observe CPU, memory, and disk IO usage, and terminate the build if necessary.
Note:
Do not delete indexes on the same collection during index building: MongoDB versions earlier than 4.4 use a serial index building mechanism. If a secondary node has not completed index building, triggering an index deletion operation (such as dropIndex) on the same collection at this time may cause replication interruption or even unavailability on the secondary node. During index building, freeze all index modification operations on that collection. Wait until currentOp returns empty, confirming the build is complete, before proceeding with any operations.
// View currently running index build tasks.
db.currentOp({ "command.createIndexes": { $exists: true } })
5. Estimated time: Building an index on a large collection may take several hours. Evaluate this in advance and reserve a maintenance window accordingly.

ヘルプとサポート

この記事はお役に立ちましたか?

フィードバック