/*
 * SPDX-License-Identifier: Apache-2.0
 *
 * The OpenSearch Contributors require contributions made to
 * this file be licensed under the Apache-2.0 license or a
 * compatible open source license.
 */

/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

/*
 * Modifications Copyright OpenSearch Contributors. See
 * GitHub history for details.
 */

package org.opensearch.index;

import org.apache.logging.log4j.Logger;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.index.TieredMergePolicy;
import org.opensearch.common.settings.Setting;
import org.opensearch.common.settings.Setting.Property;
import org.opensearch.core.common.unit.ByteSizeUnit;
import org.opensearch.core.common.unit.ByteSizeValue;

/**
 * A shard in OpenSearch is a Lucene index, and a Lucene index is broken
 * down into segments. Segments are internal storage elements in the index
 * where the index data is stored, and are immutable up to delete markers.
 * Segments are periodically merged into larger segments to keep the
 * index size at bay and expunge deletes.
 *
 * <p>
 * Merges select segments of approximately equal size, subject to an allowed
 * number of segments per tier. The merge policy is able to merge
 * non-adjacent segments, and separates how many segments are merged at once from how many
 * segments are allowed per tier. It also does not over-merge (i.e., cascade merges).
 *
 * <p>
 * All merge policy settings are dynamic and can be updated on a live index.
 * The merge policy has the following settings:
 *
 * <code>index.merge.policy.expunge_deletes_allowed</code>:
 *
 * When expungeDeletes is called, we only merge away a segment if its delete
 * percentage is over this threshold. Default is <code>10</code>.
*
 * <code>index.merge.policy.floor_segment</code>:
 *
 * Segments smaller than this are "rounded up" to this size, i.e. treated as
 * equal (floor) size for merge selection. This is to prevent frequent
 * flushing of tiny segments, thus preventing a long tail in the index. Default
 * is <code>2mb</code>.
*
 * <code>index.merge.policy.max_merge_at_once</code>:
 *
 * Maximum number of segments to be merged at a time during "normal" merging.
 * Default is <code>10</code>.
*
 * <code>index.merge.policy.max_merged_segment</code>:
 *
 * Maximum sized segment to produce during normal merging (not explicit
 * force merge). This setting is approximate: the estimate of the merged
 * segment size is made by summing sizes of to-be-merged segments
 * (compensating for percent deleted docs). Default is <code>5gb</code>.
*
 * <code>index.merge.policy.segments_per_tier</code>:
 *
 * Sets the allowed number of segments per tier. Smaller values mean more
 * merging but fewer segments. Default is <code>10</code>. Note that this value
 * must be greater than or equal to <code>max_merge_at_once</code>; otherwise
 * too many merges will be forced to occur.
*
 * <code>index.merge.policy.deletes_pct_allowed</code>:
 *
 * Controls the maximum percentage of deleted documents that is tolerated in
 * the index. Lower values make the index more space efficient at the
 * expense of increased CPU and I/O activity. Values must be between
 * <code>5</code> and <code>50</code>. Default value is <code>20</code>.
 *
 * <p>
 * For normal merging, the policy first computes a "budget" of how many
 * segments are allowed to be in the index. If the index is over-budget,
 * then the policy sorts segments by decreasing size (proportionally considering
 * percent deletes), and then finds the least-cost merge. Merge cost is measured by
 * a combination of the "skew" of the merge (size of the largest segment divided by
 * the size of the smallest segment), total merge size, and percentage of deletes
 * reclaimed, so that merges with lower skew, smaller size, and more reclaimed
 * deletes are favored.
 *
 * <p>
 * If a merge will produce a segment that's larger than
 * <code>max_merged_segment</code> then the policy will merge fewer segments (down to
 * 1 at once, if that one has deletions) to keep the segment size under
 * budget.
*
 * <p>
 * Note that for large shards that hold many gigabytes of data, the default
 * <code>max_merged_segment</code> (<code>5gb</code>) can leave many segments
 * in an index and make searches slower. Use the indices segments API to see
 * the segments that an index has, and consider either increasing
 * <code>max_merged_segment</code> or issuing a force merge call for the
 * index (preferably during a low-traffic period).
*
* @opensearch.internal
*/
public final class MergePolicyConfig {
    private final OpenSearchTieredMergePolicy mergePolicy = new OpenSearchTieredMergePolicy();
    private final Logger logger;
    private final boolean mergesEnabled;

    public static final double DEFAULT_EXPUNGE_DELETES_ALLOWED = 10d;
    public static final ByteSizeValue DEFAULT_FLOOR_SEGMENT = new ByteSizeValue(2, ByteSizeUnit.MB);
    public static final int DEFAULT_MAX_MERGE_AT_ONCE = 10;
    public static final ByteSizeValue DEFAULT_MAX_MERGED_SEGMENT = new ByteSizeValue(5, ByteSizeUnit.GB);
    public static final double DEFAULT_SEGMENTS_PER_TIER = 10.0d;
    public static final double DEFAULT_RECLAIM_DELETES_WEIGHT = 2.0d;
    public static final double DEFAULT_DELETES_PCT_ALLOWED = 20.0d;
    public static final Setting