fsync/sync
sync is a standard system call in the Unix operating system. It commits to disk all data in the kernel filesystem buffers, that is, data which has been scheduled for writing via low-level I/O system calls. Higher-level I/O layers such as stdio may maintain separate buffers of their own. The related system call fsync() commits just the buffered data relating to a specified file descriptor.[1] fdatasync() is also available to write out just the changes made to the data in the file, and not necessarily the file's related metadata.
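The layering described above (user-space buffer, then kernel buffers, then disk) can be sketched in Python. This is a minimal illustration, not how Elasticsearch itself writes files; the helper name durable_write and the file name segment.bin are made up for the example.

```python
import os
import tempfile


def durable_write(path: str, payload: bytes) -> int:
    """Write payload to path and ask the kernel to commit it to disk."""
    with open(path, "wb") as f:
        f.write(payload)      # data sits in Python's own user-space buffer
        f.flush()             # hand the data to the kernel filesystem buffers
        os.fsync(f.fileno())  # ask the kernel to commit this file's data to disk
    return os.path.getsize(path)


# Hypothetical usage: write five bytes into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    size = durable_write(os.path.join(tmp, "segment.bin"), b"hello")
```

Without the flush() call the bytes might still be in the higher-level buffer when fsync runs, which is the point the paragraph makes about layers such as stdio.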
inside a shard
Every indexed field in a JSON document has its own inverted index.
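A toy version of "one inverted index per field" might look like the following. The whitespace tokenizer and lowercasing are simplifying assumptions for illustration; real analysis in Elasticsearch is far richer, and the function name build_inverted_indexes is hypothetical.

```python
from collections import defaultdict


def build_inverted_indexes(docs):
    """Return {field: {term: sorted doc ids}} for a {doc_id: document} mapping."""
    indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            # Assumed analysis: lowercase, then split on whitespace.
            for term in str(value).lower().split():
                indexes[field][term].add(doc_id)
    return {
        field: {term: sorted(ids) for term, ids in terms.items()}
        for field, terms in indexes.items()
    }


docs = {
    1: {"title": "Quick brown fox", "body": "jumps over"},
    2: {"title": "Lazy brown dog", "body": "sleeps"},
}
indexes = build_inverted_indexes(docs)
```

Here indexes["title"] and indexes["body"] are separate term dictionaries, so a term that appears only in body is not found when looking in title.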
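At query time, all segments are queried in turn.
Segments are immutable: documents cannot be removed from or added to an old segment. Instead, every commit point has a .del file that records which documents in which segments have been deleted. When a document is updated, the old version is marked as deleted and the new version is indexed into a new segment.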
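The interplay of immutable segments and deletion markers can be sketched as below. This is a simplified model under stated assumptions: the classes Segment and Shard are illustrative, each index call creates its own segment, and a Python set stands in for the .del file.

```python
class Segment:
    """A batch of documents whose contents never change after creation."""

    def __init__(self, name, docs):
        self.name = name
        self._docs = dict(docs)

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def ids(self):
        return list(self._docs)


class Shard:
    def __init__(self):
        self.segments = []
        self.deleted = set()  # stand-in for the .del file: (segment name, doc id)

    def index(self, doc_id, doc):
        # An update marks every older copy as deleted instead of editing it.
        for seg in self.segments:
            if seg.get(doc_id) is not None:
                self.deleted.add((seg.name, doc_id))
        self.segments.append(Segment(f"seg{len(self.segments)}", {doc_id: doc}))

    def search(self, predicate):
        hits = []
        for seg in self.segments:  # every segment is queried in turn
            for doc_id in seg.ids():
                if (seg.name, doc_id) in self.deleted:
                    continue  # filtered out at query time
                doc = seg.get(doc_id)
                if predicate(doc):
                    hits.append((doc_id, doc))
        return hits


shard = Shard()
shard.index(1, {"v": 1})
shard.index(1, {"v": 2})  # update: old copy is only marked as deleted
results = shard.search(lambda d: True)
```

The old version is still physically present in the first segment; it is only hidden from search results by the deletion marker.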
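Disk is the bottleneck: committing a new segment to disk requires an fsync, and fsync is expensive. Between Elasticsearch and the disk sits the filesystem cache. A new segment is first written to the filesystem cache and only later written to disk. Writing a new segment to the filesystem cache and opening it for search is called a refresh. By default each shard refreshes once per second, which is controlled by the refresh_interval setting: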
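PUT /my_logs
{ "settings": { "refresh_interval": "30s" } }
This setting can be changed dynamically. You can turn automatic refresh off while building an index and turn it back on once the index is in use:
PUT /my_logs/_settings
{ "refresh_interval": -1 }
PUT /my_logs/_settings
{ "refresh_interval": "1s" }
Full commit: the segments held in the filesystem cache are written to disk together with a commit point, which is used for recovery after a failure. A commit point lists all known segments. When Elasticsearch starts up or reopens an index, it uses the commit point to determine which segments belong to the shard. But what happens to changes that are not yet covered by a full commit?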
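The translog records every operation performed in Elasticsearch. A document is first added to the in-memory buffer and then appended to the translog. On refresh the buffer is cleared but the translog is left untouched:
1. The docs in the in-memory buffer are written to a new segment, without an fsync.
2. The segment is opened to make it visible to search.
3. The in-memory buffer is cleared.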
flush + create new translog
When the translog grows too large, or after a certain amount of time, the index is flushed and a new translog is created:
1. Any docs in the in-memory buffer are written to a new segment.
2. The buffer is cleared.
3. A commit point is written to disk.
4. The filesystem cache is flushed with an fsync.
5. The old translog is deleted.
When starting up, Elasticsearch will use the last commit point to recover known segments from disk, and will then replay all operations in the translog to add the changes that happened after the last commit.
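The refresh, flush and recovery steps above can be modelled with a small in-memory toy. It is only a sketch: ToyShard and its attributes are hypothetical names, plain Python lists stand in for files, and it assumes the translog survives a crash while unflushed segments in the filesystem cache do not.

```python
class ToyShard:
    def __init__(self):
        self.buffer = []          # in-memory buffer, not yet searchable
        self.translog = []        # stand-in for the translog (assumed to survive a crash)
        self.cache_segments = []  # segments in the filesystem cache: searchable, not fsynced
        self.committed = []       # segments listed in the last commit point

    def index(self, doc):
        self.buffer.append(doc)    # first the in-memory buffer
        self.translog.append(doc)  # then the translog

    def refresh(self):
        if self.buffer:
            self.cache_segments.append(list(self.buffer))  # new segment, no fsync
            self.buffer.clear()
        # The translog is deliberately left untouched here.

    def flush(self):
        self.refresh()                              # buffer -> new segment
        self.committed.extend(self.cache_segments)  # commit point + fsync
        self.cache_segments = []
        self.translog = []                          # old translog deleted, new one started

    def searchable(self):
        return [d for seg in self.committed + self.cache_segments for d in seg]

    def crash_and_recover(self):
        recovered = ToyShard()
        recovered.committed = [list(s) for s in self.committed]  # from last commit point
        for doc in self.translog:                                # replay the translog
            recovered.index(doc)
        return recovered


s = ToyShard()
s.index("a")
before_refresh = s.searchable()   # "a" is buffered but not searchable yet
s.refresh()
after_refresh = s.searchable()
s.flush()
s.index("b")
s.refresh()
r = s.crash_and_recover()         # "b" was never committed, only logged
r.refresh()
after_recovery = r.searchable()
```

The document "b" exists only in an unflushed segment at the time of the simulated crash, yet it reappears after recovery because replaying the translog indexes it again.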
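The translog is also used to provide real-time CRUD. When you retrieve, update, or delete a document by ID, Elasticsearch first checks the translog for any recent changes before fetching the document from the relevant segment. This gives real-time access to the latest version of a document.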
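A lookup by ID that consults the translog before the segments could be sketched like this. The tuple layout (op, doc_id, doc) and the function realtime_get are assumptions made for the example, not Elasticsearch's actual data structures.

```python
def realtime_get(doc_id, translog, segments):
    """Return the latest version of doc_id, or None if it is deleted or unknown.

    translog: list of (op, doc_id, doc) tuples in arrival order.
    segments: list of {doc_id: doc} dicts, oldest first.
    """
    # Recent changes win, so scan the translog from newest to oldest.
    for op, logged_id, doc in reversed(translog):
        if logged_id == doc_id:
            return None if op == "delete" else doc
    # No recent change: fall back to the segments, newest first.
    for seg in reversed(segments):
        if doc_id in seg:
            return seg[doc_id]
    return None


segments = [{1: {"v": 1}, 2: {"v": 1}}]
translog = [("index", 1, {"v": 2}), ("delete", 2, None)]
latest = realtime_get(1, translog, segments)
removed = realtime_get(2, translog, segments)
```

With an empty translog the same call simply returns whatever the segments hold.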
Performing a full commit and truncating the translog is called a flush. By default a shard is flushed every 30 minutes, or when the translog grows too large.
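Automatic refresh creates a new segment every second, and every search has to query every segment, so the more segments there are, the slower searches become. Elasticsearch merges segments in the background, folding small segments into larger ones. This is also the point at which old deleted documents are purged from the filesystem: deleted documents and old versions of updated documents are not copied into the new, larger segment. Once the merge has finished:
1. The new segment is flushed to disk.
2. A new commit point is written that includes the new segment and excludes the old, smaller segments.
3. The new segment is opened for search.
4. The old segments are deleted.
There is also an API for forcing a merge. It merges a shard down to no more than the number of segments given in the max_num_segments parameter. It should not be used on an actively updated index.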
POST /logstash-2014-10/_optimize?max_num_segments=1
Merges triggered by optimize are completely unthrottled and may consume all of the I/O on a node. If you plan on optimizing an index, you should use shard allocation (see Migrate Old Indices) to first move the index to a node where it is safe to run.
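The effect of a merge on deleted documents can be illustrated with a few lines of Python. This is a sketch of the idea only: merge_segments is a hypothetical helper, segments are plain dicts, and a set of (segment name, doc id) pairs again stands in for the .del file.

```python
def merge_segments(segments, deleted):
    """Merge segments into one, skipping documents that are marked as deleted.

    segments: list of (name, {doc_id: doc}) pairs, oldest first.
    deleted:  set of (segment name, doc_id) pairs.
    """
    merged = {}
    for name, docs in segments:
        for doc_id, doc in docs.items():
            if (name, doc_id) in deleted:
                continue  # deleted docs and old versions are not copied over
            merged[doc_id] = doc
    return merged


segments = [("seg0", {1: "old", 2: "keep"}), ("seg1", {1: "new"})]
deleted = {("seg0", 1)}
merged = merge_segments(segments, deleted)
```

After the merge, only the surviving documents remain, so the space taken by the old version of document 1 can be reclaimed when the old segments are removed.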
Reposted from: https://www.cnblogs.com/saihide/p/7827556.html