Overview
This article describes how to change the amount of FileData indexed per file when dtSearch is used as the text extractor.
Note! This document is valid for M-Files 21.2.9928.0 and later.
Extraction, cache and index size
Text extraction means that the system extracts the text content from a document. It then stores the extracted raw text in a cache and in IDOL.
The cache is a folder structure under the FileData location. Its purpose is to speed up indexing when only the metadata of an object changes. The index always stores both the metadata and the file data. Without the cache, a metadata change would require the file contents to be extracted again. With the cache, the previously extracted text is reused.
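The role of the cache can be illustrated with a small conceptual model. This is a minimal sketch, not M-Files' implementation. The class and function names are hypothetical, and keying the cache by a hash of the file contents is an assumption made only for this illustration. The model shows that a metadata-only change still produces a complete index entry (metadata plus file text), while the extractor runs only once.

```python
import hashlib
from typing import Callable, Dict


class ExtractionCache:
    """Conceptual model of a text-extraction cache (hypothetical, not M-Files code)."""

    def __init__(self, extract: Callable[[bytes], str]) -> None:
        self._extract = extract
        self._cache: Dict[str, str] = {}
        self.extractions = 0  # how many times the expensive extractor actually ran

    def text_for(self, file_bytes: bytes) -> str:
        # Assumption: cached text is looked up by a hash of the file contents.
        key = hashlib.sha256(file_bytes).hexdigest()
        if key not in self._cache:
            self.extractions += 1
            self._cache[key] = self._extract(file_bytes)
        return self._cache[key]


def build_index_entry(cache: ExtractionCache, metadata: dict, file_bytes: bytes) -> dict:
    # Every index entry holds both metadata and file text, so a metadata-only
    # change still needs the file text; the cache supplies it without re-extracting.
    return {"metadata": metadata, "text": cache.text_for(file_bytes)}


if __name__ == "__main__":
    cache = ExtractionCache(lambda b: b.decode("utf-8", errors="ignore"))
    doc = b"Quarterly report contents"
    build_index_entry(cache, {"Title": "Report v1"}, doc)
    entry = build_index_entry(cache, {"Title": "Report v2"}, doc)  # metadata-only change
    print(cache.extractions)  # 1
    print(entry["metadata"]["Title"], "|", entry["text"])
```

In this model, only a change to the file contents produces a new hash and a new extraction. The cost is disk space for the cached text, which matches the note about the FileData disk below.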
The size of the index depends on the amount of extracted text and on how much of it the system is allowed to store in the search engine. IDOL and Smart Search are designed to search a large number of documents with high performance. They are therefore not intended to index the entire contents of every file. Also keep in mind that the cache uses space on the FileData disk.
Defaults and recommendations
By default, a maximum of 2 MB (2,097,152 bytes) of text is extracted from a document. If necessary, this can be increased; the maximum recommended value is 5 MB. The extracted amount is also what is saved in the cache.
After extraction, the system saves the data in the search engine. By default, 100 KB per file is saved to IDOL, which is usually adequate for most customers. If necessary, this can be increased; the maximum recommended value is 5 MB.
Note! If you use Smart Search as the search engine, the maximum amount saved to the search engine is always 1 MB. Anything above that is discarded.
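The interaction of these limits can be worked through with a short calculation. This is a sketch under stated assumptions, and the function name is hypothetical. It treats 1 KB as 1024 bytes and the Smart Search limit of 1 MB as 1,048,576 bytes. It also assumes that the indexed text is taken from the already extracted (cached) text, so it can never exceed the extraction limit.

```python
from typing import Tuple

# Defaults from this article.
DEFAULT_PLAIN_TEXT_BYTES = 2_097_152  # Maximum Plain text Length (bytes)
DEFAULT_SINGLE_FILE_KB = 100          # Maximum Length for Single File Content (kilobytes)
SMART_SEARCH_CAP_BYTES = 1024 * 1024  # assumption: "1 MB" read as 1,048,576 bytes


def effective_sizes(
    text_bytes: int,
    plain_text_limit: int = DEFAULT_PLAIN_TEXT_BYTES,
    single_file_kb: int = DEFAULT_SINGLE_FILE_KB,
    smart_search: bool = False,
) -> Tuple[int, int]:
    """Return (bytes cached, bytes saved to the search engine) for one file.

    Hypothetical model: the indexed text is taken from the extracted text.
    """
    cached = min(text_bytes, plain_text_limit)
    indexed = min(cached, single_file_kb * 1024)  # assumption: 1 KB = 1024 bytes
    if smart_search:
        indexed = min(indexed, SMART_SEARCH_CAP_BYTES)
    return cached, indexed


if __name__ == "__main__":
    # A document containing about 10 MB of text.
    print(effective_sizes(10_000_000))                                          # (2097152, 102400)
    print(effective_sizes(10_000_000, single_file_kb=5000))                     # (2097152, 2097152)
    print(effective_sizes(10_000_000, single_file_kb=5000, smart_search=True))  # (2097152, 1048576)
```

Under these assumptions, raising the index limit above the extraction limit has no effect. With Smart Search, anything above 1 MB is cut off regardless of the configured value.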
Settings
The settings are in M-Files Admin, in the vault's Advanced Vault Settings. They affect only new and edited files, and changing them does not trigger re-indexing.
Maximum Plain text Length (bytes)
Location: Configuration > File Previews > Viewer Files
Default value: 2097152
Maximum recommended value: 5000000
Description: How much data is extracted from a file. This is also the amount of data to be cached.
Maximum Length for Single File Content (kilobytes)
Location: Configuration > Search > Indexes > [index] > Additional Options > Limits for Indexed Data
Default value: 100
Maximum recommended value: 5000
Description: How much of the extracted data is saved to the index for each file.
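Before changing the values, it may help to check them against the recommendations above. The helper below is a hypothetical sketch, not an M-Files tool. The values are entered by hand rather than read from the vault, and the Smart Search check assumes that 1 MB equals 1024 KB.

```python
from dataclasses import dataclass
from typing import List

# Recommended maxima from this article.
RECOMMENDED_MAX_PLAIN_TEXT_BYTES = 5_000_000
RECOMMENDED_MAX_SINGLE_FILE_KB = 5_000
SMART_SEARCH_CAP_KB = 1024  # assumption: the 1 MB Smart Search cap equals 1024 KB


@dataclass
class IndexingLimits:
    """Values as entered in Advanced Vault Settings (hypothetical container)."""
    max_plain_text_length_bytes: int = 2_097_152  # default
    max_single_file_content_kb: int = 100         # default


def check_limits(limits: IndexingLimits, smart_search: bool = False) -> List[str]:
    """Return warnings for values outside this article's recommendations."""
    warnings: List[str] = []
    if limits.max_plain_text_length_bytes > RECOMMENDED_MAX_PLAIN_TEXT_BYTES:
        warnings.append("Maximum Plain text Length (bytes) is above the recommended 5000000")
    if limits.max_single_file_content_kb > RECOMMENDED_MAX_SINGLE_FILE_KB:
        warnings.append("Maximum Length for Single File Content (kilobytes) is above the recommended 5000")
    if smart_search and limits.max_single_file_content_kb > SMART_SEARCH_CAP_KB:
        warnings.append("Smart Search discards anything above 1 MB per file")
    return warnings


if __name__ == "__main__":
    for warning in check_limits(IndexingLimits(6_000_000, 2_000), smart_search=True):
        print(warning)
```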
