SPE1.1.0.2

FILE ATTRIBUTES

Sparse file. Reparse point. Offline.

DESCRIPTION

File attributes are a type of metadata that describes, and may modify, how files and directories in a file system behave.

Typical file attributes may, for example, indicate or specify whether a file is visible, modifiable, compressed, or encrypted.

The availability of most file attributes depends on support by the underlying filesystem (such as FAT, NTFS, ext4) where attribute data must be stored along with other control structures.

Each attribute can be in one of two states: set or cleared.

Attributes are considered distinct from other metadata, such as dates and times, filename extensions, or file system permissions. Besides files, other file system objects such as folders and volumes may also have attributes.
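Because each attribute is either set or cleared, a group of attributes is commonly stored as a bitmask with one bit per attribute. The following is a minimal sketch of that idea, assuming the Windows attribute bit values that Python exposes as constants in its standard stat module. The helper names (set_attributes, with_attribute, without_attribute) are illustrative and not part of any API.

```python
import stat

# Attributes discussed in this article, mapped to the Windows bit values
# that Python's stat module exposes as constants.
ATTRIBUTE_BITS = {
    "READONLY": stat.FILE_ATTRIBUTE_READONLY,
    "HIDDEN": stat.FILE_ATTRIBUTE_HIDDEN,
    "SPARSE_FILE": stat.FILE_ATTRIBUTE_SPARSE_FILE,
    "REPARSE_POINT": stat.FILE_ATTRIBUTE_REPARSE_POINT,
    "COMPRESSED": stat.FILE_ATTRIBUTE_COMPRESSED,
    "OFFLINE": stat.FILE_ATTRIBUTE_OFFLINE,
    "NOT_CONTENT_INDEXED": stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED,
    "ENCRYPTED": stat.FILE_ATTRIBUTE_ENCRYPTED,
}


def set_attributes(mask):
    """Return the names of all attributes whose bit is set in mask."""
    return [name for name, bit in ATTRIBUTE_BITS.items() if mask & bit]


def with_attribute(mask, name):
    """Return mask with the named attribute set."""
    return mask | ATTRIBUTE_BITS[name]


def without_attribute(mask, name):
    """Return mask with the named attribute cleared."""
    return mask & ~ATTRIBUTE_BITS[name]
```

On Windows, the mask of a real file can be read from os.stat(path).st_file_attributes. That field is not available on other platforms.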

SPARSE FILES

A sparse file is a type of computer file that allows efficient storage allocation for large data sets.

A file is considered sparse when much of its data is zero (empty data). Support for creating such files is generally provided by the file system.

Sparse files are widely used in areas such as database management systems (DBMS) and digital image processing.

Sparse files are created differently from normal (non-empty) files. When a sparse file is created, metadata representing the empty blocks is written to disk instead of the actual bytes that make up those blocks, which uses less disk space. Empty bytes do not need to be saved because they can be represented by metadata. Actual data blocks are written only when non-empty (non-zero) data is written to the file.

When a sparse file is read, the file system transparently converts the metadata representing empty blocks into “real” blocks filled with null bytes at runtime. The application is unaware of this conversion because it happens at the file system level.

A sparse file need not consist entirely of null data; individual empty sections of a file can also be flagged as sparse. The data still follows the mechanism described above, but on a smaller scale.
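The mechanism above can be observed with a short experiment. The sketch below assumes a POSIX-style file system that creates a hole when a program seeks past the end of a file before writing; on NTFS a file must first be marked sparse, which this sketch does not do. The function names are illustrative. The st_blocks field used to estimate allocated space is not available on Windows, so the helper returns None for it there.

```python
import os
import tempfile


def create_sparse_file(path, hole_size, payload):
    """Write payload after a hole of hole_size bytes that is never written."""
    with open(path, "wb") as f:
        f.seek(hole_size)   # skip ahead without writing any bytes
        f.write(payload)    # only these bytes need real data blocks


def logical_and_allocated_size(path):
    """Return (logical size, allocated size or None if unknown)."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # 512-byte units on POSIX
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


def read_at(path, offset, length):
    """Read length bytes starting at offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sparse.bin")
        create_sparse_file(path, 100 * 1024 * 1024, b"data")
        logical, allocated = logical_and_allocated_size(path)
        print("logical size:  ", logical)
        print("allocated size:", allocated)
        # The hole reads back as null bytes, as described above.
        print("first bytes:   ", read_at(path, 0, 8))
```

On a file system with sparse file support, the allocated size is typically far smaller than the logical size. On one without it, the two are roughly equal, but reads from the unwritten region still return null bytes.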

Advantages of sparse files

A large amount of storage space can be allocated without physically writing any sectors, which allows faster file creation.

Allocation occurs only when non-empty data is written, so disk space is saved.

Because the logical size of a sparse file can exceed its allocated size, more data can be read than is physically allocated.

If the initial allocation would require writing all zeros to the space, no actual allocation occurs, which prevents unnecessary disk reads and writes.

For files that are not completely sparse, the time of the first write is reduced because the system does not have to allocate blocks for the “skipped” space.

In certain scenarios, sparse files are a better choice than file compression.

Disadvantages of sparse files

Most file copy operations destroy the sparse properties of the file. The sparse regions of the copy are explicitly allocated on disk and lose their sparse properties.

Because the logical size of a file can be greater than its allocated size, file system free space reports may be misleading.

Some applications do not work efficiently with sparse files.

Sparse files may become fragmented over time as valid data is written to them.
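The first disadvantage can be avoided when the copy routine is aware of holes. The following is a minimal sketch, assuming a Unix-like system where os.lseek supports os.SEEK_DATA and os.SEEK_HOLE (for example Linux). It is not available on Windows, and the function name copy_preserving_holes is illustrative. Only the data regions are copied; the holes are never written to the destination, so the destination stays sparse if its file system supports holes.

```python
import errno
import os


def copy_preserving_holes(src, dst, chunk=1 << 20):
    """Copy src to dst, writing only the regions that contain data."""
    src_fd = os.open(src, os.O_RDONLY)
    dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src_fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next region that holds real data.
                start = os.lseek(src_fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a hole remains up to the end of the file
                raise
            # Find where that data region ends (next hole or end of file).
            end = os.lseek(src_fd, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                buf = os.pread(src_fd, min(chunk, end - pos), pos)
                if not buf:
                    break
                # pwrite may write fewer bytes than requested; advance by
                # the amount written and re-read the rest on the next pass.
                pos += os.pwrite(dst_fd, buf, pos)
            offset = end
        # Set the logical size so that a trailing hole is preserved.
        os.ftruncate(dst_fd, size)
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```

If the source file system does not track holes, the whole file is reported as one data region and the routine degrades to an ordinary full copy.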

REPARSE POINTS

A file or directory can contain a reparse point, which is a collection of user-defined data.

The format of this data is understood by the application which stores the data, and a file system filter, which you install to interpret the data and process the file. When an application sets a reparse point, it stores this data, plus a reparse tag, which uniquely identifies the data it is storing. When the file system opens a file with a reparse point, it attempts to find the file system filter associated with the data format identified by the reparse tag. If a file system filter is found, the filter processes the file as directed by the reparse data. If a file system filter is not found, the file open operation fails.
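The lookup described above can be modelled in a few lines. The sketch below is a toy in-memory model, not the NTFS implementation: the class ToyFileSystem, its methods, and the NoFilterForTag exception are all hypothetical names introduced for illustration. The two tag constants use the well-known Windows values for mount points and symbolic links.

```python
# Well-known Windows reparse tag values.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C


class NoFilterForTag(OSError):
    """Raised when no filter is registered for a reparse tag."""


class ToyFileSystem:
    """Hypothetical model of reparse tag dispatch (illustration only)."""

    def __init__(self):
        self._filters = {}          # reparse tag -> handler(reparse_data)
        self._reparse_points = {}   # path -> (reparse tag, reparse data)

    def register_filter(self, tag, handler):
        """Install a filter that understands the data format of tag."""
        self._filters[tag] = handler

    def set_reparse_point(self, path, tag, data):
        """Store the application's data together with its reparse tag."""
        self._reparse_points[path] = (tag, data)

    def open(self, path):
        """Open path, letting the matching filter process reparse data."""
        if path not in self._reparse_points:
            return path  # ordinary file: nothing to reparse
        tag, data = self._reparse_points[path]
        handler = self._filters.get(tag)
        if handler is None:
            # No filter understands this tag, so the open fails.
            raise NoFilterForTag(f"no filter for tag 0x{tag:08X}")
        return handler(data)
```

For example, registering a handler for IO_REPARSE_TAG_SYMLINK that returns the stored target makes open() on a link resolve to that target, while opening a path whose tag has no registered handler raises NoFilterForTag.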

For example, reparse points are used to implement NTFS file system links and the Microsoft Remote Storage Server (RSS). RSS uses an administrator-defined set of rules to move infrequently used files to long-term storage, such as tape or optical media. It uses reparse points to store information about the file in the file system. This information is stored in a stub file that contains a reparse point whose data points to the device where the actual file is now located. The file system filter can use this information to retrieve the file.

Reparse points are also used to implement mounted folders (see Determining Whether a Directory Is a Mounted Folder).

The following restrictions apply to reparse points:

Reparse points can be established for a directory, but the directory must be empty. Otherwise, the NTFS file system fails to establish the reparse point. In addition, you cannot create directories or files in a directory that contains a reparse point.

Reparse points and extended attributes are mutually exclusive. The NTFS file system cannot create a reparse point when the file contains extended attributes, and it cannot create extended attributes on a file that contains a reparse point.

Reparse point data, including the tag and optional GUID, cannot exceed 16 kilobytes. Setting a reparse point fails if the amount of data to be placed in the reparse point exceeds this limit.

There is a limit of 63 reparse points on any given path.

Note: The limit can be reduced depending on the length of the reparse point. For example, if your reparse point targets a fully qualified path, the limit becomes 31.

Windows Server 2003 and Windows XP: There is a limit of 31 reparse points on any given path.
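The restrictions listed above can be summarized as a simple pre-check. The sketch below is purely illustrative: the function reparse_point_problems and its parameters are hypothetical, and the real checks are performed by the NTFS file system itself. It uses the 16-kilobyte and 63-per-path limits stated above and ignores the reduced limits mentioned in the note.

```python
MAX_REPARSE_DATA_BYTES = 16 * 1024   # tag and optional GUID included
MAX_REPARSE_POINTS_PER_PATH = 63


def reparse_point_problems(is_directory, directory_is_empty,
                           has_extended_attributes, data_size,
                           reparse_points_on_path=0):
    """Return a list of reasons why setting a reparse point would fail."""
    problems = []
    if is_directory and not directory_is_empty:
        problems.append("directory is not empty")
    if has_extended_attributes:
        problems.append("object has extended attributes")
    if data_size > MAX_REPARSE_DATA_BYTES:
        problems.append("reparse data exceeds 16 kilobytes")
    if reparse_points_on_path >= MAX_REPARSE_POINTS_PER_PATH:
        problems.append("path already contains 63 reparse points")
    return problems
```

An empty list means that none of the listed restrictions is violated.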

OFFLINE FILES

End users might need access to shared folders even when they’re disconnected from your internal network.

Offline Files makes this possible by allowing client computers to automatically cache a copy of files on shared folders and by providing transparent access to the files when the user is disconnected from the network.

The next time the user connects to the network, Offline Files synchronizes any updates and prompts the user to manually resolve any conflicts.
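The synchronization step can be pictured as a three-way comparison between the version that was originally cached, the current cached copy, and the current server copy. The sketch below is a simplified illustration of that idea and is not how Offline Files is actually implemented. The SyncAction enum, the decide_sync_action function, and the use of opaque version values (for example content hashes) are all assumptions made for the example.

```python
from enum import Enum


class SyncAction(Enum):
    NOTHING = "nothing to do"
    UPLOAD = "copy the cached version to the server"
    DOWNLOAD = "copy the server version to the cache"
    CONFLICT = "prompt the user to resolve the conflict"


def decide_sync_action(base, cached, server):
    """Compare versions of one file and choose what to do on reconnect.

    base   -- version that was cached before going offline
    cached -- version currently in the offline cache
    server -- version currently on the shared folder
    """
    cached_changed = cached != base
    server_changed = server != base
    if cached_changed and server_changed:
        # Both sides changed; identical results need no action.
        return SyncAction.NOTHING if cached == server else SyncAction.CONFLICT
    if cached_changed:
        return SyncAction.UPLOAD
    if server_changed:
        return SyncAction.DOWNLOAD
    return SyncAction.NOTHING
```

Only the case in which both copies changed in different ways requires the user to decide; every other case can be synchronized automatically.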

Server administrators can configure Offline Files at the shared folder, and users of client computers can configure Offline Files when connected to a shared folder.

Offline Files Options

Only The Files And Programs That Users Specify Are Available Offline – Users must manually select the files they want to access while offline. This option works well when users understand how to use Offline Files.

All Files And Programs That Users Open From The Share Are Automatically Available Offline – Files that users access while connected to the network are automatically cached for a limited amount of time. This option works well when users do not understand how to use Offline Files.

No Files Or Programs From The Share Are Available Offline – Prevents users from accessing Offline Files. This option is the best choice for confidential documents that should not be stored on mobile computers.

CONTENT INDEXING

The Content Indexing Engine is the core component for the content indexing and search feature.

It is the underlying integrated software application that provides indexing, searching, and filtering services for all data, including file server and desktop data as well as protected and archived data.

Indexing is the process of looking at files, email messages, and other content on your PC and cataloging their information, such as the words and metadata in them.

When you search your PC after indexing, the search looks at an index of terms instead of the files themselves, so it finds results faster.

When you first run indexing, it can take up to a couple of hours to complete. After that, indexing runs in the background on your PC as you use it and re-indexes only updated data.
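The idea of an index of terms, and of re-indexing only what changed, can be illustrated with a small inverted index. This is a minimal sketch and not the Content Indexing Engine: the class TinyIndex and its methods are hypothetical, and real indexers also handle metadata, file formats, and ranking.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the files that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)   # term -> set of paths
        self._terms_by_path = {}            # path -> set of terms

    def index(self, path, text):
        """Index a file, replacing any earlier entry for the same path."""
        self.remove(path)
        terms = set(re.findall(r"\w+", text.lower()))
        self._terms_by_path[path] = terms
        for term in terms:
            self._postings[term].add(path)

    def remove(self, path):
        """Drop all index entries for path (no-op if it was not indexed)."""
        for term in self._terms_by_path.pop(path, ()):
            self._postings[term].discard(path)
            if not self._postings[term]:
                del self._postings[term]

    def search(self, query):
        """Return the paths that contain every term of the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = None
        for term in terms:
            paths = self._postings.get(term, set())
            result = set(paths) if result is None else result & paths
        return result
```

A search only consults the term dictionary, never the files. When a file changes, calling index() again for that one path updates its entries without touching the rest of the index.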

By default, the files whose contents are indexed are known plain-text file types such as .INI, .SQL, .CSV, .TXT, .JS, .BAT, .CMD, .CPP, and .VBS.

By default, all the properties of your files are indexed, including file names and full file paths. For files with text, their contents are indexed to allow you to search for words within the files.

Apps you install may also add their own information to the index to speed up searching. For example, Outlook 2016 adds all emails synced to your machine to the index by default and uses the index for searching within the app.

Many of the built-in apps on your PC use the index in some way. File Explorer, Photos, and Groove all use it to access and track changes to your files. Microsoft Edge uses it to provide browser history results in the address bar. Outlook uses it to search your email. Cortana uses it to provide faster search results from across your PC.

The operating system constantly tracks changes to files and updates the index with the latest information. To do this, the operating system opens recently changed files, looks at the changes, and stores the new information in the index.

All data gathered from indexing is stored locally on the workstation. None of it is sent to any other computer or server.

However, apps you install on the workstation may be able to read the data in the index, so be careful with what you install and make sure you trust the source.

Note: More information on the subject can be found in the official documentation of Content Indexing Services Protocol.
