Most (space-time) efficient way to delete files in the background on Unix-like operating systems.
Deleting items from filesystems on Unix-like systems traditionally requires recursing into each subdirectory and unlinking each entry. This has some drawbacks.
Summary
Plan
** Optimizations/Notes
The database could grow excessively large. To limit this, only add files whose size exceeds some configured threshold.
While moving data into the 'rmrfd' directories for asynchronous deletion is simple, it may not be sufficient for some use cases. For these, a simple API exists.
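A minimal sketch of the threshold idea, assuming the database holds one entry per file and that the total size is still tracked for every file. The names `SIZE_THRESHOLD` and `inventory` are hypothetical and not part of 'rmrfd'.

```python
import os

SIZE_THRESHOLD = 1 << 20  # hypothetical default: 1 MiB


def inventory(root, threshold=SIZE_THRESHOLD):
    """Walk 'root' and return (total_bytes, big_files).

    'big_files' lists (path, size) only for files larger than
    'threshold'; smaller files still count towards 'total_bytes'
    but would not get their own database entry.
    """
    total = 0
    big_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            total += size
            if size > threshold:
                big_files.append((path, size))
    return total, big_files
```

With a threshold of 1000 bytes, a tree containing a 10-byte file and a 2000-byte file yields a total of 2010 bytes but only one recorded entry.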
** Remove 'path' from the filesystem
The 'sync' argument can be one of the following:
 * -1 :: Asynchronous deletion. The function returns immediately.
 * 0 :: Synchronous deletion. Returns as soon as the size to be freed is known (the inventory has been created). This is useful when the caller only needs to know how much space will eventually be freed.
 * 1..100 :: Synchronous deletion. Returns once that percentage of the space has been freed. With this the caller can block until space actually becomes available. Because of the way filesystems store data this will be inaccurate, and the caller has to put additional safeguards in place. Limiting the wait to some percentage allows a reasonably fast return while the bulk of the slow deletions still progresses in the background.
 * Return
   * 0 :: when asynchronous removal was requested and accepted.
   * Number of (1k) blocks it will free :: on synchronous removal.
   * An error code (negative number) :: when anything went wrong.
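The argument and return conventions above can be illustrated with a small sketch. The helper names `check_sync` and `interpret_result` are hypothetical, and a "1k block" is assumed to mean 1024 bytes.

```python
BLOCK_SIZE = 1024  # assumption: the "1k" blocks are 1024 bytes each


def check_sync(sync):
    """Classify a 'sync' argument; reject values outside -1..100."""
    if sync == -1:
        return "async"
    if sync == 0:
        return "inventory"
    if 1 <= sync <= 100:
        return "percent"
    raise ValueError(f"sync must be -1, 0 or 1..100, got {sync!r}")


def interpret_result(sync, result):
    """Turn the integer returned by the removal call into (kind, value)."""
    mode = check_sync(sync)
    if result < 0:
        return ("error", result)      # negative number: error code
    if mode == "async":
        return ("accepted", 0)        # request queued, nothing measured
    return ("bytes", result * BLOCK_SIZE)
```

A caller that needs space right away might pass a moderate percentage, check the returned size, and still verify the free space itself, since the reported amount can be inaccurate.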
* Implementation details
This API is a library that operates in the caller's context. It connects to 'rmrfd' over a local socket. Messages between the library and 'rmrfd' are only informational. The data is moved into the 'rmrfd' directory by the API itself, with the caller's privileges, so there is no need to worry about security implications.
** Protocol
The API opens a session to the daemon for each call; after that a textual request/response protocol (with NUL terminators) is used. Any error ends the session. The protocol examples below show the successful cases; any request can also fail with an error number, ~ERR nnn\0~.
Query which 'rmrf' directory to use for a given path. There must be an existing 'rmrf' directory on the same filesystem as the object to be deleted. As a further safeguard, this directory must be either on the same directory level or above. Thus, with proper placement of 'rmrf' directories, one has some limited control over what can be deleted.
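The framing described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and it assumes every reply starts with either ~OK~ or ~ERR~ followed by a single space and a payload.

```python
import os


class RmrfError(Exception):
    """Raised for an 'ERR nnn' reply; 'code' holds nnn."""

    def __init__(self, code):
        super().__init__(f"daemon replied ERR {code}")
        self.code = code


def encode_request(verb, argument):
    """Build one NUL-terminated request, e.g. b'PATH /foo/bar/baz\\0'."""
    return os.fsencode(f"{verb} {argument}") + b"\0"


def split_frames(buffer):
    """Split received bytes into complete frames plus the unfinished rest."""
    *frames, rest = buffer.split(b"\0")
    return frames, rest


def parse_reply(frame):
    """Return the payload of an 'OK ...' frame, raise on 'ERR nnn'."""
    status, _, payload = os.fsdecode(frame).partition(" ")
    if status == "ERR":
        raise RmrfError(int(payload))
    if status != "OK":
        raise ValueError(f"unexpected reply: {frame!r}")
    return payload
```

Using `os.fsencode` and `os.fsdecode` keeps path names that are not valid UTF-8 intact on the wire; whether the daemon expects exactly this encoding is an assumption.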
 Send:    PATH /foo/bar/baz\0
 Receive: OK /foo/bar/.rmrf/$TMPDIR/\0
Note that the rmrfd reserves and returns a temporary directory for the operations to prevent name collisions.
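The lookup rule for the 'rmrf' directory (same filesystem, same directory level or above) might be implemented roughly like this. The function name `find_rmrf_dir` is hypothetical, and the directory name `.rmrf` is taken from the example above.

```python
import os
import stat


def find_rmrf_dir(path, name=".rmrf"):
    """Return the nearest usable 'rmrf' directory for 'path', or None.

    Starts in the directory containing 'path' and walks towards the
    root. A candidate only counts if it is a real directory on the same
    filesystem (same st_dev) as the object to be deleted.
    """
    path = os.path.abspath(path)
    device = os.lstat(path).st_dev
    directory = os.path.dirname(path)
    while True:
        candidate = os.path.join(directory, name)
        try:
            st = os.lstat(candidate)
        except OSError:
            st = None  # no such entry at this level
        if st is not None and stat.S_ISDIR(st.st_mode) and st.st_dev == device:
            return candidate
        parent = os.path.dirname(directory)
        if parent == directory:
            return None  # reached the root without finding one
        directory = parent
```

Because only directories at the same level or above are considered, an object is never moved "down" into an unrelated subtree, which is what gives the placement of 'rmrf' directories its limited control over what can be deleted.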
Move the data to be deleted into the returned temporary directory
If this fails for some reason, the session can simply be terminated by closing the fd.
Set the sync policy, start deleting
 Send:    SYNC 85\0
 Receive: OK 12345678\0   // returns the freed size after a while
A simple command-line utility 'rmrf' that calls the above API can be implemented.
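Put together, the three steps of a session (query the directory, move the data, set the sync policy) could look like the sketch below. It assumes an already connected stream socket and a numeric payload in the ~SYNC~ reply; the reply format for asynchronous requests is not specified here, so this is not the definitive client. The function names are hypothetical.

```python
import os
import socket


def recv_frame(sock):
    """Read one NUL-terminated message (byte by byte, for simplicity)."""
    buf = bytearray()
    while not buf.endswith(b"\0"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("daemon closed the session")
        buf += chunk
    return bytes(buf[:-1])


def request(sock, line):
    """Send one request and return the payload of the 'OK ...' reply."""
    sock.sendall(os.fsencode(line) + b"\0")
    reply = os.fsdecode(recv_frame(sock))
    status, _, payload = reply.partition(" ")
    if status != "OK":
        raise RuntimeError(f"request failed: {reply}")
    return payload


def remove(sock, path, sync):
    """Run one removal session over 'sock' and return the reported size."""
    with sock:  # closing the fd ends the session, also when a step fails
        tmpdir = request(sock, f"PATH {path}")
        target = os.path.join(tmpdir, os.path.basename(os.path.normpath(path)))
        os.rename(path, target)  # done by the caller, not by the daemon
        return int(request(sock, f"SYNC {sync}"))
```

`os.rename` only works within one filesystem, which matches the requirement that the 'rmrf' directory lives on the same filesystem as the object to be deleted.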