A remote archive is some file tree on a remote server ; it has directories containing files and symlinks (links).
A local archive is some file tree on the local server.
An event is a (epoch, path, type) tuple, where
epoch is an element of a totally ordered set EPOCH ;
two events that have the same epoch must be equal ;
the epochs look like microsecond timestamps, but they are just ordered tags ;
path represents a file-path in the remote (and local) file tree ;
type is the event's type : either new or delete.
Each event represents an update in the remote archive ; the epoch is used to order the events.
The event stream is the ordered collection of all the events. As time progresses, the event stream is extended.
The RECENT files (RECENTs for short) are the RECENT-*.json files.
To visualize, imagine time progressing from left to right (right is recent), as is usual.
Each RECENT file contains, among other things, a non-empty substring of the event-stream.
All RECENTs together contain the entire event-stream.
The RECENTs are ordered ( Z, 1Y, 1Q, ..., 1d, 6h, 1h ).
If two subsequent RECENTs A and B overlap, the overlap is a suffix of A and a prefix of B.
If two subsequent RECENTs A and B don't overlap, all epochs (of the events) in A are smaller than all epochs (of the events) in B.
In a RECENT file, the events are listed in reverse order : the first item in the list is the most recent event, and the last item is the oldest.
opqrs is a substring of string ABC == abcdefghijklmnopqrstuvwxyz ; abc is a prefix of ABC ; xyz is a suffix of ABC ; the empty string is a substring, prefix and suffix of any string.
One would like to specify that the (subsequent) RECENTs partition the event-stream into (subsequent) non-empty substrings, but this is impossible to maintain efficiently.
Because updating the RECENTs is event-driven, RECENTs aren't updated when there are no new events.
For a given path, only the last path-event is relevant ; previous events pertaining to path can be removed from history. So, if a file is removed from the archive, its delete event is the last event for that path, and can never be removed from history.
An archive is consistent if all the events in the RECENTs in the archive have been applied to the archive. This means that a fresh RECENT (obtained from the remote archive) may only be moved into the local archive if/when all the events in the RECENT have been successfully applied to the local archive.
The epoch of a (consistent) archive is the epoch of the last event in the last RECENT. The epoch of the archive is undefined if at least one RECENT is missing.
To keep the local archive in sync, RECENT files must be fetched ; the events therein must be applied as updates (fetches or deletes) on the local archive.
In iim, 3 data areas are relevant :
remote : the archive on the remote server
local : the local archive
temp : a staging area for RECENTs
Fresh RECENTs have to be kept in a staging area ; they can only be moved to area local if/when all events in the RECENTs have been processed.
The basic structure of iim is simple:
  todo  = []
  epoch <- initialize
  LOOP
    FETCH fresh recents
    epoch <- FIND new events
    todo  <- PROCESS new events, todo
    IF todo == []
      MOVE recents INTO local
Array todo is used to carry unprocessed or failed events from one loop to the next.
The main object of initialize is to determine epoch from the epoch of the local archive. Other initializations follow from the loop-invariants we need to maintain.
If epoch is undefined or differs too much from the epoch of remote, a full rsync is done first.
At the moment, too much is defined as : more than (the number of seconds in) two days ; the choice is arbitrary.
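A minimal sketch of this initialization, with $LOCAL as a placeholder and last_epoch_of / full_rsync as hypothetical helpers (the real iim may do this differently) :

  use strict ;
  use warnings ;

  my $LOCAL    = '/path/to/local' ;          # placeholder
  my $TWO_DAYS = 2 * 24 * 3600 ;

  # the epoch of the local archive is the epoch of the last event
  # in the last RECENT ; undefined if that RECENT is missing
  sub archive_epoch
    { my $local = shift ;
      return undef unless -f "$local/RECENT-1h.json" ;
      return last_epoch_of ( "$local/RECENT-1h.json" ) ;    # hypothetical helper
    }

  my $epoch = archive_epoch ( $LOCAL ) ;
  # current time stands in for the epoch of remote here
  if ( ! defined $epoch or time - $epoch > $TWO_DAYS )
    { full_rsync () ;                                       # hypothetical helper
      $epoch = archive_epoch ( $LOCAL ) ;
    }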
To fetch (all) RECENTs, we sync them into area temp, marking their inode number. After the sync, the RECENTs with a changed inode number are fresh. This is very efficient on the client and the server.
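A sketch of the fetch, with $REMOTE and $TEMP as placeholders ; rsync normally replaces a changed file by writing a new file and renaming it, so a changed RECENT gets a new inode :

  use strict ;
  use warnings ;

  my $REMOTE = 'MASTER::cpan' ;              # placeholder rsync source
  my $TEMP   = '/path/to/temp' ;             # staging area

  # remember the inode of every RECENT currently in temp
  my %inode ;
  $inode{$_} = ( stat $_ )[1] for glob "$TEMP/RECENT-*.json" ;

  # sync the RECENTs ; unchanged files are skipped
  system ( 'rsync', '-t', "$REMOTE/RECENT-*.json", "$TEMP/" ) == 0
    or warn "rsync RECENTs failed : $?\n" ;

  # a RECENT with a changed (or new) inode number is fresh
  my @fresh = grep
    { my $ino = ( stat $_ )[1] ;
      ! defined $inode{$_} or $ino != $inode{$_} ;
    } glob "$TEMP/RECENT-*.json" ;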
On startup, we must initialize temp with hardlinks to the RECENTs in local.
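For instance (a sketch ; $LOCAL and $TEMP are placeholders) :

  use strict ;
  use warnings ;
  use File::Basename qw(basename) ;

  my $LOCAL = '/path/to/local' ;
  my $TEMP  = '/path/to/temp' ;

  for my $r ( glob "$LOCAL/RECENT-*.json" )
    { my $t = "$TEMP/" . basename ( $r ) ;
      next if -e $t ;                                       # already present
      link $r, $t or warn "link $r -> $t : $!\n" ;
    }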
New events (more recent than epoch) are looked up in the (ordered set of) RECENTs in area temp. If new events are found, epoch is set to the epoch of the last event.
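A sketch of the lookup ; it assumes each RECENT is JSON with its events (newest first) under a 'recent' key, as in the CPAN RECENT files, and that the periods between 1Q and 1d are the usual 1M and 1W :

  use strict ;
  use warnings ;
  use JSON::PP qw(decode_json) ;

  # collect the events newer than $epoch, oldest first
  sub find_new_events
    { my ( $temp, $epoch ) = @_ ;
      my $seen = defined $epoch ? $epoch : 0 ;
      my @new ;
      for my $period ( qw( Z 1Y 1Q 1M 1W 1d 6h 1h ) )       # oldest RECENT first
        { my $file = "$temp/RECENT-$period.json" ;
          next unless -f $file ;
          open my $fh, '<', $file or next ;
          my $data = decode_json ( do { local $/ ; <$fh> } ) ;
          for my $e ( reverse @{ $data -> { recent } || [] } )  # events are listed newest first
            { next unless $e -> { epoch } > $seen ;             # skips old events and overlaps
              $seen = $e -> { epoch } ;
              push @new, $e ;
            }
        }
      return @new ;
    }

  # usage :
  #   my @events = find_new_events ( $TEMP, $epoch ) ;
  #   $epoch = $events[-1]{epoch} if @events ;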
When processing (a batch of) events, we must fetch (a batch of) files from remote. We want to rsync directly into local to take advantage of rsync's matched data magic ; if the source and destination files differ only a little, the transfer is very efficient.
First we determine the last event for each event-path in the batch ; if it is a delete event, we just delete the event-path from local ; if it is a new event, we add the path to the fetch-list, marking the inode number (possibly undef) of the path in local. Then we rsync the files from remote into local.
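A sketch of this step, with placeholder area names ; the helpers and data structures of the real iim may differ :

  use strict ;
  use warnings ;
  use File::Path qw(rmtree) ;
  use File::Temp qw(tempfile) ;

  # @events is assumed oldest first ; the last event per path wins
  sub process_events
    { my ( $remote, $local, @events ) = @_ ;
      my %last ;
      $last{ $_ -> { path } } = $_ for @events ;

      my ( @fetch, %inode ) ;
      for my $path ( keys %last )
        { if ( $last{$path}{type} eq 'delete' )
            { my $f = "$local/$path" ;
              if ( -l $f or ! -d $f ) { unlink $f }
              else                    { rmtree ( $f ) }
            }
          else
            { push @fetch, $path ;
              $inode{$path} = ( lstat "$local/$path" )[1] ;  # possibly undef
            }
        }

      # rsync the batch directly into local, to profit from the matched data magic
      my ( $fh, $list ) = tempfile ( UNLINK => 1 ) ;
      print $fh "$_\n" for @fetch ;
      close $fh ;
      my $rc = system ( 'rsync', '-a', "--files-from=$list", "$remote/", "$local/" ) ;
      return ( $rc >> 8, \%last, \%inode ) ;
    }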
When rsync is finished, we determine which items on the fetch-list have made it into area local. If the rsync had no errors, or the inode number of the file changed, the event is done.
If the rsync was partial (rsync exit codes 23, 24), the tried counter of the event is incremented, and the event is pushed on the todo list.
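A sketch of this bookkeeping, continuing the previous sketch (the per-path events and the old-inode map come from process_events) :

  use strict ;
  use warnings ;

  # rsync exit codes 23 and 24 indicate a partial transfer
  sub settle_batch
    { my ( $local, $rc, $last, $old_inode, $todo ) = @_ ;
      for my $path ( keys %$last )
        { my $event = $last -> { $path } ;
          next if $event -> { type } eq 'delete' ;           # deletes were applied directly
          my $new_inode = ( lstat "$local/$path" )[1] ;
          my $changed   = defined $new_inode &&
            ( ! defined $old_inode -> { $path } || $new_inode != $old_inode -> { $path } ) ;
          next if $rc == 0 or $changed ;                     # the event is done
          if ( $rc == 23 or $rc == 24 )                      # partial transfer
            { $event -> { tried } ++ ;
              push @$todo, $event ;                          # try again in a later loop
            }
        }
    }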
When an event's tried counter is over a certain limit (currently 3), the event is retried in a one-file rsync. If the rsync succeeds, the event is considered done.
If the rsync resulted in a partial transfer with number of files: 0, iim concludes that the event-path does not exist in remote, and the event is discarded as a bogus new event.
Otherwise the event is put on the todo list (again).
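A sketch of the one-file retry ; the parsing of the rsync --stats output is approximate, and the shell quoting is naive :

  use strict ;
  use warnings ;

  # returns 'done', 'bogus' or 'retry'
  sub retry_event
    { my ( $remote, $local, $path ) = @_ ;
      my $out = qx{ rsync -a --stats '$remote/$path' '$local/$path' 2>&1 } ;
      my $rc  = $? >> 8 ;
      return 'done' if $rc == 0 ;
      if ( $rc == 23 or $rc == 24 )
        { # a partial transfer for 0 files : the path does not exist in remote
          return 'bogus' if $out =~ /Number of files:\s*0\b/ ;
        }
      return 'retry' ;
    }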
Why worry about bogus new events ?
Suppose that a bogus new event would appear in the event-list
(because of a mistake in iim and/or the RECENTs).
Then the event would sit in the todo list forever.
As a consequence, no fresh RECENTs would ever be moved to local, because that's only allowed if the todo list is empty.
That's why we make sure that bogus new events are handled properly,
although the need may very well never arise.
Once all events in the RECENTs in temp have been processed (todo is empty), the fresh RECENTs (having link count 1) in area temp may be moved to area local.
We move a fresh RECENT R (without copying) like this :
  unlink local/R
  link   temp/R local/R
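In Perl, the move could look like this (a sketch ; note the aside below about rename) :

  use strict ;
  use warnings ;

  # replace local/R by a hardlink to temp/R ; no data is copied ;
  # afterwards temp/R has link count 2, so it is no longer 'fresh'
  sub move_recent
    { my ( $temp, $local, $R ) = @_ ;
      unlink "$local/$R" ;                                   # may not exist yet
      link "$temp/$R", "$local/$R"
        or die "link $temp/$R -> $local/$R : $!\n" ;
    }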
Aside: my Perl (v5.8.8 built for x86_64-linux-thread-multi) has a problem with renames ; if a, b and c are all hard links to the same thing, a rename 'a', 'b' returns 1, but has not renamed a.
The rename can fail (according to the doc), but it should not return 1. Program /bin/mv does the right thing. I haven't tried rename(2).
That's it, basically ; the rest is bells and whistles : items or features that are useful or decorative but not essential.
Iim should be (in order) : robust, efficient, simple. This means that we will always sacrifice efficiency and simplicity for robustness, and (within reason) simplicity for efficiency.
Before looking at improvements, let's state some assumptions.
Setting up an (rsync) connection to a remote server is expensive. The client should minimize the number of connections it makes per loop-run.
Rsyncing an unchanged file is cheap, because unchanged files are skipped in the rsync transfer. For the server and client it is cheap to determine if a file is unchanged (usually by comparing size/date).
For instance, suppose we have a file area containing recent versions of the RECENTs. Now, rsyncing all RECENTs is just as expensive as rsyncing only RECENT-1h.json, because most of the RECENTs will be unchanged. Those that are changed will most likely be needed sooner or later anyway.
Suppose CPAN provides a file RECENT-cache, containing a list of filenames : RECENT-*, plus all the paths named in the events in (say) the past hour for which the path exists in the archive.
Since some path can be mentioned in more than one event, RECENT-cache should only contain paths for which the last corresponding event is a new event.
Each loop begins by refreshing the cache (temp) :
rsync --files-from MASTER::RECENT-cache MASTER::/ LOCAL/temp
Then, the client proceeds as before : it inspects the RECENTs for new events and creates a list of items to fetch. However, it can exclude items that are available in the cache. Normally (if/when the client is in sync), all required items will be in the cache, so a second rsync will not be necessary.
Refreshing the cache will be cheap, because most items will be unchanged.
Cache maintenance is easy : the cache always contains the most recent RECENT-cache ; so, after a refresh of the cache, the client can weed out obsolete entries from the cache.
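A sketch of the weeding, assuming RECENT-cache is a plain list of relative paths, one per line :

  use strict ;
  use warnings ;
  use File::Find qw(find) ;

  sub weed_cache
    { my ( $temp ) = @_ ;
      open my $fh, '<', "$temp/RECENT-cache" or return ;
      chomp ( my @keep = <$fh> ) ;
      my %keep = map { ( $_ => 1 ) } @keep ;
      # remove cached files that are no longer listed in RECENT-cache
      find
        ( { no_chdir => 1
          , wanted   => sub
              { my $rel = $File::Find::name ;
                $rel =~ s!^\Q$temp\E/!! ;
                return if $rel eq 'RECENT-cache' ;           # keep the list itself
                return if -d $File::Find::name or $keep{$rel} ;
                unlink $File::Find::name
                  or warn "unlink $File::Find::name : $!\n" ;
              }
          }
        , $temp
        ) ;
    }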
Assume for a moment that it is acceptable for a client to do a full sync if/when it finds itself out-of-sync for over a month (either on start-up or because it is suspended, or loses connection for more than a month, whatever).
Now we can drop from the server RECENT-{1Q,1Y,Z}.json, and set 'merged' to 'null' in RECENT-1M.json ; we 'forget' all events older than one month.
Note that the epoch of an archive is still always defined,
because the (truncated) history will never become empty,
even when the master freezes for a very long time.
Truncating history doesn't matter to a client that is reasonably in sync. So, suppose that the client has slept for two months and wakes up. As usual, it syncs the remote RECENTs, and starts looking for new, unseen events. If (and only if) the client runs out of history, it does a full rsync ; this is ok (by assumption).
Only when the master becomes available again, after being completely unreachable for over a month, would all clients start requesting full rsyncs. This scenario is very unlikely.
If one takes the view that full rsyncs may never be used, the price to pay is having to maintain (and use) the (ever growing) full history forever ; and forever is a very long time.
At the moment, the RECENTs are generated by an agent that watches the CPAN file-tree and generates (epoch,path,type)-events. Suppose the agent can output the events to some stream.
For now we assume the event-generator can send (type,path)-tuples over a socket to some daemon ; if/when the connection to the daemon fails, the event-generator logs the tuples to some file. On reconnect, it moves the file, sends the backlog to the daemon, then removes the file.
Suppose the event-stream is maintained (by some daemon) in an sqlite database with (id,type,path) tuples.
id is just an auto-increment counter, starting at 1.
type is the event's type : either new or delete.
path is the event's path. There is only one row per path.
Column path has a UNIQUE constraint with an ON CONFLICT REPLACE clause, so only the last event regarding path is stored.
New events can be inserted without further housekeeping.
In a delete directory-event, the path ends with a '/'.
Because rsync can't replace a non-empty directory with a file (or symlink), the client must clean up the directory before running rsync.
Now this works :
  3000 delete some-path/
  4000 new    some-path      # some-path is a file
Without the trailing slash, event 3000 would be lost (replaced by 4000).
For diagnostic purposes, a time column could be added, initialised as int(epoch).
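A sketch of the table in Perl/DBI with DBD::SQLite ; the time column is the optional diagnostic one :

  use strict ;
  use warnings ;
  use DBI ;

  my $dbh = DBI -> connect ( 'dbi:SQLite:dbname=events.db', '', ''
    , { RaiseError => 1 } ) ;

  # one row per path ; a later event for the same path replaces the row
  # and receives a fresh id
  $dbh -> do ( q{
    CREATE TABLE IF NOT EXISTS event
      ( id   INTEGER PRIMARY KEY AUTOINCREMENT
      , type TEXT NOT NULL                             -- 'new' or 'delete'
      , path TEXT NOT NULL UNIQUE ON CONFLICT REPLACE
      , time INTEGER                                   -- diagnostic : int(epoch)
      )
  } ) ;

  sub add_event
    { my ( $type, $path ) = @_ ;
      $dbh -> do ( 'INSERT INTO event ( type, path, time ) VALUES ( ?, ?, ? )'
        , undef, $type, $path, time ) ;
    }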
Next, we need a daemon, which does two things.
The daemon adds new events to the database.
A root-daemon accepts authorised requests from the event-generator ; like
  add $path $type $sum
where $sum is the checksum of ``$path.$type.$secret'', and $secret is shared between the event-generator and the daemon.
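For instance, with SHA-256 as the (here unspecified) checksum, a sketch :

  use strict ;
  use warnings ;
  use Digest::SHA qw(sha256_hex) ;

  my $secret = 'shared-secret' ;                       # shared by generator and daemon

  # the generator sends : add $path $type $sum ;
  # the daemon only accepts the request if the checksum matches
  sub make_sum
    { my ( $path, $type ) = @_ ;
      return sha256_hex ( "$path.$type.$secret" ) ;
    }

  sub sum_ok
    { my ( $path, $type, $sum ) = @_ ;
      return $sum eq make_sum ( $path, $type ) ;
    }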
On a downstream site, events are added by the client after the event is processed by the client.
On request, the daemon answers a limited set of queries like
  first_id
  last_id
  from $x
  from $x to $y
  from $x limit $n
  path $path
  stat $path
  lstat $path
  readlink $path
where $x and $y refer to id's.
The result of a query is a proper json file.
Query path returns the event for $path, if any.
Query stat returns stat /path/to/cpan/$path ; likewise lstat and readlink.
The proposed daemon would make an ``instant mirroring'' client simpler and more efficient.
Instead of gleaning events from the RECENTS, a client can query the daemon for new events. It only has to remember the id of the last processed event.
$last_id = get_local_last_id || 0 ;
  LOOP :
    { push @events, query_for_events_after $last_id ;
      $last_id = process @events ;
      commit_local $last_id ;
    }
At the moment, a client must be very careful in handling the RECENTs ; that complexity would disappear.
With the remote file queries (stat, lstat, readlink), the client can resolve exceptional problems resulting from missing or bogus events in the event-stream.
No more RECENTs to generate, maintain and distribute.
The proposed daemon is simple, and can easily be made robust and efficient.
The current event-stream-generator could/should be easy to adapt. If it can't be adapted, the root-daemon can glean events from the RECENTs. The root-daemon would have to remember (between runs) the epoch of the last event committed to the database.
© 2011-2013 Henk P. Penning, Faculty of Science, Utrecht University
iim version 0.4.14 - Tue Oct 4 07:56:13 2016 - dev revision 108