A remote archive is some file tree on a remote server ; it has directories containing files and symlinks (links).
A local archive is some file tree on the local server.
An event is a (epoch, path, type) tuple, where
epoch is an element of a totally ordered set EPOCH ;
two events that have the same epoch must be equal ;
the epochs look like microsecond timestamps, but they are just ordered tags ;
path represents a file-path in the remote (and local) file tree ;
type is the event's type : either new or delete.
Each event represents an update in the remote archive ; the epoch is used to order the events.
The event stream is the ordered collection of all the events. As time progresses, the event stream is extended.
The RECENT files (RECENTs for short) are the RECENT-*.json files.
To visualize, imagine time progressing from left to right (right is recent), as is usual.
Each RECENT file contains, among other things, a non-empty substring of the event-stream.
All RECENTs together contain the entire event-stream.
The RECENTs are ordered ( Z, 1Y, 1Q, ..., 1d, 6h, 1h ).
If two subsequent RECENTs A and B overlap, the overlap is a suffix of A and a prefix of B.
If two subsequent RECENTs A and B don't overlap, all epochs (of the events) in A are smaller than all epochs (of the events) in B.
In a RECENT file, the events are listed in reverse order : the first item in the list is the most recent event, and the last item is the oldest.
opqrs is a substring of string ABC == abcdefghijklmnopqrstuvwxyz ; abc is a prefix of ABC ; xyz is a suffix of ABC ; the empty string is a substring, prefix and suffix of any string.
One would like to specify that the (subsequent) RECENTs partition the event-stream into (subsequent) non-empty substrings, but this is impossible to maintain efficiently.
Because updating the RECENTs is event-driven, RECENTs aren't updated when there are no new events.
For a given path, only the last path-event is relevant ; previous events pertaining to path can be removed from history. So, if a file is removed from the archive, its delete event is the last event for that path, and can never be removed from history.
An archive is consistent if all the events in the RECENTs in the archive have been applied to the archive. This means that a fresh RECENT (obtained from the remote archive) may only be moved into the local archive if/when all the events in the RECENT have been successfully applied to the local archive.
The epoch of a (consistent) archive is the epoch of the last event in the last RECENT. The epoch of the archive is undefined if at least one RECENT is missing.
To keep the local archive in sync, RECENT files must be fetched ; the events therein must be applied as updates (fetches or deletes) on the local archive.
In iim, 3 data areas are relevant :
remote : the archive on the remote server
local : the local archive
temp : a staging area for RECENTs
Fresh RECENTs have to be kept in a staging area ; they can only be moved to area local if/when all events in the RECENTs have been processed.
The basic structure of iim is simple:
  todo  = []
  epoch <- initialize
  LOOP
    FETCH fresh recents
    epoch <- FIND new events
    todo  <- PROCESS new events, todo
    IF todo == []
      MOVE recents INTO local
Array todo is used to carry unprocessed or failed events from one loop to the next.
The main object of initialize is to determine epoch from the epoch of the local archive. Other initializations follow from the loop-invariants we need to maintain.
If epoch is undefined or differs too much from the epoch of remote, a full rsync is done first.
At the moment, too much is defined as : more than (the number of seconds in) two days ; the choice is arbitrary.
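A minimal sketch of this initialization, with $LOCAL as a placeholder and last_epoch_of / full_rsync as hypothetical helpers (the real iim may do this differently) :

  use strict ;
  use warnings ;

  my $LOCAL    = '/path/to/local' ;          # placeholder
  my $TWO_DAYS = 2 * 24 * 3600 ;

  # the epoch of the local archive is the epoch of the last event
  # in the last RECENT ; undefined if that RECENT is missing
  sub archive_epoch
    { my $local = shift ;
      return undef unless -f "$local/RECENT-1h.json" ;
      return last_epoch_of ( "$local/RECENT-1h.json" ) ;    # hypothetical helper
    }

  my $epoch = archive_epoch ( $LOCAL ) ;
  # current time stands in for the epoch of remote here
  if ( ! defined $epoch or time - $epoch > $TWO_DAYS )
    { full_rsync () ;                                       # hypothetical helper
      $epoch = archive_epoch ( $LOCAL ) ;
    }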
To fetch (all) RECENTs, we sync them into area temp, marking their inode number. After the sync, the RECENTs with a changed inode number are fresh. This is very efficient on the client and the server.
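A sketch of the fetch, with $REMOTE and $TEMP as placeholders ; rsync normally replaces a changed file by writing a new file and renaming it, so a changed RECENT gets a new inode :

  use strict ;
  use warnings ;

  my $REMOTE = 'MASTER::cpan' ;              # placeholder rsync source
  my $TEMP   = '/path/to/temp' ;             # staging area

  # remember the inode of every RECENT currently in temp
  my %inode ;
  $inode{$_} = ( stat $_ )[1] for glob "$TEMP/RECENT-*.json" ;

  # sync the RECENTs ; unchanged files are skipped
  system ( 'rsync', '-t', "$REMOTE/RECENT-*.json", "$TEMP/" ) == 0
    or warn "rsync RECENTs failed : $?\n" ;

  # a RECENT with a changed (or new) inode number is fresh
  my @fresh = grep
    { my $ino = ( stat $_ )[1] ;
      ! defined $inode{$_} or $ino != $inode{$_} ;
    } glob "$TEMP/RECENT-*.json" ;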
On startup, we must initialize temp with hardlinks to the RECENTs in local.
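For instance (a sketch ; $LOCAL and $TEMP are placeholders) :

  use strict ;
  use warnings ;
  use File::Basename qw(basename) ;

  my $LOCAL = '/path/to/local' ;
  my $TEMP  = '/path/to/temp' ;

  for my $r ( glob "$LOCAL/RECENT-*.json" )
    { my $t = "$TEMP/" . basename ( $r ) ;
      next if -e $t ;                                       # already present
      link $r, $t or warn "link $r -> $t : $!\n" ;
    }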
New events (more recent than epoch) are looked up in the (ordered set of) RECENTs in area temp. If new events are found, epoch is set to the epoch of the last event.
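A sketch of the lookup ; it assumes each RECENT is JSON with its events (newest first) under a 'recent' key, as in the CPAN RECENT files, and that the periods between 1Q and 1d are the usual 1M and 1W :

  use strict ;
  use warnings ;
  use JSON::PP qw(decode_json) ;

  # collect the events newer than $epoch, oldest first
  sub find_new_events
    { my ( $temp, $epoch ) = @_ ;
      my $seen = defined $epoch ? $epoch : 0 ;
      my @new ;
      for my $period ( qw( Z 1Y 1Q 1M 1W 1d 6h 1h ) )       # oldest RECENT first
        { my $file = "$temp/RECENT-$period.json" ;
          next unless -f $file ;
          open my $fh, '<', $file or next ;
          my $data = decode_json ( do { local $/ ; <$fh> } ) ;
          for my $e ( reverse @{ $data -> { recent } || [] } )  # events are listed newest first
            { next unless $e -> { epoch } > $seen ;             # skips old events and overlaps
              $seen = $e -> { epoch } ;
              push @new, $e ;
            }
        }
      return @new ;
    }

  # usage :
  #   my @events = find_new_events ( $TEMP, $epoch ) ;
  #   $epoch = $events[-1]{epoch} if @events ;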
When processing (a batch of) events, we must fetch (a batch of) files from remote. We want to rsync directly into local to take advantage of rsync's matched data magic ; if the source and destination files differ only a little, the transfer is very efficient.
First we determine the last event for each event-path in the batch ; if it is a delete event, we just delete the event-path from local ; if it is a new event, we add the path to the fetch-list, marking the inode number (possibly undef) of the path in local. Then we rsync the files from remote into local.
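A sketch of this step, with placeholder area names ; the helpers and data structures of the real iim may differ :

  use strict ;
  use warnings ;
  use File::Path qw(rmtree) ;
  use File::Temp qw(tempfile) ;

  # @events is assumed oldest first ; the last event per path wins
  sub process_events
    { my ( $remote, $local, @events ) = @_ ;
      my %last ;
      $last{ $_ -> { path } } = $_ for @events ;

      my ( @fetch, %inode ) ;
      for my $path ( keys %last )
        { if ( $last{$path}{type} eq 'delete' )
            { my $f = "$local/$path" ;
              if ( -l $f or ! -d $f ) { unlink $f }
              else                    { rmtree ( $f ) }
            }
          else
            { push @fetch, $path ;
              $inode{$path} = ( lstat "$local/$path" )[1] ;  # possibly undef
            }
        }

      # rsync the batch directly into local, to profit from the matched data magic
      my ( $fh, $list ) = tempfile ( UNLINK => 1 ) ;
      print $fh "$_\n" for @fetch ;
      close $fh ;
      my $rc = system ( 'rsync', '-a', "--files-from=$list", "$remote/", "$local/" ) ;
      return ( $rc >> 8, \%last, \%inode ) ;
    }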
When rsync is finished, we determine which items on the fetch-list have made it into area local. If the rsync had no errors, or the inode number of the file changed, the event is done.
If the rsync was partial (rsync exit codes 23, 24), the tried counter of the event is incremented, and the event is pushed on the todo list.
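A sketch of this bookkeeping, continuing the previous sketch (the per-path events and the old-inode map come from process_events) :

  use strict ;
  use warnings ;

  # rsync exit codes 23 and 24 indicate a partial transfer
  sub settle_batch
    { my ( $local, $rc, $last, $old_inode, $todo ) = @_ ;
      for my $path ( keys %$last )
        { my $event = $last -> { $path } ;
          next if $event -> { type } eq 'delete' ;           # deletes were applied directly
          my $new_inode = ( lstat "$local/$path" )[1] ;
          my $changed   = defined $new_inode &&
            ( ! defined $old_inode -> { $path } || $new_inode != $old_inode -> { $path } ) ;
          next if $rc == 0 or $changed ;                     # the event is done
          if ( $rc == 23 or $rc == 24 )                      # partial transfer
            { $event -> { tried } ++ ;
              push @$todo, $event ;                          # try again in a later loop
            }
        }
    }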
When an event's tried counter is over a certain limit (currently 3), the event is retried in a one-file rsync. If the rsync succeeds, the event is considered done.
If the rsync resulted in a partial transfer with number of files: 0, iim concludes that the event-path does not exist in remote, and the event is discarded as a bogus new event.
Otherwise the event is put on the todo list (again).
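A sketch of the one-file retry ; the parsing of the rsync --stats output is approximate, and the shell quoting is naive :

  use strict ;
  use warnings ;

  # returns 'done', 'bogus' or 'retry'
  sub retry_event
    { my ( $remote, $local, $path ) = @_ ;
      my $out = qx{ rsync -a --stats '$remote/$path' '$local/$path' 2>&1 } ;
      my $rc  = $? >> 8 ;
      return 'done' if $rc == 0 ;
      if ( $rc == 23 or $rc == 24 )
        { # a partial transfer for 0 files : the path does not exist in remote
          return 'bogus' if $out =~ /Number of files:\s*0\b/ ;
        }
      return 'retry' ;
    }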
Why worry about bogus new events ?
Suppose that a bogus new event would appear in the event-list
(because of a mistake in iim and/or the RECENTs).
Then the event would sit in the todo list forever.
As a consequence, no fresh RECENTs would ever be moved to local, because that's only allowed if the todo list is empty.
That's why we make sure that bogus new events are handled properly,
although the need may very well never arise.
Once all events in the RECENTs in temp have been processed (todo is empty), the fresh RECENTs (having link count 1) in area temp may be moved to area local.
We move a fresh RECENT R (without copying) like this :
  unlink local/R
  link   temp/R local/R
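In Perl, the move could look like this (a sketch ; note the aside below about rename) :

  use strict ;
  use warnings ;

  # replace local/R by a hardlink to temp/R ; no data is copied ;
  # afterwards temp/R has link count 2, so it is no longer 'fresh'
  sub move_recent
    { my ( $temp, $local, $R ) = @_ ;
      unlink "$local/$R" ;                                   # may not exist yet
      link "$temp/$R", "$local/$R"
        or die "link $temp/$R -> $local/$R : $!\n" ;
    }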
Aside: my Perl (v5.8.8 built for x86_64-linux-thread-multi) has a problem with renames ; if a, b and c are all hard links to the same thing, a rename 'a', 'b' returns 1, but has not renamed a.
The rename can fail (according to the doc), but it should not return 1. Program /bin/mv does the right thing. I haven't tried rename(2).
That's it, basically ; the rest is bells and whistles : items or features that are useful or decorative but not essential.
Iim should be (in order) : robust, efficient, simple. This means that we will always sacrifice efficiency and simplicity for robustness, and (within reason) simplicity for efficiency.
Before looking at improvements, let's state some assumptions.
Setting up an (rsync) connection to a remote server is expensive. The client should minimize the number of connections it makes per loop-run.
Rsyncing an unchanged file is cheap, because unchanged files are skipped in the rsync transfer. For the server and client it is cheap to determine if a file is unchanged (usually by comparing size/date).
For instance, suppose we have a file area containing recent versions of the RECENTs. Now, rsyncing all RECENTs is just as expensive as rsyncing only RECENT-1h.json, because most of the RECENTs will be unchanged. Those that are changed will most likely be needed sooner or later anyway.
Suppose CPAN provides a file RECENT-cache, containing a list of filenames : RECENT-*, plus all the paths named in the events in (say) the past hour for which the path exists in the archive.
Since some path can be mentioned in more than one event, RECENT-cache should only contain paths for which the last corresponding event is a new event.
Each loop begins by refreshing the cache (temp) :
rsync --files-from MASTER::RECENT-cache MASTER::/ LOCAL/temp
Then, the client proceeds as before : it inspects the RECENTs for new events and creates a list of items to fetch. However, it can exclude items that are available in the cache. Normally (if/when the client is in sync), all required items will be in the cache, so a second rsync will not be necessary.
Refreshing the cache will be cheap, because most items will be unchanged.
Cache maintenance is easy : the cache always contains the most recent RECENT-cache ; so, after a refresh of the cache, the client can weed out obsolete entries from the cache.
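A sketch of the weeding, assuming RECENT-cache is a plain list of relative paths, one per line :

  use strict ;
  use warnings ;
  use File::Find qw(find) ;

  sub weed_cache
    { my ( $temp ) = @_ ;
      open my $fh, '<', "$temp/RECENT-cache" or return ;
      chomp ( my @keep = <$fh> ) ;
      my %keep = map { ( $_ => 1 ) } @keep ;
      # remove cached files that are no longer listed in RECENT-cache
      find
        ( { no_chdir => 1
          , wanted   => sub
              { my $rel = $File::Find::name ;
                $rel =~ s!^\Q$temp\E/!! ;
                return if $rel eq 'RECENT-cache' ;           # keep the list itself
                return if -d $File::Find::name or $keep{$rel} ;
                unlink $File::Find::name
                  or warn "unlink $File::Find::name : $!\n" ;
              }
          }
        , $temp
        ) ;
    }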
Assume for a moment that it is acceptable for a client to do a full sync if/when it finds itself out-of-sync for over a month (either on start-up or because it is suspended, or loses connection for more than a month, whatever).
Now we can drop from the server RECENT-{1Q,1Y,Z}.json, and set 'merged' to 'null' in RECENT-1M.json ; we 'forget' all events older than one month.
Note that the epoch of an archive is still always defined,
because the (truncated) history will never become empty,
even when the master freezes for a very long time.
Truncating history doesn't matter to a client that is reasonably in sync. So, suppose that the client has slept for two months and wakes up. As usual, it syncs the remote RECENTs, and starts looking for new, unseen events. If (and only if) the client runs out of history, it does a full rsync ; this is ok (by assumption).
Only when the master becomes available again, after being completely unreachable for over a month, would all clients start requesting full rsyncs. This scenario is very unlikely.
If one takes the view that full rsyncs may never be used, the price to pay is having to maintain (and use) the (ever growing) full history forever ; and forever is a very long time.
At the moment, the RECENTs are generated by an agent that watches the CPAN file-tree and generates (epoch,path,type)-events. Suppose the agent can output the events to some stream.
For now we assume the event-generator can send (type,path)-tuples over a socket to some daemon ; if/when the connection to the daemon fails, the event-generator logs the tuples to some file. On reconnect, it moves the file, sends the backlog to the daemon, then removes the file.
Suppose the event-stream is maintained (by some daemon) in an sqlite database with (id,type,path) tuples.
id is just an auto-increment counter, starting at 1.
type is the event's type : either new or delete.
path is the event's path. There is only one row per path.
Column path has a UNIQUE constraint with an ON CONFLICT REPLACE clause, so only the last event regarding path is stored.
New events can be inserted without further housekeeping.
In a delete directory-event, the path ends with a '/'.
Because rsync can't replace a non-empty directory with a file (or symlink), the client must clean up the directory before running rsync.
Now this works :
  3000 delete some-path/
  4000 new    some-path      # some-path is a file
Without the trailing slash, event 3000 would be lost (replaced by 4000).
For diagnostic purposes, a time column could be added, initialised as int(epoch).
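A sketch of the table in Perl/DBI with DBD::SQLite ; the time column is the optional diagnostic one :

  use strict ;
  use warnings ;
  use DBI ;

  my $dbh = DBI -> connect ( 'dbi:SQLite:dbname=events.db', '', ''
    , { RaiseError => 1 } ) ;

  # one row per path ; a later event for the same path replaces the row
  # and receives a fresh id
  $dbh -> do ( q{
    CREATE TABLE IF NOT EXISTS event
      ( id   INTEGER PRIMARY KEY AUTOINCREMENT
      , type TEXT NOT NULL                             -- 'new' or 'delete'
      , path TEXT NOT NULL UNIQUE ON CONFLICT REPLACE
      , time INTEGER                                   -- diagnostic : int(epoch)
      )
  } ) ;

  sub add_event
    { my ( $type, $path ) = @_ ;
      $dbh -> do ( 'INSERT INTO event ( type, path, time ) VALUES ( ?, ?, ? )'
        , undef, $type, $path, time ) ;
    }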
Next, we need a daemon, which does two things.
The daemon adds new events to the database.
A root-daemon accepts authorised requests from the event-generator ; like
  add $path $type $sum
where $sum is the checksum of ``$path.$type.$secret'', and $secret is shared between the event-generator and the daemon.
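For instance, with SHA-256 as the (here unspecified) checksum, a sketch :

  use strict ;
  use warnings ;
  use Digest::SHA qw(sha256_hex) ;

  my $secret = 'shared-secret' ;                       # shared by generator and daemon

  # the generator sends : add $path $type $sum ;
  # the daemon only accepts the request if the checksum matches
  sub make_sum
    { my ( $path, $type ) = @_ ;
      return sha256_hex ( "$path.$type.$secret" ) ;
    }

  sub sum_ok
    { my ( $path, $type, $sum ) = @_ ;
      return $sum eq make_sum ( $path, $type ) ;
    }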
On a downstream site, events are added by the client after the event is processed by the client.
On request, the daemon answers a limited set of queries like
  first_id
  last_id
  from $x
  from $x to $y
  from $x limit $n
  path $path
  stat $path
  lstat $path
  readlink $path
where $x and $y refer to id's.
The result of a query is a proper json file.
Query path returns the event for $path, if any.
Query stat returns stat /path/to/cpan/$path ; likewise lstat and readlink.
The proposed daemon would make an ``instant mirroring'' client simpler and more efficient.
Instead of gleaning events from the RECENTS, a client can query the daemon for new events. It only has to remember the id of the last processed event.
$last_id = get_local_last_id || 0 ;
  LOOP :
    { push @events, query_for_events_after $last_id ;
      $last_id = process @events ;
      commit_local $last_id ;
    }
At the moment, a client must be very careful in handling the RECENTs ; that complexity would disappear.
With the remote file queries (stat, lstat, readlink), the client can resolve exceptional problems resulting from missing or bogus events in the event-stream.
No more RECENTs to generate, maintain and distribute.
The proposed daemon is simple, and can easily be made robust and efficient.
The current event-stream-generator could/should be easy to adapt. If it can't be adapted, the root-daemon can glean events from the RECENTs. The root-daemon would have to remember (between runs) the epoch of the last event committed to the database.
© 2011-2013 Henk P. Penning, Faculty of Science, Utrecht University
iim version 0.4.14 - Tue Oct 4 07:56:13 2016 - dev revision 108