This is the Metz paper. It's first draft by Thomas Krichel on was written 2006-09-04.
Metz is a project to build a software system that can maintain and visualize social network data while the underlying network data is changing. There is no precedence for such a service.
A running implementation with data from the RePEc Author Service is part of the project. This implementation is codenamed "CollEc", but this name may be changing later. The web site will live at http://collec.openlib.org.
Metz will be limited to symmetric networks. But they may not need to be binary.
Any network type has a type identifier henceforth refered to a n_id.
At this time, I am not aware, this time of a software that can calculate all paths for non-binary network
Metz will be limited to symmetric networks. But they may not need to be binary. This implies that paths between from any node a to a node b are the same as the paths from node b to a node a.
All software is written in Perl.
The network input format describes a symmetric matrix. For row, say, there is an identifier. Idenitfiers are not supposed to be strings that contain ascii characters only. They may not contain neither whitespace (i.e. blank, tab, carriage return and line feed). Neither may they contain control characters 0x00 to 0x1F nor Ox7F (in hex).
Since the network input matrix is symmetric, only the data for half the matrix has be be maintained in the input data. The rest can be induced.
Network input comes as lines. Each line is of the form
idTAB[numberTABidTAB]+"\n"
where id is an id of a node, as outlined
previously, and numberis a number,
and TAB is the tabulation character. There must be
an odd number of ids and an even number of numbers in
the line. The line starts and ends with an id. All
ids in a line the follow the first id
must have a higher lexographical position than the
first id. If they don't they are rejected and an error is signalled.
"\n" is the newline or carriage return and newline.
Example
ab 34 be 32 bi 9 meis a correct line.
ab 34 bi 3.2 be 9 meis also correct,
ab 34 bi 3.2 beis correct but
bi 34 be 32 ab 9 mebecause one of the following identifiers has a lower lexigraphical position than the first one.
All Metz data is contained in a Metz directory, in files that end with ".metz". There is one file per network. The name of the file is the identifier of the network.
All network are assumed to contain only one single component.
A Metz installation ought to be able to have several networks, each having its own Metz network input directory. However in the implementation software at this time, only one network and its binary form will be considered.
The binary form of a network will not be separately described since it can be deducted from the non-binary network. It is assumed, in the following that the network is not binary and therefore a binary equivalent exists.
The system aims to compute all shortest paths between any couple of nodes. We call the shortpath set from node A to node B the set of all shortest paths from A to B. Symmetry implies that the shortpath set from A to B is the same as the shortpath set from B to A.
Shortest paths can be stored in a file that has the paths starting for an A to every B that is lexographically higher. This can be a flat file. For its stoage in a datebase, the following column structure would seem appropriate
Note that there may be many paths with the same start_id and end_id
In the algorithms that I am aware of, paths are calculated from a certain start to all end points that are reasable.
I assume in the following that we can not calculate in an isolated fashion, indivdual paths between an arbtrary start_id and an arbitrary end_id.
When the paths for an individual author are updated,
there are a an number of intermediary paths being update.
In fact, all paths between two authors that appear in
a path are calculated again. This is best seen by an
example. Let there be two shortest paths between
two node a1 and a2
a1 a5 a4 a3 a2
a1 a5 a6 a3 a2
Here, there are intermediate paths between
a3 and a5 are also contained. Thus a
full update for an author will also update the sub paths,
which are paths that do not belong to the author.
There are 6 types of incidents that the matrix of edge can undergo.
The software written must recognize there events when comparing two version of the edges matrix.
For "w-", and "w+", the binary version does not change. The weighted version needs recalculation.