|Did you know ...||Search Documentation:|
|Transparent Inter-Process Communications (TIPC) libraries|
These pages are not intended as a comprehensive tutorial in the use of TIPC services. The TIPC Programmer's Guide, http://tipc.sf.net/doc/Programmers_Guide.txt, provides assistance to developers who are creating applications that utilize TIPC services. The TIPC User's Guide, http://tipc.sf.net/doc/Users_Guide.txt, provides an administrator of a TIPC cluster with the information needed to operate one. A TIPC server loadable module, that may be used to make a host available as a TIPC enabled node, has been a part of the Linux kernel since 2.6.16. Please see: http://tipc.sf.net
In a TIPC network, a Node is comprised of a collection of lightweight threads of execution operating in the same process, or heavyweight processes operating on the same machine. A Cluster is a collection of Nodes operating on different machines, and operating indirectly by way of a local Ethernet or other networking medium. Clusters may be further aggregated into Zones, and Zones into Networks. The address space of two TIPC networks is completely disjoint. Zones on different networks may coexist on the same LAN but they may not communicate directly with one another.
TIPC provides connectionless, connection-oriented, reliable, and unreliable forwarding strategies for both stream and message oriented applications. But not all strategies can be used in every application. For example, there is no such thing as a multicast byte stream. The strategy is selected by the user for the application when the socket is instantiated.
TIPC is not TCP/IP based. Consequently, it cannot signal beyond a local network span without some kind of tunneling mechanism. TIPC is designed to facilitate deployment of distributed applications, where certain aspects of the application may be segregated, and then delegated and/or duplicated over several machines on the same LAN. The application is unaware of the topology of the network on which it is running. It could be a few threads operating in the same process, several processes operating on the same machine, or it could be dozens or even hundreds of machines operating on the same LAN, all operating as a unit. TIPC manages all of this complexity so that the programmer doesn't have to.
Unlike TCP/IP, TIPC does not assign network addresses to network interfaces; it assigns addresses (e.g. port-ids) to sockets when they are instantiated. The address is unique and persists only as long as the socket persists. A single Node therefore, may typically have many TIPC addresses active at any one time, each assigned to an active socket. TIPC also provides a means that a process can use to bind a socket to a well-known address (e.g. a service). Several peers may bind to the same well-known address, thereby enabling multi-server topologies. And server members may exist anywhere in the Zone. TIPC manages the distribution of client requests among the membership of the server group. A server instance responds to two addresses: its public well-known address that it is bound to, and that a client may use to establish a communication with a service, and its private address that the server instance may use to directly interact with a client instance.
TIPC also enables multicast and "publish and subscribe" regimes that applications may use to facilitate asynchronous exchange of datagrams with a number of anonymous sources that may come and go over time. One such regime is implemented as a naming service managed by a distributed topology server. The topology server provides surveillance on the comings and goings of publishers, with advice to interested subscribers in the form of event notifications, emitted when a publisher's status changes. For example, when a server application binds to a TIPC address , that address is automatically associated with that server instance in topology server's name table. This has the side effect of causing a "published" event to be emitted to all interested subscribers. Conversely, when the server's socket is closed or when one of its addresses is released using the "no-scope" option of tipc_bind/3, a "withdrawn" event is emitted. See tipc_service_port_monitor/2.
A client application may connect to the topology server in order to interrogate the name table to determine whether or not a service is present before actually committing to access it. See tipc_service_exists/2 and tipc_service_probe/2. Another way that the topology server can be applied is exemplified in Erlang's "worker/supervisor" behavioral pattern. A supervisor thread has no other purpose than to monitor a collection of worker threads in order to ensure that a service is available and able to serve a common goal. When a worker under the supervisor's care dies, the supervisor receives the worker's "withdrawn" event, and takes some action to instantiate a replacement. The predicate, tipc_service_port_monitor/2, is provided specifically for this purpose. Using the service is optional. It has applications in distributed, high-availability, fault-tolerant, and non-stop systems.
Adding capacity to a cluster becomes an administrative function whereby new server hardware is added to a TIPC network, then the desired application is launched on the new server. The application binds to its well-known address, thereby joining in the Cluster. TIPC will automatically begin sending work to it. An administrator has tools for gracefully removing a server from a Cluster, without effecting the traffic moving on the Cluster.
An administrator may configure a Node to have two or more network interfaces. Provided that each interface is invisible to the other, TIPC will manage them as a redundant group, thus enabling high-reliability network features such as automatic link fail-over and hot-swap.
Sometimes the socket's port-id alone is enough to establish an ad-hoc session anonymously between parent and child processes. The parent instantiates a socket, then forks into two processes. The child retrieves the port-id of the parent from the socket inherited from the parent using tipc_get_name/2, then closes the socket and instantiates a socket of its own. The child sends a message to the parent, on its own socket, using the parent's port-id as the destination address. The port-id received by the parent is unique to a specific instance of child. The handshake is complete; each side knows who the other is, and two-way communication may now proceed. A one-way communication (e.g. a message oriented pipe or mailbox) is also possible using only the socket inherited from the parent, provided that there is exactly one sender and one receiver on the socket. Both parent and child use the socket's own port-id, one side adopts the role of sender, and the other of receiver.
Transparent Inter-Process Communication (TIPC) provides a flexible,
reliable, fault-tolerant, high-speed, and low-overhead framework for
inter-process communication between federations of trusted peers,
operating as a unit. It was developed by Ericsson AB, as a means to
provide for communications between Common Control Systems processes and
Network Element peers in telephone switching systems, sometimes
operating at arm's length on different line cards or mainframes.
Delegation of responsibility in this way is one of the fundamental
precepts of the Erlang programming system, also developed at Ericsson.
TIPC represents a more generalized version of the same behavioral design
pattern. For an overview, please see:
|SocketType||is one of the following atoms:
socket_error('Address family not supported by protocol')is thrown if a TIPC server is not available on the current host.
|SocketId||the socket identifier returned by tipc_socket/2 or tipc_accept/3.|
node. Servers may bind to more than one address by making successive calls to tipc_bind/3, one for each address that it wishes to advertise. The server will receive traffic for all of them. A server may, for example, register one address with node scope, another with cluster scope, and a third with zone scope. A client may then limit the scope of its transmission by specifying the appropriate address.
no_scope(all), may be used to unbind the socket from all of its registered addresses. This feature allows an application to gracefully exit from service. Because the socket remains open, the application may continue to service current transactions to completion. TIPC however, will not schedule any new work for the server instance. If no other servers are available, the work will be rejected or dropped according to the socket options specified by the client.
Note that clients do not need to bind to any address. Its port-id is sufficient for this role. And server sockets (e.g. those that are bound to name/3 or name_seq/3, addresses) may not act as clients. That is, they may not originate connections from the socket using tipc_connect/2. Servers however, may originate datagrams from bound sockets using tipc_send/4. Please see the TIPC programmers's guide for other restrictions.
socket_error('Connection refused'), if there are no servers bound to the specified address.
socket_error('Connection timed out'), if no server that is bound to the specified address accepts the connect request within the specified time limit. See also tipc_setopt/2.
socket_error('Transport endpoint is not connected'), if an attempt is made to obtain a peer's name of an unconnected socket.
Defined options are:
error(socket_error('Resource temporarily unavailable'), _), will be thrown. Users are cautioned not to "spin" unnecessarily on non-blocking receives as they may prevent the system from servicing other background activities such as XPCE event dispatching.
The typical sequence to receive a connectionless TIPC datagram is:
receive :- tipc_socket(S, dgram), tipc_bind(S, name(18888, 10, 0), scope(zone)), repeat, tipc_receive(Socket, Data, From, [as(atom)]), format('Got ~q from ~q~n', [Data, From]), Data == quit, !, tipc_close_socket(S).
tipc_overview.txt, for more information on TIPC Address Structures. Options is currently unused.
A simple example to send a connectionless TIPC datagram is:
send(Message) :- tipc_socket(S, dgram), tipc_send(S, Message, name(18888, 10,0), ), tipc_close_socket(S).
Messages are delivered silently unless some form of congestion was
encountered and the
dest_droppable(false) option was issued
on the sender's socket. In this case, the send succeeds but a
notification in the form of an empty message is returned to the sender
from the receiver, indicating some kind of delivery failure. The port-id
of the receiver is returned in congestion conditions. A
is returned if the destination address was invalid. Senders and
receivers should beware of this possibility.
|Address||is one of: |
|Timeout||is optional. It is a non-negative real number that specifies the amount of time in seconds to block and wait for a service to become available. Fractions of a second are also permissible.|
|Address||is a name_seq/3 address. The address type must be grounded.|
|PortId||is unified with the port-id for a specific name_sequence address.|
|Addresses||is a list of name/3 or name_seq/3 addresses for the services to be monitored.|
|Goal||is a predicate that will be called when
a worker's publication status changes. The Goal is called
exactly once per event with its the last argument unified with the
|Timeout||is optional. It is one of:
socket_error('Address family not supported by protocol')if a TIPC server is not available on the current host.
SWI-Prolog's broadcast library provides a means that may be used to facilitate publish and subscribe communication regimes between anonymous members of a community of interest. The members of the community are however, necessarily limited to a single instance of Prolog. The TIPC broadcast library removes that restriction. With this library loaded, any member of a TIPC network that also has this library loaded may hear and respond to your broadcasts. Using TIPC Broadcast, it becomes a nearly trivial matter to build an instance of supercomputer that researchers within the High Performance Computer community refer to as "Beowulf Class Cluster Computers."
This module has no public predicates. When this module is initialized, it does three things:
A broadcast/1 or broadcast_request/1
that is not directed to one of the six listeners above, behaves as usual
and is confined to the instance of Prolog that originated it. But when
so directed, the broadcast will be sent to all participating systems,
including itself, by way of TIPC's multicast addressing facility. A TIPC
broadcast or broadcast request takes the typical form:
The principal functors
tipc_zone, specify the scope of the broadcast. The functor
tipc_node, specifies that the broadcast is to be confined
to members of a present TIPC node. Likewise,
tipc_zone, specify that the traffic should be confined
to members of a present TIPC cluster and zone, respectively. To prevent
the potential for feedback loops, the scope qualifier is stripped from
the message before transmission. The timeout is optional. It specifies
the amount to time to wait for replies to arrive in response to a
broadcast_request. The default period is 0.250 seconds. The timeout is
ignored for broadcasts.
An example of three separate processes cooperating on the same Node:
Process A: ?- listen(number(X), between(1, 5, X)). true. ?- Process B: ?- listen(number(X), between(7, 9, X)). true. ?- Process C: ?- findall(X, broadcast_request(tipc_node(number(X))), Xs). Xs = [1, 2, 3, 4, 5, 7, 8, 9]. ?-
It is also possible to carry on a private dialog with a single responder. To do this, you supply a compound of the form, Term:PortId, to a TIPC scoped broadcast/1 or broadcast_request/1, where PortId is the port-id of the intended listener. If you supply an unbound variable, PortId, to broadcast_request, it will be unified with the address of the listener that responds to Term. You may send a directed broadcast to a specific member by simply providing this address in a similarly structured compound to a TIPC scoped broadcast/1. The message is sent via unicast to that member only by way of the member's broadcast listener. It is received by the listener just as any other broadcast would be. The listener does not know the difference.
Although this capability is needed under some circumstances, it has a tendency to compromise the resilience of the broadcast model. You should not rely on it too heavily, or fault tolerance will suffer.
For example, in order to discover who responded with a particular value:
Process A: ?- listen(number(X), between(1, 3, X)). true. ?- Process B: ?- listen(number(X), between(7, 9, X)). true. ?- Process C: ?- broadcast_request(tipc_node(number(X):From)). X = 7, From = port_id('<1.1.1:3971170279>') ; X = 8, From = port_id('<1.1.1:3971170279>') ; X = 9, From = port_id('<1.1.1:3971170279>') ; X = 1, From = port_id('<1.1.1:3971170280>') ; X = 2, From = port_id('<1.1.1:3971170280>') ; X = 3, From = port_id('<1.1.1:3971170280>') ; false. ?-
While the implementation is mostly transparent, there are some important and subtle differences that must be taken into consideration:
host_to_address(+Service, +Address), somewhere in its source. This predicate can also be used to perform reverse searches. That is it will also resolve an Address to a Service name. The search is zone-wide. Locating a service however, does not imply that the service is actually reachable from any particular node within the zone.
This module provides a replicated data store that is coordinated using a variation on Lamport's Paxos concensus protocol. The original method is described in his paper entitled, "The Part-time Parliament", which was published in 1998. The algorithm is tolerant of non-Byzantine failure. That is late or lost delivery or reply, but not senseless delivery or reply. The present algorithm takes advantage of the convenience offered by multicast to the quorum's membership, who can remain anonymous and who can come and go as they please without effecting Liveness or Safety properties.
Paxos' quorum is a set of one or more attentive members, whose processes respond to queries within some known time limit (< 20ms), which includes roundtrip delivery delay. This property is easy to satisfy given that every coordinator is necessarily a member of the quorum as well, and a quorum of one is permitted. An inattentive member (e.g. one whose actions are late or lost) is deemed to be "not-present" for the purposes of the present transaction and consistency cannot be assured for that member. As long as there is at least one attentive member of the quorum, then persistence of the database is assured.
Each member maintains a ledger of terms along with information about when they were originally recorded. The member's ledger is deterministic. That is to say that there can only be one entry per functor/arity combination. No member will accept a new term proposal that has a line number that is equal-to or lower-than the one that is already recorded in the ledger.
Paxos is a three-phase protocol:
1: A coordinator first prepares the quorum for a new proposal by broadcasting a proposed term. The quorum responds by returning the last known line number for that functor/arity combination that is recorded in their respective ledgers.
2: The coordinator selects the highest line number it receives, increments it by one, and then asks the quorum to finally accept the new term with the new line number. The quorum checks their respective ledgers once again and if there is still no other ledger entry for that functor/arity combination that is equal-to or higher than the specified line, then each member records the term in the ledger at the specified line. The member indicates consent by returning the specified line number back to the coordinator. If consent is withheld by a member, then the member returns a
nackinstead. The coordinator requires unanimous consent. If it isn't achieved then the proposal fails and the coordinator must start over from the beginning.
3: Finally, the coordinator concludes the successful negotiation by broadcasting the agreement to the quorum in the form of a
paxos_changed(Term)event. This is the only event that should be of interest to user programs.
For practical reasons, we rely on the partially synchronous behavior (e.g. limited upper time bound for replies) of broadcast_request/1 over TIPC to ensure Progress. Perhaps more importantly, we rely on the fact that the TIPC broadcast listener state machine guarantees the atomicity of broadcast_request/1 at the process level, thus obviating the need for external mutual exclusion mechanisms.
Note that this algorithm does not guarantee the rightness of the value proposed. It only guarantees that if successful, the value proposed is identical for all attentive members of the quorum.
Note also that tipc_paxos now requires an initialization step. See tipc_initialize/0.
On success, tipc_paxos_set/1
will also broadcast the term
paxos_changed(Term), to the quorum.
|Term||is a compound that may have unbound variables.|
|Retries||(optional) is a non-negative integer specifying the number of retries that will be performed before a set is abandoned.|
|Term||is a compound. Any unbound variables are unified with those provided in the ledger entry.|
|Term||is an ungrounded Term|
|Term||is a compound, identical to that used for tipc_paxos_get/1.|
|Goal||is one of:
Linda is a framework for building systems that are composed of programs that cooperate among themselves in order to realize a larger goal. A Linda application is composed of two or more processes acting in concert. One process acts as a server and the others act as clients. Fine-grained communications between client and server is provided by way of message passing over sockets and support networks, TIPC sockets in this case. Clients interact indirectly by way of the server. The server is in principle an eraseable blackboard that clients can use to write (out/1), read (rd/1) and remove (in/1) messages called tuples. Some predicates will fail if a requested tuple is not present on the blackboard. Others will block until a tuple instance becomes available. Tuple instances are made available to clients by writing them on the blackboard using out/1.
In TIPC Linda, there is a subtle difference between the
rd predicates that is worth noting. The
predicates succeed exactly once for each tuple placed in the tuple
space. The tuple is provided to exactly one requesting client. Clients
can contend for tuples in this way, thus enabling multi-server
rd predicates succeed nondeterministically,
providing all matching tuples in the tuple space at a given time to the
requesting client as a choice point without disturbing them.
TIPC Linda is inspired by and adapted from the SICStus Prolog API. But unlike SICStus TCP Linda, TIPC Linda is connectionless. There is no specific session between client and server. The server receives and responds to datagrams originated by clients in an epiperiodic manner.
Example: A simple producer-consumer.
In client 1:
init_producer :- linda_client(global), producer. producer :- produce(X), out(p(X)), producer. produce(X) :- .....
In client 2:
init_consumer :- linda_client(global), consumer. consumer :- in(p(A)), consume(A), consumer. consume(A) :- .....
..., in(ready), %Waits here until someone does out(ready) ...,
Example: A critical region
..., in(region_free), % wait for region to be free critical_part, out(region_free), % let next one in ...,
Example: Reading global data
..., rd(data(Data)), ...,
or, without blocking:
..., (rd_noblock(data(Data)) -> do_something(Data) ; write('Data not available!'),nl ), ...,
Example: Waiting for any one of several events
..., in([e(1),e(2),...,e(n)], E), % Here is E instantiated to the first tuple that became available ...,
Example: Producers and Consumers in the same process using
consumer1 :- repeat, in([p(_), quit], Y), ( Y = p(Z) -> writeln(consuming(Z)); !), fail. producer1 :- forall(between(1,40, X), out(p(X))). producer_consumer1 :- linda_eval(consumer1), call_cleanup(producer1, out(quit)), !. % % consumer2 :- between(1,4,_), in_noblock(p(X)), !, writeln(consuming(X)), consumer2. producer2 :- linda_eval(p(X), between(1,40, X)). producer_consumer2 :- producer2, linda_eval(consumer2), !. % % consumer3 :- forall(rd_noblock(p(X)), writeln(consuming(X))). producer3 :- tuple(p(X), between(1,40, X)). producer_consumer3 :- producer3, linda_eval(done, consumer3), in(done), !.
The server is the process running the "blackboard process". It is part of TIPC Linda. It is a collection of predicates that are registered as tipc_broadcast listeners. The server process can be run on a separate machine if necessary.
To load the package, enter the query:
?- use_module(library(tipc/tipc_linda)). ?- linda. TIPC Linda server now listening at: port_id('<1.1.1:3200515722>') true.
The clients are one or more Prolog processes that have
connection(s)to the server.
To load the package, enter the query:
?- use_module(library(tipc/tipc_linda)). ?- linda_client(global). TIPC Linda server listening at: port_id('<1.1.1:3200515722>') true.
port_id('<1.1.1:3200515722>')). This predicates looks to see if a server is already listening on the cluster. If so, it reports the address of the existing server. Otherwise, it registers a new server and reports its address.
?- linda. TIPC Linda server now listening at: port_id('<1.1.1:3200515722>') true. ?- linda. TIPC Linda server still listening at: port_id('<1.1.1:3200515722>') true.
The following will call my_init/0 in the current module after the server is successfully started or is found already listening. my_init/0 could start client-processes, initialize the tuple space, etc.
global, is supported. A client may interact with any server reachable on the TIPC cluster. This predicate will fail if no server is reachable for that tuple space.
rdrequests. Replies arriving outside of this window are silently ignored. OldTime is unified with the old timeout and then timeout is set to NewTime. NewTime is of the form Seconds:Milliseconds. A non-negative real number, seconds, is also recognized. The default is 0.250 seconds. This timeout is thread local and is not inherited from its parent. New threads are initialized to the default.
Note: The synchronous behavior afforded by in/1 and rd/1 is implemented by periodically polling the server. The poll rate is set according to this timeout. Setting the timeout too small may result in substantial network traffic that is of little value.
error(feature_not_supported). SICStus Linda can disable the timeout by specifying
offas NewTime. This feature does not exist for safety reasons.
?- out(x(a,3)), out(x(a,4)), out(x(b,3)), out(x(c,3)). true. ?- bagof_rd_noblock(C-N, x(C,N), L). L = [a-3,a-4,b-3,c-3] . true. ?- bagof_rd_noblock(C, N^x(C,N), L). L = [a,a,b,c] . true.
Joining Threads: Threads created using linda_eval/(1-2) are not allowed to linger. They are joined (blocking the parent, if necessary) under three conditions: backtracking on failure into an linda_eval/(1-2), receipt of an uncaught exception, and cut of choice-points. Goals are evaluated using forall/2. They are expected to provide nondeterministic behavior. That is they may succeed zero or more times on backtracking. They must however, eventually fail or succeed deterministically. Otherwise, the thread will hang, which will eventually hang the parent thread. Cutting choice points in the parent's body has the effect of joining all children created by the parent. This provides a barrier that guarantees that all child instances of Goal have run to completion before the parent proceeds. Detached threads behave as above, except that they operate independently and cannot be joined. They will continue to run while the host process continues to run.
Here is an example of a parallel quicksort:
qksort(, ). qksort([X | List], Sorted) :- partition(@>(X), List, Less, More), linda_eval(qksort(More, SortedMore)), qksort(Less, SortedLess), !, in_noblock(qksort(More, SortedMore)), append(SortedLess, [X | SortedMore], Sorted).
Note: A virtual tuple is an extension of the server. Even
though it is operating in the client's Prolog environment, it is
restricted in the server operations that it may perform. It is generally
safe for tuple predicates to perform out/1
operations, but it is unsafe for them to perform any variant of
rd, either directly or indirectly. This restriction is
however, relaxed if the server and client are operating in separate
heavyweight processes (not threads) on the node or cluster. This is most
easily achieved by starting a stand-alone Linda server somewhere on the
out(server_quit), the server's Prolog process will exit via halt/1. It is intended for use in scripting as follows:
swipl -q -g 'use_module(library(tipc/tipc_linda)), tipc_linda_server' -t 'halt(1)'
See also manual section 126.96.36.199 Using PrologScript.
Note: Prolog will return a non-zero exit status if this predicate is executed on a cluster that already has an active server. An exit status of zero is returned on graceful shutdown.
permission_error(halt,thread,2),context(halt/1,Only from thread 'main')), if this predicate is executed in a thread other than