===== summary for feb.21,22.2007. modified, k.f, feb.25.2007.
== Kazuro.Furukawa@kek.jp with Matthias.Clausen@desy.de, Bernd.Schoeneburg@desy.de and Gongfa.Liu@desy.de

RMT (redundancy monitor task) and CCE (continuous control executive) with a fake i/o driver seem to be running well. Recently the channel access server (rsrv) has also come under the control of RMT, in addition to the scan tasks. Several events that can trigger a switch-over were demonstrated: a command to iocsh, suspending tasks, etc. The switch-over was monitored by medm. It took about 30 seconds for medm to reconnect to the new ca server. Even during the hand-over period the controls seem to continue as if the machine had not been exchanged. The switch-over itself seems to take less than a second.

The environment of the test stand is: base-3.14.8.2, vxworks-5.5, two sets of 3u-height compact-pci, intel-celeron/pentium cpu around 1 GHz, two ethernets (private and public), no fans. The software is developed on solaris.

"IOC Redundancy Design Doc" was written by John and Bob Dalesio at the request of Matthias Clausen. It covers the external specification, or detailed behaviors, of rmt, cce, seq/snl and drivers. Internal specifications, or implementation documents, exist for rmt and cce in both German and English. rmt was written by a German company and cce was written by John Dalesio. The core part of each code is about 3500 lines. Their integration was done by Gongfa Liu, and he continues the test.

Modifications to the base were necessary, as stubs, but in not too many places: dbstatic, db, rec and rsrv for now. Matthias may start the negotiation to include the stubs in the official base. He expects the version number to bump up to 3.15. The final implementation of the stubs should be reasonably fast, so they should be controlled by a global variable instead of an environment variable.

rmt is configured through a configuration file.
It includes many timeout parameters and several tcp ports for communications and diagnostics. rmt communicates with the partner through both the public and private networks, normally every 1/3 second.

cce is currently configured by source code modifications. Like rmt, it has several parameters. cce synchronizes the state of the database through the private network. Some of the cce routines will be renamed soon by Bernd Schoeneburg, replacing "red" with "cce". Also the handling of scan tasks may change slightly.

Resources of redundancy are called PRR (primary redundancy resource) and are managed by rmt; each resource registers itself. They include cce, drivers, ca servers, seq/snl in the near future, and possibly user tasks. Scan tasks are managed both by rmt and cce. Control of the channel access server covers caserv, pairs of cas-client/cas-event, cas-tcp, cas-udp and cas-beacon. They are disconnected at the switch-over time. Some resources, like cce, have a synchronization capability, which should be maintained all the time. Some others, like scan tasks, do not have the capability.

Currently only two networks, public and private, are monitored by rmt. That may be extended in the future. Direct monitoring of the partner through shared i/o is not preferred, since some shared i/o does not allow it. It can be monitored indirectly through prr anyway.

An external ping node may be another possibility. It is a possibility for a redundant hard-ioc, and it is even more important for a redundant soft-ioc and ca-gateway, since those have little meaning without ca clients. It is mentioned in the rmt implementation note. The configuration file has a parameter for a "global" network, and there is a function (rmtConnectGlobal) corresponding to this feature, but no code is there now. We may consider it later.

Related areas around the redundant ioc are seq/snl redundancy, ca-gateway redundancy, pcas redundancy, linux and other platforms, etc.
Gongfa Liu is planning to implement the sequencer/snl redundancy.

Ralph said that no modification is necessary in the ca-gateway code. rmt may stop and start the instances of ca-gateways externally without any tight control relationship. Matthias prefers a graceful transition.

Shortening the time for clients to recognize the new channel access server is another subject. Jeff thinks there is no good reason to modify the channel access code yet. Actually, if we can gracefully disconnect the channel access at the server side, the switch-over may occur immediately at the client side. Alternatively, a modification of the beacon handling, or an anomalous beacon from the redundant ioc, may be useful. From observation, the ca-gateway code seems to handle such a disconnection gracefully. Furukawa will ask Ralph where to look in the ca-gateway code to accelerate the switch-over recognition. The implementation can be different between rsrv, which is used in the ioc, and cas, which is used in ca-gateway and pcas.

The current codes of rmt and cce are written only for vxworks, without osi. Furukawa and Artem Kazakov may start porting the code to unix/linux with epics/osi. The KEK linac currently depends on several soft iocs that should not stop, which is the motivation. Such unix support is necessary anyway to make a port of rmt possible for ca-gateway and others, since they run on unix. At first, a thread-based port of rmt is considered instead of a process-based one.

The design of the redundant ca-gateway will be delayed. After observing the behavior of the redundant soft-ioc we may discuss the implementation of the redundant ca-gateway again. The discussion may include whether a tightly coupled thread-based rmt would be used or a process-based rmt would control it externally, whether the slave would be completely dormant or not, etc.

Whether rmt can be heavily dependent on epics is another factor. The current rmt is only dependent on vxworks.
For certain purposes, it may be preferable for rmt not to be heavily dependent on epics. However, we may start with epics/osi with this factor in mind, since we don't have good alternatives now. tck, timer, alarm, etc. may be new in osi.

The code developments will be exchanged through CVS at DESY. The version of vxworks will stay at 5.5 for some time, since vxworks version 6 only provides the memory container (address space protection), in which we are not much interested now. So KEK may test the vxworks version as well.
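
Illustration (not from the meeting): the notes say that rmt exchanges messages with its partner over both the public and private networks about every 1/3 second, and that its configuration file holds many timeout parameters. The Python sketch below only illustrates that general idea. The class name, the state names ("ok", "degraded", "lost") and the 1-second timeout are hypothetical, and the rule that the partner counts as lost only when both networks are silent is an assumption, not the documented rmt logic.

```python
from dataclasses import dataclass, field

HEARTBEAT_PERIOD = 1.0 / 3.0  # from the notes: exchange about every 1/3 second
PARTNER_TIMEOUT = 1.0         # hypothetical; real timeouts are in the rmt configuration file


@dataclass
class PartnerMonitor:
    """Tracks when the partner was last heard on each monitored network."""
    timeout: float = PARTNER_TIMEOUT
    last_seen: dict = field(
        default_factory=lambda: {"public": None, "private": None}
    )

    def heartbeat(self, network: str, now: float) -> None:
        # Record a heartbeat received from the partner on the given network.
        if network not in self.last_seen:
            raise ValueError(f"unknown network: {network}")
        self.last_seen[network] = now

    def link_alive(self, network: str, now: float) -> bool:
        # A link counts as alive if a heartbeat arrived within the timeout.
        seen = self.last_seen[network]
        return seen is not None and (now - seen) <= self.timeout

    def partner_state(self, now: float) -> str:
        # Assumed rule: "lost" only when every monitored network is silent.
        alive = [n for n in self.last_seen if self.link_alive(n, now)]
        if len(alive) == len(self.last_seen):
            return "ok"
        if alive:
            return "degraded"  # partner reachable, but one network is silent
        return "lost"          # candidate condition for a switch-over
```

With this rule, a partner that is silent on the public network but still heard on the private one is reported as "degraded" rather than "lost", so a single network failure would not by itself look like a partner failure.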