NIProvider: ending a stream of RIC updates so that only one ADH receives updates for it

Hi,

Assume there is a multi-process application working as a NIProvider. It connects to one of two ADHs configured to accept connections from the application. Please note that these two ADHs do not work in hot standby for the NIP service. NIP on the ADHs is configured for source aggregation, and communication is over TCP, not multicast.

The multi-process application has several instances, and at any point in time each one may be streaming updates to one of the ADHs. However, at any given time only one instance streams updates for a given RIC; there is no overlap over time.

The application is normally connected to the primary ADH, but if that ADH goes down, the application may connect to the secondary ADH and start streaming RIC updates there. After some time, the task of streaming updates for a given RIC may move to another instance that is connected to the primary ADH. Question:

How should the instance connected to the secondary ADH "end" its stream of updates so that there is only one open stream for a given RIC across the ADHs?

In other words, the goal is to have a given RIC open on only one ADH. If that is not done, the ADSs get confused. One of them might get the stream from the secondary ADH (wrong, since the stream is no longer being updated there), while another might reach the primary ADH (correct).

Hello @Konrad,

In my understanding, the core of the question is to design for the scenario when "a task of streaming updates for a given RIC may go back to another instance that is connected to primary ADH".

My assumption also is that under normal operation, the vast majority of the time, you are supporting load-balancing, i.e. the RICs are shared between the two NIPs.

For me, a cleaner solution is to designate that the task of streaming a specific RIC can be assigned to only one NIP. That is, if an ADH fails (which should be very rare), the NIP detects the failure, reconnects to another ADH, and starts streaming its designated RICs. Once a NIP is streaming a RIC, it owns that RIC and the task does not go to the other NIP. A possible convention is that if a NIP dies, the other NIP takes over streaming all of its RICs, and they do not go back until everything is restarted.
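The ownership rule above can be sketched as a small, self-contained Python model. This is only an illustration under assumptions. `RicOwnership`, its methods, and the NIP/RIC names are hypothetical and not part of any Refinitiv API. Real code would also need the processes to agree on, or persist, the assignment.

```python
class RicOwnership:
    """Hypothetical model of the 'one designated NIP per RIC' rule.

    Ownership moves only when the owning NIP dies, and it never moves back
    until everything is restarted (modelled by reset()).
    """

    def __init__(self, assignment):
        # assignment: RIC -> NIP name, fixed at startup (illustrative names)
        self._initial = dict(assignment)
        self._owner = dict(assignment)

    def owner(self, ric):
        return self._owner[ric]

    def nip_failed(self, failed_nip, survivor):
        # The survivor takes over every RIC the failed NIP was streaming.
        for ric, nip in self._owner.items():
            if nip == failed_nip:
                self._owner[ric] = survivor

    def nip_recovered(self, nip):
        # Deliberately a no-op: a recovered NIP does not reclaim its RICs,
        # so a RIC is never handed back and forth between the two NIPs.
        pass

    def reset(self):
        # Full restart: return to the original assignment.
        self._owner = dict(self._initial)
```

Making `nip_recovered()` a no-op is the point of this design. Because ownership moves at most once, a RIC cannot bounce between NIPs, and therefore between ADHs, during normal operation.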

Another possible approach is for a NIP to connect to both ADHs, possibly subscribing to the source directory, but to publish its RICs to only one ADH at a time. This way it will know if one of the ADHs has failed and that it needs to take care of the complete set of RICs, not just the ones initially assigned.
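Below is a hedged Python sketch of the takeover rule in this second approach. It assumes the NIP already tracks, from each ADH's source directory, whether the service on the peer's ADH is up. `rics_to_publish` and the RIC names are hypothetical.

```python
def rics_to_publish(own_rics, peer_rics, peer_adh_up):
    """Hypothetical takeover rule for a NIP that watches both ADHs.

    own_rics:    RICs initially assigned to this NIP
    peer_rics:   RICs initially assigned to the other NIP
    peer_adh_up: whether the service on the peer's ADH is seen as up
                 (e.g. from that ADH's source directory)
    """
    if peer_adh_up:
        return set(own_rics)                 # normal load-balanced split
    return set(own_rics) | set(peer_rics)    # peer's ADH is down: take all
```

This rule only decides what this NIP publishes. It does not close anything on the side that failed.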

Hope these ideas help.

"In my understanding, the core of the question is to design for the scenario when "a task of streaming updates for a given RIC may go back to another instance that is connected to primary ADH"."

Yes.

"My assumption also is that under normal operation, the vast majority of the time, you are supporting load-balancing, i.e. the RICs are shared between the two NIPs. "

Yes.

"For me, a cleaner solution is to designate that the task of streaming a specific RIC can only be assigned to one NIP. I.e. If there is failure on ADH, which should be very rare, then a NIP detects the failure and reconnects to another ADH and starts streaming its designated RICs. Once one NIP is streaming a RIC, it owns RIC and the task does not go to the other NIP. "

That's how my application is implemented now: at any given time, a given RIC is streamed from only one NIP.

"A possible covenant is that if a NIP dies, the other NIP gets the task of streaming all other RICs. And they do not go back, until a time that all is restarted. "

Yes, that's how it goes. But this other NIP may be connected to the other ADH, and that's where the problem lies: both ADHs then have the same RIC open. How do we "end" the RIC on the ADH that was used by the NIP that stopped generating it? If we don't, the ADSs get confused and might take the stream from the ADH that is no longer getting updates.
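In the graceful case, the instance that loses a RIC is still running, and the fix is a matter of ordering: close the RIC on the old ADH before it is opened on the new one. The Python toy model below only illustrates that ordering. `Adh`, `hand_over`, and `open_on` are hypothetical. Mapping step 1 to an EMA message (presumably a StatusMsg with a Closed stream state) is an assumption to check against the RDM usage guide for your ADH version.

```python
class Adh:
    """Toy stand-in for one ADH's cache of the NIP service: RIC -> image."""

    def __init__(self, name):
        self.name = name
        self.items = {}


def hand_over(ric, old_adh, new_adh, image):
    """Move the stream for `ric` from old_adh to new_adh (graceful case only).

    Step 1 stands for the old owner closing its stream; in EMA this would
    presumably be a StatusMsg with a Closed stream state (assumption to verify).
    Step 2 stands for the new owner's initial RefreshMsg.
    """
    old_adh.items.pop(ric, None)  # 1. close on the old ADH first
    new_adh.items[ric] = image    # 2. then open on the new ADH


def open_on(ric, adhs):
    """Names of the ADHs that currently hold `ric`."""
    return [adh.name for adh in adhs if ric in adh.items]
```

Closing first leaves a short gap for consumers. Opening first leaves a short overlap, which is the situation the question is trying to avoid. Neither ordering helps when the old instance has crashed; that case has to be handled on the ADH side.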

As to the last paragraph: I'm not sure it helps with the issue I described above.

Thanks for helping.

Hello @Konrad,

From what I understand, the issue does not occur when ADH1 fails. It occurs when the connection from NIP1 to ADH1 fails, or NIP1 itself fails, and NIP2 starts streaming.

I think it is then up to the infrastructure, on ADH1, to close all items from the failed NIP via a timeout. However, I'm afraid I don't know enough to point you to the right configuration setting or set of settings.

The alternative I see is for both NIPs to connect to both ADHs but publish to only one, subscribing to the source directory to know whether the other is healthy. This makes the solution more complex, and to me this approach is error-prone. I would explore the configuration with your market data admin group, to make sure items are closed on source failure.

"I think it should be with the infra then, on ADH1, to close all items from the failed NIP, via timeout. However, I am afraid I do not know enough, to point you to the right config or set of config."

That's the thing: no one knows, and unfortunately there is no clear guideline from Refinitiv on it. BTW, when you say 'via timeout', which StatusMsg do you actually have in mind (which states exactly)?

Regarding your other idea of connecting to both ADHs: I actually wanted to try it, but publishing to both ADHs at all times (I think I've seen this idea in a few questions here). What do you think of that?

Hello @Konrad,

Please take the following ideas with a grain of salt, as I am a developer, not a market data engineer, and do review the RTDS Installation Guide.

I think you are looking for:

disconnectServiceDown: True

"If multiple servers exist for the same service on the network (load balanced), Refinitiv recommends that you set disconnectServiceDown to True so that all the items from a failed server are deleted rather than marked as stale. The sink component sends requests to other servers instead."
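To make the quoted behaviour concrete, here is a toy Python model of what the sink does with items from a failed server under each setting. It models only the guide's description, not real ADS internals. `on_server_down` and the item layout are hypothetical. Whether a NIP disconnect actually makes the ADH report the service as down to the ADSs (for example with source aggregation) is something to verify in your environment.

```python
def on_server_down(ads_items, failed_server, disconnect_service_down):
    """Toy model of the sink-side behaviour described in the quoted guide text.

    ads_items: RIC -> {"server": ADH name, "state": "OK" or "STALE"}
    Returns the RICs the sink would re-request from other servers.
    """
    rerequest = []
    for ric in list(ads_items):  # copy keys: we may delete while looping
        entry = ads_items[ric]
        if entry["server"] != failed_server:
            continue
        if disconnect_service_down:
            del ads_items[ric]       # True: delete the item ...
            rerequest.append(ric)    # ... and ask another server for it
        else:
            entry["state"] = "STALE"  # False: keep the item, marked stale
    return rerequest
```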

The other way to delete items is through expiration, per "Auto-Expiry of Posts and Publications to Non-Interactive Source-Driven Cache":

*adh*servicename*autoExpiryItemBaseline: update

*adh*servicename*autoExpiryItemTimeUnits: seconds

*adh*servicename*autoExpiryItemTimeout: X

with X (in seconds) set very short. This would need to be tested and tuned repeatedly in your environment. This is the infrastructure timeout I meant.
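Here is a rough Python sketch of the expiry arithmetic. It assumes `autoExpiryItemBaseline: update` means the timer restarts on every update. The function names and the exact boundary (`>=`) are also assumptions. The key constraint when picking X: it must exceed the longest normal quiet period of any live RIC. Otherwise, healthy but slow items are deleted as well.

```python
def is_expired(last_update_s, now_s, timeout_s):
    """Model of autoExpiryItemBaseline=update, autoExpiryItemTimeUnits=seconds,
    assuming the timer restarts on every update (exact boundary is a guess)."""
    return now_s - last_update_s >= timeout_s


def pick_timeout(max_quiet_gap_s, margin_s):
    """X must be longer than the longest normal gap between updates of a live
    RIC, or quiet-but-healthy items are deleted too."""
    return max_quiet_gap_s + margin_s
```

Take a longest observed quiet gap of 12 s and a 3 s margin. That gives X = 15 s, so a stream abandoned on the old ADH would disappear roughly 15 s after its last update.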

---

I did not suggest publishing to both ADHs. Rather, connect to both and log in (if you are working with EMA NIP, see example ex340_Login_Streaming), observe the status of the service via the callback, and publish to only one ADH.
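Here is a Python sketch of the bookkeeping this implies. The NIP records each ADH's service status as callbacks arrive, publishes to one ADH, and sends an initial refresh for every owned RIC whenever the target changes. `PublisherState` and its methods are hypothetical. How `on_service_status` is fed (for example from EMA login or directory callbacks) is not shown and depends on your API. The target is sticky: the NIP does not move back to the preferred ADH while the current one is healthy.

```python
class PublisherState:
    """Hypothetical bookkeeping for a NIP logged in to both ADHs but
    publishing to only one. on_service_status() would be called from whatever
    status callback your API delivers; the wiring is not shown."""

    def __init__(self, preference, owned_rics):
        self.preference = list(preference)  # e.g. ["ADH1", "ADH2"]
        self.owned = set(owned_rics)
        self.up = {adh: False for adh in self.preference}
        self.target = None

    def on_service_status(self, adh, is_up):
        """Record a status change; return (adh, ric) refreshes to send."""
        self.up[adh] = is_up
        if self.target is not None and self.up[self.target]:
            return []  # sticky: keep the current ADH while it is up
        self.target = next((a for a in self.preference if self.up[a]), None)
        if self.target is None:
            return []  # nowhere to publish
        # A new target needs an initial refresh for every owned RIC
        # before any updates are sent there.
        return [(self.target, ric) for ric in sorted(self.owned)]
```

Stickiness matters here. Switching back to the primary ADH while the secondary is still up would leave the RICs open on both, which is exactly the problem in the question.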
