Reconnect stream on loggerMsg State: Closed RTSDK-RTO Java

My connection encounters the following error, along with some other State: Closed errors:

SEVERE: loggerMsg
ClientName: LoginCallbackClient
Severity: Error
Text: RDMLogin stream was closed with status message
username XXXX
usernameType X

State: Closed/Suspect/Timeout - text: "TREP authentication token has expired."
loggerMsgEnd


The article "Migrating an application from Elektron Connect to Refinitiv Real-Time Optimized" (https://developers.refinitiv.com/en/article-catalog/article/migrating-an-application-from-elektron-connect-to-refinitiv-real) states that:

"The application developer should consider how this particular scenario should be handled in an application. It could be a combination of application and/or log monitor, which detects that application is not receiving any updates, or all open subscriptions have been closed. Upon detection of this condition, the application should be terminated and restarted."

I am trying to programmatically implement an automatic reconnect when these issues occur. The Consumer.java examples provide callbacks such as onRefreshMsg and onUpdateMsg, but none of them receives the loggerMsg, as this is handled in LoginCallbackClient. In LoginCallbackClient, there is a boolean closeChannel which is set to false when the stream state is closed. I'm also having difficulty recreating the issue: I have tried disconnecting my network connection to simulate a "State: Closed" but haven't been successful.


1) Is there a way to check the stream state of an OmmConsumer?

2) Is there a way to handle the loggerMsg in the Consumer class?

3) Is there a way to simulate a similar "State: Closed" scenario?


Thank you in advance


Hello @azat.kasimov ,

I do not believe you will be able to manipulate this aspect with Ex430, as it is not intended for RTO connectivity.

What you observe looks right to me. If you register for Login domain events, you should see events on the login stream such as Login Accepted, Channel Up, and Channel Down. However, on login stream closure, if there is a stream established on a specific domain, then from my testing that domain takes precedence and the login closure is reported in the status on that domain: on MarketPrice if it's MarketPrice, on NewsTextAnalytics if it's NewsTextAnalytics. From my testing with the latest SDK, you can expect to see the login stream closure on your established stream even without registering for the Login domain.

Please note that this configuration is not a sure-fire way to reproduce the issue. It is suggested based on a guess, with the sole purpose of increasing the incidence of the issue manifesting, and IS NOT recommended for any other purpose, including production or development usage. I would suggest something like:

<Consumer>
  <Name value="Consumer_4"/>
  <Channel value="Channel_4"/>
  <Dictionary value="Dictionary_1"/>
  <MaxDispatchCountApiThread value="6500"/>
  <MaxDispatchCountUserThread value="6500"/>
  <XmlTraceToStdout value="1"/>
  <TokenReissueRatio value="0.95"/>
</Consumer>

Please refer to the EMA Configuration Guide for additional information on EMA config.

By enabling trace you should also be able to collect additional information, which may be helpful in understanding the cause of the issue.


Hello @azat.kasimov ,

I would suggest registering for the Login domain, so that you can track the state of the stream in the onStatusMsg callback and handle it, similar to:

LoginReq loginReq = EmaFactory.Domain.createLoginReq();

consumer.registerClient(loginReq.message(), appClient);

Please find this in example series300.ex333_Login_Streaming_DomainRep in SDK.
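
As a rough illustration of the handling suggested above, the decision made inside an onStatusMsg callback could look something like the sketch below. It deliberately uses plain-JDK stand-in types (StreamState, Action and LoginStatusPolicy are hypothetical names, not EMA classes) so that it runs on its own; in a real client the domain and the stream state would be read from the StatusMsg passed to the callback.

```java
// Stand-in types for illustration only; these are NOT EMA classes.
enum StreamState { OPEN, NON_STREAMING, CLOSED_RECOVER, CLOSED, CLOSED_REDIRECTED }

enum Action { NONE, RECONNECT }

final class LoginStatusPolicy {
    private LoginStatusPolicy() {}

    /**
     * Decide what the application should do for one status event.
     * Assumption: only a Closed state on the Login domain means the
     * session will not recover by itself and must be rebuilt.
     */
    static Action decide(boolean isLoginDomain, StreamState state) {
        if (isLoginDomain && state == StreamState.CLOSED) {
            return Action.RECONNECT;
        }
        return Action.NONE;
    }
}
```

Only a Closed state on the Login domain is treated as a trigger here; whether other states or domains should also count is a choice for the application.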

This should be a rare scenario, but you are absolutely correct: a consumer intended for production use should handle it.

Parsing logger messages can be done, but it is not a reliable or recommended approach.

I cannot think of a reliable approach to simulate this specific situation, other than running and connecting to a custom publisher.

If you are on a developer machine, or on a test environment, you can simulate a network interrupt in general by unplugging your wired network connection, waiting for the API to detect the disconnect, and plugging back in, or by disconnecting and reconnecting the specific WiFi connection.

If it is possible on your side, I would suggest running with the latest version of RTSDK on GitHub, as some reliability issues have been fixed as recently as the very latest release.


Thank you for pointing me towards this example.

I have tried simulating the network interrupt as you suggested, but this results in a re-connection with no issues:

Aug 11, 2021 12:22:30 PM com.refinitiv.ema.access.ChannelCallbackClient reactorChannelEventCallback
INFO: loggerMsg
    ClientName: ChannelCallbackClient
    Severity: Info
    Text:    Received ChannelUp event on channel Channel_1
    Instance Name Consumer_1_1
    Component Version ads3.4.2.L1.linux.tis.rrg 64-bit
loggerMsgEnd



This does not recreate the issue seen in the case of a token refresh failure, which the article linked in my initial post describes as follows:

"An application user will have to be aware that if there are issues with token refresh, after the initial login, that their application will receive a CLOSE message on the LOGIN domain, and all the subscriptions would stop. These stopped subscriptions will not resume automatically, upon renewal of token."

I would like to test reconnecting automatically from the onStatusMsg override for the LOGIN domain in the specific case of a token refresh failure, as it seems to be a unique case. Do you know how I would go about doing this?

Hello @azat.kasimov ,

There is no defined way to reproduce this issue that I know of.

I would like to find out more about the scenario(s) in which you have observed this issue, to better understand why it happens. May I open a support case on your behalf with the RTO support colleagues, who may be best placed to advise on environment-specific detailed testing and tuning to understand the causes of the issue?

If the issue is related to a failure to refresh the token in the allotted time frame (which is a guess at this point, rather than a conclusion based on environment-specific detailed testing), then the issue may manifest more often if you minimize the time to reissue the token in your environment to the minimum allowed, via this setting in your Consumer config:

<TokenReissueRatio value="0.95"/>

Please note that even if my guess is correct, this approach will not guarantee reproducing the login stream closure issue reliably, and the cause may be totally different. Therefore, I would suggest continuing to investigate the issue and track it to its cause.
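
To make the effect of the setting concrete: as I understand the EMA Configuration Guide, TokenReissueRatio is multiplied by the token's validity time to decide when the reissue is attempted, so a ratio of 0.95 leaves only the last 5% of the token lifetime for the reissue to succeed. The sketch below only does that arithmetic. The 300-second lifetime used in the note after it is an assumed example value, and ReissueTiming is a hypothetical helper, not part of EMA.

```java
// Hypothetical helper, not part of EMA: illustrates the ratio arithmetic only.
final class ReissueTiming {
    private ReissueTiming() {}

    /** Seconds after the token is issued at which a reissue would be attempted. */
    static long reissueDelaySeconds(long tokenLifetimeSeconds, double ratio) {
        if (tokenLifetimeSeconds <= 0 || ratio <= 0.0 || ratio >= 1.0) {
            throw new IllegalArgumentException("lifetime must be > 0 and ratio in (0, 1)");
        }
        return Math.round(tokenLifetimeSeconds * ratio);
    }

    /** Seconds left between the reissue attempt and the token expiring. */
    static long marginSeconds(long tokenLifetimeSeconds, double ratio) {
        return tokenLifetimeSeconds - reissueDelaySeconds(tokenLifetimeSeconds, ratio);
    }
}
```

With an assumed lifetime of 300 seconds, reissueDelaySeconds(300, 0.95) gives 285 and marginSeconds(300, 0.95) gives 15, which is why a slow or failed reissue is more likely to run into token expiry with this setting.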

Hi, please see my reply as an Answer, as it went over the character limit for a reply

We have already raised the issue with RTO support. They are having some problems on their side that make this a persistent issue, so we would like to implement a solution on our side regardless.

I am looking into adjusting the refresh rate to exceed the required rate to force the error. Is this possible by adjusting ex.430? I understand that this Reissue Ratio can also be set in the EmaConfig.xml file. Could you please point me to where in the config file I can set this?

Moreover, I have implemented the login domain to attempt to track disconnects. Since I can't recreate the issue exactly as of yet, it is difficult to interpret whether it can be used to catch the issue and reconnect. Upon initial connection, I get the following printed to console for the login domain stream and the NewsTextAnalytics Domain:

RefreshMsg
    streamId="1"
    domain="Login Domain"
    solicited
    RefreshComplete
    state="Open / Ok / None / 'Login accepted by host ads-fanout-sm-az2-use1-prd.'"
    itemGroup="00 00"
    name= XXX
    nameType="1"
    Attrib dataType="ElementList"

RefreshMsg
    streamId="5"
    domain="NewsTextAnalytics Domain"
    solicited
    RefreshComplete
    state="Open / Ok / None / ''"
    itemGroup="00 0b"
    permissionData="03 01 01 10 00 2c"
    name="MRN_TRNA"
    nameType="1"
    serviceId="257"
    serviceName="ELEKTRON_DD"


However, I do not get any StatusMsg on the Login Domain when I close the stream manually using:

consumer.uninitialize();


I only see:

StatusMsg
    streamId="5"
    domain="NewsTextAnalytics Domain"
    state="Closed / Suspect / None / 'Login stream was closed.'"
    name="MRN_TRNA"
    serviceId="257"
    serviceName="ELEKTRON_DD"
StatusMsgEnd

Additionally, our logs indicate that we do not catch a StatusMsg when the issue occurs; we only see the loggerMsg error:

Aug 10, 2021 10:50:07 PM com.refinitiv.ema.access.LoginCallbackClient rdmLoginMsgCallback
SEVERE: loggerMsg
    ClientName: LoginCallbackClient
    Severity: Error
    Text:    RDMLogin stream was closed with status message
        username XXX
        usernameType XXX

        State: Closed/Suspect/Timeout - text: "TREP authentication token has expired."
loggerMsgEnd

Will this error be caught in the onStatusMsg if the Login Domain is monitored? Is there any other way to check the state of the stream connection by streamId?


Thank you for your response.

I was able to recreate the TREP issue, although it was by attempting multiple reconnects. It looks like the issue is indeed caught on the LOGIN stream.

The issue I'm encountering now is programming the reconnect logic. I am initially trying to close the stream using:

consumer.uninitialize();
serviceDiscovery.uninitialize();

However, when I try to re-connect by re-initializing the OmmConsumer, ServiceEndpointDiscovery, AppClient and other variables using the same initialization code as the first connection, I get the following message:

Aug 16, 2021 3:39:07 PM com.refinitiv.ema.access.ChannelCallbackClient reactorChannelEventCallback
INFO: loggerMsg
    ClientName: ChannelCallbackClient
    Severity: Info
    Text:    Received ChannelUp event on channel Channel_1
    Instance Name Consumer_1_2
    Component Version ads3.4.2.L1.linux.tis.rrg 64-bit
loggerMsgEnd

As you can see, the Instance Name is Consumer_1_2. I can also see that there are previous threads running from the initial connection. I'm assuming these threads retain information from the initial connection, resulting in this Instance Name. I would like the re-connect to simulate a fresh connection with Instance Name Consumer_1_1, with the previous threads closed, as this could cause memory issues down the line.

Is there any way to do this? Can I manually stop the previous threads from running to achieve this?

Thank you


Hello @azat.kasimov ,

I would like to better understand where consumer.uninitialize() is called: from within the callback, or are uninitialize() and createOmmConsumer() called from the main thread after the callback has exited?

Please note that channel name and instance name are not the same, and a new instance will usually have a different instance name from the previously created instance. So the callback info looks right to me.
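
If the teardown is currently done inside the callback, one alternative is to have the callback only record a request and let the application's own thread do the work. Below is a minimal plain-JDK sketch of that pattern, under the assumption that the whole session can be rebuilt from a factory. Session and ReconnectLoop are hypothetical names; Session.close() stands in for wherever consumer.uninitialize() and serviceDiscovery.uninitialize() would be called, and the factory stands in for your existing initialization code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical stand-in for whatever owns the consumer and related objects. */
interface Session {
    void close(); // a real app would uninitialize its EMA objects here
}

final class ReconnectLoop {
    private final AtomicBoolean reconnectRequested = new AtomicBoolean(false);
    private final Supplier<Session> factory;
    private Session current;
    private int sessionsCreated;

    ReconnectLoop(Supplier<Session> factory) {
        this.factory = factory;
        this.current = factory.get();
        this.sessionsCreated = 1;
    }

    /** Safe to call from a callback thread: records the request and returns. */
    void requestReconnect() {
        reconnectRequested.set(true);
    }

    /**
     * Call from the application's own thread, for example once per loop
     * iteration. Returns true if the session was torn down and rebuilt.
     */
    boolean pollOnce() {
        if (!reconnectRequested.compareAndSet(true, false)) {
            return false; // nothing requested since the last poll
        }
        current.close();
        current = factory.get();
        sessionsCreated++;
        return true;
    }

    int sessionsCreated() {
        return sessionsCreated;
    }
}
```

Several requests arriving before the next poll collapse into a single rebuild, which avoids tearing the session down repeatedly when more than one stream reports the closure.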
