question

Upvotes
Accepted
24 4 0 7

OmmConsumer, ping timeout, channel recover and maximum load.

Hi,
We have an application using EMA Java to subscribe, process and then republish some data to our TREP. We are seeing some issues where the OmmConsumerImpl loses the connection (channel down) and then fails to recover. We have a ChannelSet configuration and you can see from the logs that the OmmConsumerImpl is bringing up a channel, getting a ping timeout (30seconds) and then trying the next channel. The ADS servers were both up at the time of this failure and other applications were working fine so this is an issue within EMA failing the connection locally. We have seen the connection recover from a channel down at other times but the example below shows that it never recovers.

The overall issue is also why we get the channel down events in the first place? The ChannelSet failing to recover is also a fairly big issue.

Is there a maximum capacity that we should use a single instance of a OmmConsumer for? We are subscribing to around 8000 tickers and we were receiving about 10k updates/second at the time of the initial Channel Down event.

Perhaps we need to tune some of the parameters, we have:

Consumer:

  • LoginRequestTimeOut=30000ms
  • User dispatch,
  • MaxDispatchCountUserThread= default (100), should we set this higher given we expect 1000s of messages a second?
  • DispatchTimeoutApiThread=0 (default), should we adjust this?

Channel:

  • NumInputBuffers=2048
  • GuaranteedOutputBuffers=5000
  • ConnectionPingTimeout= Default (looks like the 30second default below)
  • SysRecvBufSize - we're not setting this, should we?
  • SysSendBufSize - we're not setting this, should we?


Here are the logs from the channel down events:

2018-05-04 14:35:41.110 WARN  [ElektronConsumerDispatcher] 
access.OmmConsumerImpl - loggerMsg| ClientName: ChannelCallbackClient| 
Severity: Warning| Text:  Received ChannelDownReconnecting event on chan
nel
 2.rmds.| RsslReactor @1d63c996| RsslChannel @4f10502a| Error Id 0| 
Internal sysError 0| Error Location null| Error text SocketChannel.read 
returned -1 (end-of-stream)| loggerMsgEnd|
2018-05-04 14:35:41.205 
WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: LoginCallbackClient| Severity: Warning| Text:  RDMLogin 
stream state was changed to suspect wit
h status message| username <not set>| usernameType <not set>| State: Open/Suspect/None - text: ""| loggerMsgEnd|
2018-05-04
 14:35:41.243 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: ChannelDictionary| Severity: Warning| Text:  
RDMDictionary stream state was changed to suspect with status message| 
streamId 3| Reason State: Open/Suspect/None - text: "channel down."| 
loggerMsgEnd|
2018-05-04 14:35:41.244 WARN  
[ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelDictionary| Severity: Warning| Text:  RDMDictionary 
stream state was changed to suspect with status message| streamId 4| 
Reason State: Open/Suspect/None - text: "channel down."| loggerMsgEnd|
2018-05-04
 14:35:41.245 WARN  [ElektronConsumerDispatcher] 
heartbeat.HeartbeatService - Error event for heartbeat message on HB. 
ErrorEvent{subject=HB, state=UNKNOWN, statusMsg=channel down.}
2018-05-04
 14:35:41.255 WARN  [ElektronConsumerDispatcher] 
subscribe.RmdsTickClient - Received: ErrorEvent{subject=.., 
state=UNKNOWN, statusMsg=channel down.}
...
2018-05-04 
14:35:42.877 WARN  [ElektronConsumerDispatcher] subscribe.RmdsTickClient
 - Received: ErrorEvent{subject=.., state=UNKNOWN, statusMsg=channel 
down.}
2018-05-04 14:35:42.877 WARN  [ElektronConsumerDispatcher] 
subscribe.RmdsTickClient - Received: ErrorEvent{subject=.., 
state=UNKNOWN, statusMsg=channel down.}
2018-05-04 14:35:42.878 INFO 
 [ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelCallbackClient| Severity: Info| Text:  Received 
ChannelUp event on channel 1.rmds.| Instance Name CONSUMER_B_PREPROD_1| 
Component Version ads2.6.9.L1.linux.tis.rrg 64-bit| loggerMsgEnd|
...
2018-05-04
 14:40:44.365 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: ChannelCallbackClient| Severity: Warning| Text: 
 Received ChannelDownReconnecting event on channel 1.rmds.| RsslReactor 
@1d63c996| RsslChannel @4f10502a| Error Id 0| Internal sysError 0| Error
 Location Reactor.processWorkerEvent| Error text Ping error for channel:
 Lost contact with connection...| loggerMsgEnd|
2018-05-04 
14:40:44.366 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: LoginCallbackClient| Severity: Warning| Text:  
RDMLogin stream state was changed to suspect with status message| 
username <not set>| usernameType <not set>| State: 
Open/Suspect/None - text: ""| loggerMsgEnd|
2018-05-04 14:40:44.367 
WARN  [ElektronConsumerDispatcher] subscribe.RmdsTickClient - Received: 
ErrorEvent{subject=.., state=UNKNOWN, statusMsg=channel down.}
...
2018-05-04
 14:40:45.391 INFO  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: ChannelCallbackClient| Severity: Info| Text:  
Received ChannelUp event on channel 2.rmds.| Instance Name 
CONSUMER_B_PREPROD_1| Component Version ads2.6.9.L1.linux.tis.rrg 
64-bit| loggerMsgEnd|


Later on, we had several ping timeouts:

2018-05-04 15:08:56.215 WARN  [ElektronConsumerDispatcher] 
access.OmmConsumerImpl - loggerMsg| ClientName: ChannelDictionary| 
Severity: Warning| Text:  RDMDictionary stream state was changed to 
suspect with status message| streamId 4| Reason State: Open/Suspect/None
 - text: "channel down."| loggerMsgEnd|
2018-05-04 15:08:56.197 WARN 
 [ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelCallbackClient| Severity: Warning| Text:  Received 
ChannelDownReconnecting event on channel 2.rmds.| RsslReactor @6091c183|
 RsslChannel @26d64e5a| Error Id 0| Internal sysError 0| Error Location 
Reactor.processWorkerEvent| Error text Ping error for channel: Lost 
contact with connection...| loggerMsgEnd|
2018-05-04 15:08:56.204 
WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: LoginCallbackClient| Severity: Warning| Text:  RDMLogin 
stream state was changed to suspect with status message| username 
<not set>| usernameType <not set>| State: Open/Suspect/None -
 text: ""| loggerMsgEnd|
2018-05-04 15:08:56.213 WARN  
[ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelDictionary| Severity: Warning| Text:  RDMDictionary 
stream state was changed to suspect with status message| streamId 3| 
Reason State: Open/Suspect/None - text: "channel down."| loggerMsgEnd|
2018-05-04
 15:08:57.269 INFO  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: ChannelCallbackClient| Severity: Info| Text:  
Received ChannelUp event on channel 1.rmds.| Instance Name 
CONSUMER_B_PREPROD_1| Component Version ads2.6.9.L1.linux.tis.rrg 
64-bit| loggerMsgEnd|
2018-05-04 15:17:14.069 WARN  
[ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelCallbackClient| Severity: Warning| Text:  Received 
ChannelDownReconnecting event on channel 1.rmds.| RsslReactor @6091c183|
 RsslChannel @26d64e5a| Error Id 0| Internal sysError 0| Error Location 
Reactor.processWorkerEvent| Error text Ping error for channel: Lost 
contact with connection...| loggerMsgEnd|
2018-05-04 15:17:14.069 
WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: LoginCallbackClient| Severity: Warning| Text:  RDMLogin 
stream state was changed to suspect with status message| username 
<not set>| usernameType <not set>| State: Open/Suspect/None -
 text: ""| loggerMsgEnd|
2018-05-04 15:17:15.116 INFO  
[ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelCallbackClient| Severity: Info| Text:  Received 
ChannelUp event on channel 2.rmds.| Instance Name CONSUMER_B_PREPROD_1| 
Component Version ads2.6.9.L1.linux.tis.rrg 64-bit| loggerMsgEnd|
2018-05-04
 15:19:44.217 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: ChannelCallbackClient| Severity: Warning| Text: 
 Received ChannelDownReconnecting event on channel 2.rmds.| RsslReactor 
@6091c183| RsslChannel @26d64e5a| Error Id 0| Internal sysError 0| Error
 Location Reactor.processWorkerEvent| Error text Ping error for channel:
 Lost contact with connection...| loggerMsgEnd|
2018-05-04 
15:19:44.218 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: LoginCallbackClient| Severity: Warning| Text:  
RDMLogin stream state was changed to suspect with status message| 
username <not set>| usernameType <not set>| State: 
Open/Suspect/None - text: ""| loggerMsgEnd|
2018-05-04 15:19:45.258 
INFO  [ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelCallbackClient| Severity: Info| Text:  Received 
ChannelUp event on channel 1.rmds.| Instance Name CONSUMER_B_PREPROD_1| 
Component Version ads2.6.9.L1.linux.tis.rrg 64-bit| loggerMsgEnd|
2018-05-04
 16:07:23.093 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: ChannelCallbackClient| Severity: Warning| Text: 
 Received ChannelDownReconnecting event on channel 1.rmds.| RsslReactor 
@6091c183| RsslChannel @26d64e5a| Error Id 0| Internal sysError 0| Error
 Location Reactor.processWorkerEvent| Error text Ping error for channel:
 Lost contact with connection...| loggerMsgEnd|
2018-05-04 
16:07:23.094 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: LoginCallbackClient| Severity: Warning| Text:  
RDMLogin stream state was changed to suspect with status message| 
username <not set>| usernameType <not set>| State: 
Open/Suspect/None - text: ""| loggerMsgEnd|
2018-05-04 16:07:24.663 
INFO  [ElektronConsumerDispatcher] access.OmmConsumerImpl - loggerMsg| 
ClientName: ChannelCallbackClient| Severity: Info| Text:  Received 
ChannelUp event on channel 2.rmds.| Instance Name CONSUMER_B_PREPROD_1| 
Component Version ads2.6.9.L1.linux.tis.rrg 64-bit| loggerMsgEnd|
2018-05-04
 16:22:31.342 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: ChannelCallbackClient| Severity: Warning| Text: 
 Received ChannelDownReconnecting event on channel 2.rmds.| RsslReactor 
@6091c183| RsslChannel @26d64e5a| Error Id 0| Internal sysError 0| Error
 Location Reactor.processWorkerEvent| Error text Ping error for channel:
 Lost contact with connection...| loggerMsgEnd|
2018-05-04 
16:22:31.343 WARN  [ElektronConsumerDispatcher] access.OmmConsumerImpl -
 loggerMsg| ClientName: LoginCallbackClient| Severity: Warning| Text:  
RDMLogin stream state was changed to suspect with status message| 
username <not set>| usernameType <not set>| State: 
Open/Suspect/None - text: ""| loggerMsgEnd|
elektronrefinitiv-realtimeelektron-sdk
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

1 Answer

Upvotes
Accepted
1.9k 6 9 16

After the investigation, somehow this “Error text Ping error for channel: Lost contact with connection...” error message doesn't occur with EMA version 3.2.0.1 (Elektron-SDK_1.2.0.1.L1). It could be used as a workaround/resolution for this issue.

icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 5.0 MiB each and 10.0 MiB total.

We haven't seen it again yet on the new version but it's fairly intermittent so we'll keep an eye out for any new occurences.

Click below to post an Idea Post Idea