
Elektron EMA API switch failover

An outage occurred last night on the circuit of one of our Elektron Edge servers, while the circuit of the other Edge server (10.102.210.137) was normal. When this happens, can the EMA API automatically switch to the host whose circuit is normal? We have configured this (see the EmaConfig below), but EMA did not switch last night.

Case#: 06840766
Impacted service: Elektron Edge
Impact: LOSS OF RESILIENCY
Impacted device(s): EDGETAI0109
Circuit Details: HUTCHISON_GLOBAL_COMMUNICATIONS circuit PEP9201605
Outage Start Time: 8/15/2018 19:10 UTC
Service resumed at 20:16 UTC (15 Aug 2018).

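Our client application logged the following (the log timestamps are UTC+8, so 03:09 corresponds to the 19:10 UTC outage start and 04:16 to the 20:16 UTC recovery):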

2018-08-16 03:09:59 INFO[139766029948672]: 17033 REUTERS: TACNT1YDF= : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766728234752]: 17033 REUTERS: HKDTWD=R : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766061152000]: 17033 REUTERS: 0#ES: : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766087546624]: 17033 REUTERS: 0#STW: : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766758635264]: 17033 REUTERS: .TOPX : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766029948672]: 17084 REUTERS: TACNT1YDF= : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766061152000]: 17084 REUTERS: 0#ES: : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766728234752]: 17084 REUTERS: HKDTWD=R : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766087546624]: 17084 REUTERS: 0#STW: : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766758635264]: 17084 REUTERS: .TOPX : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766029948672]: 17033 REUTERS: TACNT6MDF= : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766061152000]: 17033 REUTERS: 0#JY: : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766087546624]: 17033 REUTERS: 0#AD: : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766728234752]: 17033 REUTERS: KRW= : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766029948672]: 17084 REUTERS: TACNT6MDF= : Open / Suspect / None / 'Comms Outage'
2018-08-16 03:09:59 INFO[139766061152000]: 17084 REUTERS: 0#JY: : Open / Suspect / None / 'Comms Outage'
...................

2018-08-16 04:16:07 INFO[139766728234752]: 17033 REUTERS: HKDTWD=R : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766029948672]: 17033 REUTERS: TACNT1YDF= : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766758635264]: 17033 REUTERS: .TOPX : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766728234752]: 17084 REUTERS: HKDTWD=R : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766061152000]: 17084 REUTERS: 0#ES: : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766029948672]: 17084 REUTERS: TACNT1YDF= : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766758635264]: 17084 REUTERS: .TOPX : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766728234752]: 17033 REUTERS: KRW= : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766061152000]: 17033 REUTERS: 0#JY: : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766087546624]: 17084 REUTERS: 0#STW: : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766029948672]: 17033 REUTERS: TACNT6MDF= : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766728234752]: 17084 REUTERS: KRW= : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766758635264]: 17033 REUTERS: 0#JTI: : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766061152000]: 17084 REUTERS: 0#JY: : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:07 INFO[139766029948672]: 17084 REUTERS: TACNT6MDF= : Open / Suspect / None / 'Comms Restored'
...............

2018-08-16 04:16:22 INFO[139766061152000]: 17084 REUTERS: ESM9 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766087546624]: 17084 REUTERS: 0#STW: : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766758635264]: 17084 REUTERS: 1093.HKd : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:22 INFO[139766029948672]: 17033 REUTERS: SINV8 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766061152000]: 17033 REUTERS: JYX8 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766087546624]: 17033 REUTERS: 0#AD: : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766061152000]: 17084 REUTERS: JYX8 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766758635264]: 17033 REUTERS: 83188.HKd : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:22 INFO[139766029948672]: 17084 REUTERS: SINV8 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766087546624]: 17084 REUTERS: 0#AD: : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766758635264]: 17084 REUTERS: 83188.HKd : Open / Suspect / None / 'Comms Restored'
2018-08-16 04:16:22 INFO[139766029948672]: 17033 REUTERS: SINH9 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766061152000]: 17033 REUTERS: ESU8 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:22 INFO[139766087546624]: 17033 REUTERS: 0#BP: : Open / No Change / Failover started / 'Failover Started'
..................

2018-08-16 04:16:49 INFO[139766758635264]: 17084 REUTERS: DJTIU8-H9 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:49 INFO[139766061152000]: 17033 REUTERS: ESH9 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766087546624]: 17084 REUTERS: STWU8 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766758635264]: 17033 REUTERS: JTIU8-M9 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:49 INFO[139766061152000]: 17084 REUTERS: ESH9 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766087546624]: 17033 REUTERS: STWZ8 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766758635264]: 17084 REUTERS: JTIU8-M9 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:49 INFO[139766061152000]: 17033 REUTERS: ESM9 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766087546624]: 17084 REUTERS: STWZ8 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766758635264]: 17033 REUTERS: DJTIU8-M9 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:49 INFO[139766061152000]: 17084 REUTERS: ESM9 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766087546624]: 17033 REUTERS: STWH9 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766758635264]: 17084 REUTERS: DJTIU8-M9 : Open / No Change / Failover started / 'Failover Started'
2018-08-16 04:16:49 INFO[139766061152000]: 17033 REUTERS: JYX8 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766087546624]: 17084 REUTERS: STWH9 : Open / Ok / None / 'OK'
2018-08-16 04:16:49 INFO[139766758635264]: 17033 REUTERS: SPU8 : Open / No Change / Failover started / 'Failover Started'
..................

2018-08-16 04:17:12 INFO[139766758635264]: 17033 REUTERS: SPU8 : Open / No Change / Failover completed / 'Failover Completed'
2018-08-16 04:17:12 INFO[139766758635264]: 17084 REUTERS: SPU8 : Open / No Change / Failover completed / 'Failover Completed'
2018-08-16 04:17:12 INFO[139766758635264]: 17033 REUTERS: SPU9 : Open / No Change / Failover completed / 'Failover Completed'
2018-08-16 04:17:12 INFO[139766758635264]: 17084 REUTERS: SPU9 : Open / No Change / Failover completed / 'Failover Completed'
2018-08-16 04:17:12 INFO[139766758635264]: 17033 REUTERS: SPM0 : Open / No Change / Failover completed / 'Failover Completed'
2018-08-16 04:17:12 INFO[139766758635264]: 17084 REUTERS: SPM0 : Open / No Change / Failover completed / 'Failover Completed'
2018-08-16 04:17:12 INFO[139766758635264]: 17033 REUTERS: SPZ9 : Open / No Change / Failover completed / 'Failover Completed'
................

2018-08-16 04:17:31 INFO[139766758635264]: 17033 REUTERS: .TOPX : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17084 REUTERS: .TOPX : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17033 REUTERS: 0#JTI: : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17084 REUTERS: 0#JTI: : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17033 REUTERS: TWD= : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17084 REUTERS: TWD= : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17033 REUTERS: 0#SP: : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17084 REUTERS: 0#SP: : Open / Ok / None / 'OK'
2018-08-16 04:17:31 INFO[139766758635264]: 17033 REUTERS: .SP500 : Open / Ok / None / 'OK'
2018-08-16 04:17:32 INFO[139766758635264]: 17084 REUTERS: .SP500 : Open / Ok / None / 'OK'
2018-08-16 04:17:32 INFO[139766758635264]: 17033 REUTERS: 0#SSI: : Open / Ok / None / 'OK'

EmaConfig:

<ChannelSet value="Channel_1, Channel_2"/>


@daniel76

Do not use status codes to determine the application action.

The status code is provided as additional information and does not represent the actual state of the stream or its data.

You only need the stream state and data state to determine the application action.

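void AppClient::onStatusMsg( const StatusMsg& statusMsg, const OmmConsumerEvent& )
{
  // Act only on Market Price item statuses that carry a state and an item name
  if ( !statusMsg.hasState() || !statusMsg.hasName()
       || statusMsg.getDomainType() != MMT_MARKET_PRICE )
    return;

  const OmmState& state = statusMsg.getState();

  // Decide on the stream state and data state only;
  // the status code and status text are informational
  if ( state.getStreamState() == OmmState::StreamState::OpenEnum
       && state.getDataState() == OmmState::DataState::SuspectEnum )
  {
    // Stream is open but the data is suspect:
    // perform the switchover to the second stream here
  }
}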

The code above checks whether the stream is Open and the data is Suspect, which are the first two parts of "Open / Suspect / None / 'Comms Outage'".

Stream Open with Data Suspect means that data from the upstream device (in this case, Elektron Edge) is currently unavailable, but the data stream is still open. Elektron Edge is trying to recover the data and will resume sending it once it has recovered.

Since you want to switch over to the second Edge device when the data is suspect, you will also have to decide:

  1. What will you do with the data stream on the first Edge device after the switchover: close it or leave it open?
    If you leave it open, you will start receiving data on it again when the first Edge device recovers. How will you handle that?
  2. What will you do if the second Edge device is also Suspect?

The answers to these two questions depend on your requirements.
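One possible answer to the two questions above, as a sketch only: the first stream is left open, the item fails back to it as soon as it reports Open / Ok again, and if both Edges are unhealthy the item stays where it is until either recovers. Open / No Change (such as the 'Failover Started' statuses in the log) is treated as "no change". The enums are local stand-ins named after EMA's OmmState values, and FailoverTracker and Source are hypothetical names; the sketch does not use EMA itself. In a real consumer, onState() would be fed from statusMsg.getState() in onStatusMsg and from the state of each refresh.

```cpp
#include <cassert>
#include <map>
#include <string>

// Local stand-ins named after EMA's OmmState values (assumption: the sketch
// does not include EMA headers).
enum class StreamState { Open, NonStreaming, ClosedRecover, Closed, ClosedRedirected };
enum class DataState { NoChange, Ok, Suspect };

// Which Edge the application should take an item's data from (hypothetical).
enum class Source { Primary, Backup };

// Hypothetical per-item policy:
//  1. the primary stream stays open and the item fails back to it once it is OK;
//  2. if both Edges are unhealthy, the item stays on its current source.
class FailoverTracker {
public:
    // Record a state reported by one Edge for one item and return the
    // source the application should now use for that item.
    Source onState(const std::string& item, Source from, StreamState s, DataState d) {
        assert(!item.empty());
        ItemStatus& st = items_[item];
        // Open / No Change (e.g. 'Failover Started') leaves the data state as it was.
        if (s == StreamState::Open && d == DataState::NoChange) return st.active;

        const bool healthy = (s == StreamState::Open && d == DataState::Ok);
        if (from == Source::Primary) st.primaryOk = healthy;
        else st.backupOk = healthy;

        if (st.primaryOk) st.active = Source::Primary;     // fail back
        else if (st.backupOk) st.active = Source::Backup;  // switch over
        // Neither Edge is healthy: keep the current source and wait.
        return st.active;
    }

    Source active(const std::string& item) const {
        auto it = items_.find(item);
        return it == items_.end() ? Source::Primary : it->second.active;
    }

private:
    struct ItemStatus {
        bool primaryOk = false;  // no Open / Ok seen yet
        bool backupOk = false;
        Source active = Source::Primary;
    };
    std::map<std::string, ItemStatus> items_;
};
```

Failing back immediately is only one choice; an application that prefers fewer switches could instead stay on the backup until it, too, becomes Suspect.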


Hi @daniel76

There are situations in which EMA can switch over and situations in which it cannot.

Situation 1: EMA disconnects from Elektron Edge.
EMA will switch to the next channel in the ChannelSet configuration. See the EMA Configuration Guide for details on ChannelSet.

Situation 2: Elektron Edge disconnects from the Elektron service.
EMA will not switch. Elektron Edge sends a 'Comms Outage' status message to EMA, but the connection between EMA and Elektron Edge stays up. EMA will start receiving data again after Elektron Edge recovers its connection to the Elektron service. The Open / Suspect / 'Comms Outage' statuses in your log match this situation.

Normally, we would suggest handling Situation 2 by placing a distribution server (TREP) between EMA and the Elektron Edges. TREP can connect to multiple Edge devices and automatically route downstream requests to the next available host.

Without TREP, the application must implement this behavior itself.


Hi @Warat B.

How do I implement the feature for Situation 2? Do you have any example code for the application?


@daniel76

I'm afraid I don't have an example, because it depends on how seamless you want the transition to be.

For a perfectly seamless switch, you can connect to both Edge servers, send every request to both Edges, and use only the responses from the first Edge. This gives you zero downtime at the cost of performance, since every item is received twice.
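A minimal sketch of the filtering side of this approach, assuming messages from both consumers are routed to one place. HotStandbyFilter, Edge and Message are hypothetical names and the sketch does not use EMA types: markSuspect() and markOk() would be driven by the first Edge's Open / Suspect and Open / Ok states, and accept() would be called from onRefreshMsg/onUpdateMsg for both consumers. It does not try to line the two feeds up message by message, so a few updates may be repeated or skipped around the moment of switching.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical tag for the Edge (OmmConsumer) a message arrived from.
enum class Edge { First, Second };

// Minimal stand-in for a refresh or update delivered by either consumer.
struct Message {
    std::string item;     // e.g. "KRW="
    Edge from;
    std::string payload;  // placeholder for the field data
};

// Hot-standby filter: both Edges stream every item, but the application uses
// the first Edge's copy unless that item is currently Suspect there.
class HotStandbyFilter {
public:
    // Call on Open / Suspect from the first Edge.
    void markSuspect(const std::string& item) { firstSuspect_.insert(item); }
    // Call on Open / Ok (e.g. a new refresh) from the first Edge.
    void markOk(const std::string& item) { firstSuspect_.erase(item); }

    // True if this copy of the message should be passed to the application;
    // the other copy is the duplicate and is dropped.
    bool accept(const Message& m) const {
        assert(!m.item.empty());
        const bool firstIsGood = firstSuspect_.count(m.item) == 0;
        return m.from == Edge::First ? firstIsGood : !firstIsGood;
    }

private:
    std::set<std::string> firstSuspect_;  // items currently Suspect on the first Edge
};
```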

Alternatively, you can connect to both Edges but send requests to the first Edge only, and then send the requests to the second Edge when you receive a data Suspect status.

Or you can connect to the second Edge only after receiving a data Suspect status.
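For these two options, the application has to remember what it requested so it can request it again on the second Edge. The sketch below only covers that bookkeeping. OnDemandStandby is a hypothetical class, and openOnSecond is a caller-supplied hook that, in a real EMA consumer, would build a ReqMsg for the item and call registerClient() on the second OmmConsumer (creating that consumer first, in the last option).

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Bookkeeping for requesting items on the second Edge only when needed.
class OnDemandStandby {
public:
    // openOnSecond is a caller-supplied (hypothetical) hook that opens the
    // item on the second Edge, e.g. by calling registerClient() there.
    explicit OnDemandStandby(std::function<void(const std::string&)> openOnSecond)
        : openOnSecond_(std::move(openOnSecond)) {
        assert(openOnSecond_);
    }

    // Call when the first Edge reports Open / Suspect for an item.
    void onFirstSuspect(const std::string& item) {
        // insert().second is false if the item was already requested, so
        // repeated Suspect statuses do not cause duplicate requests.
        if (openedOnSecond_.insert(item).second) openOnSecond_(item);
    }

    bool isOpenOnSecond(const std::string& item) const {
        return openedOnSecond_.count(item) != 0;
    }

private:
    std::function<void(const std::string&)> openOnSecond_;
    std::set<std::string> openedOnSecond_;
};
```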

Either way, you need two OmmConsumer objects, each using a different channel, rather than one OmmConsumer object that uses a ChannelSet. You specify the consumer name when you create each OmmConsumer:

OmmConsumer consumer_1( OmmConsumerConfig().consumerName( "Consumer_1" ) );
OmmConsumer consumer_2( OmmConsumerConfig().consumerName( "Consumer_2" ) );

Consumer_1 should use Channel_1, which points to the first Edge, while Consumer_2 should use Channel_2, which points to the second Edge.

EMA notifies you that the data is suspect through the onStatusMsg callback, where you should call statusMsg.getState() to verify the state first. After that, it is up to you how to manage the switchover.


@Warat B.

If I use the statusMsg.getState() method, which status codes need to be handled (e.g., to trigger connecting to the second Edge server)?

Also, the string "Open / Suspect / None / 'Comms Outage'" does not appear in the following documentation. Which of these status codes does it correspond to?

enum StatusCode {
NoneEnum = 0,
NotFoundEnum = 1,
TimeoutEnum = 2,
NotAuthorizedEnum = 3,
InvalidArgumentEnum = 4,
UsageErrorEnum = 5,
PreemptedEnum = 6,
JustInTimeConflationStartedEnum = 7,
TickByTickResumedEnum = 8,
FailoverStartedEnum = 9,
FailoverCompletedEnum = 10,
GapDetectedEnum = 11,
NoResourcesEnum = 12,
TooManyItemsEnum = 13,
AlreadyOpenEnum = 14,
SourceUnknownEnum = 15,
NotOpenEnum = 16,
NonUpdatingItemEnum = 19,
UnsupportedViewTypeEnum = 20,
InvalidViewEnum = 21,
FullViewProvidedEnum = 22,
UnableToRequestAsBatchEnum = 23,
NoBatchViewSupportInReqEnum = 26,
ExceededMaxMountsPerUserEnum = 27,
ErrorEnum = 28,
DacsDownEnum = 29,
UserUnknownToPermSysEnum = 30,
DacsMaxLoginsReachedEnum = 31,
DacsUserAccessToAppDeniedEnum = 32,
InvalidFormedMsgEnum = 256,
ChannelUnavailableEnum = 257,
ServiceUnavailableEnum = 258,
ServiceDownEnum = 259,
ServiceNotAcceptingRequestsEnum = 260,
LoginClosedEnum = 261,
DirectoryClosedEnum = 262,
ItemNotFoundEnum = 263,
DictionaryUnavailableEnum = 264,
FieldIdNotFoundDictionaryUnavailableEnum = 265,
ItemRequestTimeoutEnum = 266
}


Hi @Warat B.

I want a perfectly seamless switch: always connect to both Edge servers, send the requests to both Edges, and use only the responses from the first Edge.
How do I determine whether an onRefreshMsg or onUpdateMsg came from the first Edge, and filter out the duplicate messages? (Is there a common sequence number I can use? Is there example code?)


@daniel76

When you call registerClient(), it returns an item identifier, also known as a handle.

You can then identify the item in the event callback by comparing that handle with the one from OmmConsumerEvent.getHandle().

Alternatively, you can pass a closure to registerClient() and then retrieve it in the callback with OmmConsumerEvent.getClosure().
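A sketch of both lookups without the EMA types, assuming the handle is a 64-bit integer and the closure is a void*, matching how EMA C++'s registerClient() and OmmConsumerEvent::getClosure() expose them. ItemContext, HandleRegistry and fromClosure are hypothetical names, and the handle values in the usage are made up. The sketch keeps one registry per OmmConsumer rather than assuming handles are unique across consumers; recording the Edge in the closure is one way to tell the two copies of an item apart when a single client object serves both consumers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-request context. With EMA C++, a pointer to one of these
// could be passed as the closure argument of registerClient() and read back
// with OmmConsumerEvent::getClosure() in the callbacks.
struct ItemContext {
    std::string name;  // e.g. "KRW="
    int edge;          // 1 = first Edge, 2 = second Edge (illustrative)
};

// Handle-based lookup: store the handle returned by registerClient() and
// match it against OmmConsumerEvent::getHandle(). Use one registry per
// OmmConsumer.
class HandleRegistry {
public:
    void add(std::uint64_t handle, ItemContext ctx) { byHandle_[handle] = std::move(ctx); }

    // Returns nullptr for an unknown handle.
    const ItemContext* find(std::uint64_t handle) const {
        auto it = byHandle_.find(handle);
        return it == byHandle_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::uint64_t, ItemContext> byHandle_;
};

// Closure-based lookup: the callback gets back the same pointer it was given.
inline const ItemContext* fromClosure(void* closure) {
    assert(closure != nullptr);
    return static_cast<const ItemContext*>(closure);
}
```

With the closure approach, the ItemContext objects must outlive the item streams, since EMA only stores the pointer.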


Hi @Warat B.

By the way, in Situation 1 (EMA disconnects from Elektron Edge), EMA switches to the next channel in the ChannelSet configuration.
In this case, do the already registered items keep receiving onUpdateMsg automatically without being re-registered, or do we need to do something? We don't have server permissions, so how can we test this?


Hi @daniel76

They do not need to be re-registered.

You can test this with the EMA IProvider example. Run two IProvider instances (listening on different ports) and configure your EMA consumer's ChannelSet with one channel for each. Kill the first IProvider, and you will see the consumer switch to the second one without re-registering the items.


Hi @Warat B.

Today, the Edge sent this message for one symbol: "2018-08-28 08:04:00 - 1ADF9m : Open / Suspect / None / 'Item is stale'". What does this mean? Only this symbol has a problem; the others are normal. What should I do?


Hi @daniel76

Open / Suspect / None / 'Item is stale' means there was a problem with the 1ADF9m data and the server is managing its recovery. You may need to check the server log for more information.

From the API's point of view, the application just has to wait; the server will recover the data.

Again, as I mentioned in a previous post, you only need the stream state and data state to determine the application action.

Stream Open with Data Suspect means that data from the upstream device is not available and the upstream device is recovering it.


@daniel76

Here is the Item State Decision Table.

This table can be found in the Elektron Transport API Developer Guide.

[Image: Item State Decision Table (statetable.png)]
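A rough paraphrase of common rows of such a decision table, written as code. The rows and the Action names are this sketch's own interpretation, so check them against the table in the developer guide before relying on them. The enums are local stand-ins named after EMA's OmmState values, and decide() is a hypothetical function.

```cpp
#include <cassert>

// Local stand-ins named after EMA's OmmState values (not EMA headers).
enum class StreamState { Open, NonStreaming, ClosedRecover, Closed, ClosedRedirected };
enum class DataState { NoChange, Ok, Suspect };

// Hypothetical application actions.
enum class Action {
    UseData,             // stream open, data good
    WaitForRecovery,     // stream open, upstream is recovering the data
    KeepPreviousState,   // data state unchanged
    SnapshotComplete,    // non-streaming request finished; no further updates
    ReRequestLater,      // stream closed, but the item may be requested again
    GiveUp,              // stream closed; do not re-request without a change
    ReRequestRedirected  // request again using the redirect information
};

// Only the two state enums are consulted; status code and text are ignored.
Action decide(StreamState s, DataState d) {
    switch (s) {
    case StreamState::Open:
        if (d == DataState::Ok) return Action::UseData;
        if (d == DataState::Suspect) return Action::WaitForRecovery;
        return Action::KeepPreviousState;  // NoChange
    case StreamState::NonStreaming:
        return Action::SnapshotComplete;
    case StreamState::ClosedRecover:
        return Action::ReRequestLater;
    case StreamState::Closed:
        return Action::GiveUp;
    case StreamState::ClosedRedirected:
        return Action::ReRequestRedirected;
    }
    assert(false && "unhandled stream state");
    return Action::GiveUp;
}
```

Under this reading, the 'Comms Outage' case from the log maps to WaitForRecovery, which matches the advice above not to re-request.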

Hi @Warat B.

You have told me how to check whether the stream is Open and the data is Suspect, which are the first two parts of "Open / Suspect / None / 'Comms Outage'".
How do I check the last two parts? I want to handle 'Item is stale' and 'Comms Outage' differently.


@daniel76

I'm afraid they aren't designed to be handled differently.

First, please understand that a message status consists of four members:

Members:   Stream State / Data State / Status Code / Status Text
Data type: Enum         / Enum       / Enum        / String

From the API's point of view, the only difference between the two messages is the status text, which is a string meant to be shown on screen or written to a log. Unlike an enum, the string can change in future releases.
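To make the point concrete: in the sketch below, both statuses from this thread lead to the same decision because only the two state enums are inspected, while the text is simply copied into the log. Status, shouldWaitForRecovery and logLine are hypothetical names, and the enums are reduced local stand-ins rather than EMA's OmmState.

```cpp
#include <cassert>
#include <string>

// Reduced local stand-ins; not EMA's OmmState.
enum class StreamState { Open, Closed };
enum class DataState { NoChange, Ok, Suspect };

// The four members of a status, as listed above.
struct Status {
    StreamState stream;
    DataState data;
    int code;          // status code value, e.g. 0 for NoneEnum
    std::string text;  // e.g. "Comms Outage"; may change between releases
};

// The decision looks only at the two state enums.
bool shouldWaitForRecovery(const Status& st) {
    return st.stream == StreamState::Open && st.data == DataState::Suspect;
}

// The text is for people: copy it into the log unchanged.
std::string logLine(const std::string& item, const Status& st) {
    assert(!item.empty());
    return item + " : " + st.text;
}
```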
