Hi developer team and community, this follows case 07426946, where I was asked to post my question here.
I deployed a new ATS + ADH + ADS setup in production today, and I keep seeing this error message in the RRCPDL source logs:
----- Fri Mar 08 10:58:34.871198 2019 /tmp/.rrcp/source.0.rrmp: WARNING: [../Wrapper/Userlevel/rrcpCW_NetMgr.c,NetMgr_sendPkt(),274] error writing to the network: ------
The ADS then logs packet loss:
------- RRCP STATUS MSG: RRCP_BC_MISSEDMSGS: gap in broadcast msgs from node 169752968 (10.30.57.136) From Port 37000 --------
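As a side check, the node number in that message looks like the source's IPv4 address written as a 32-bit big-endian integer, which matches the two forms shown together in the rrdump output. A minimal Python sketch of that conversion (the helper name `node_id_to_ip` is my own, not part of any RRCP tooling, and the encoding is my assumption):

```python
import ipaddress


def node_id_to_ip(node_id: int) -> str:
    """Interpret an RRCP node number as a big-endian IPv4 address (assumption)."""
    return str(ipaddress.IPv4Address(node_id))


# Node number taken from the RRCP_BC_MISSEDMSGS message.
print(node_id_to_ip(169752968))  # prints 10.30.57.136
```

So the gap is reported against broadcast messages coming from the ADH source itself.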
The RRCPDL parameters are:
------ [pars3pmdsc527:/rmds/users/mdadmin] rrdump source -DL -port 37000 -P
rrdump v6.7.F30: [423]:Connected with RRCP-daemonless, Version rrcp6.7.T25(6070025), Control Port: 37000
RRCP Parameter Values (IpAddr: 10.30.57.136 (169752968) source mode, device /tmp/.rrcp/source.1):
RRCP process version (6070025)
Ip Addr = 10.30.57.136 (169752968)
network = 10.30.57.136
recvAddr = 0.0.0.0 (INADDR_ANY)
MCRecvPort = 37002 (0x908a)
MCSendToPort = 37001 (0x9089)
PPPort = 37000 (0x9088)
MCSendFromPort = (0) not configured
devicePath = /tmp/.rrcp/source.0.rrmp
packetSize = 3100
maxPktPoolSize = 200000
pktPoolLimitHigh = 190000
pktPoolLimitLow = 180000
shuffleTolerance = 1024
userQLimit = 65535
tdata = 4
ndata = 7
nrreq = 3
trreq = 4
twait = 2
nmissing = 128
tbchold = 30
tpphold = 29
nackDelayTime = 20
bitmapFilter = 0
logger.level = 3
logger.file = /rmds/log/rrcpd/adh_0_source_rrcpdl.log
logger.maxSize = 52428800
logger.maxSwapfiles = 5
useIpMulticast = True
ipMultTTL = 16
network = net-feed
interface = 10.30.57.136
sendMultAddress = 239.254.0.168
recvMultAddress = 239.254.0.169
hsmInterface = 10.30.57.136
hsmMultAddress = 239.254.0.117
hsmPort = 30101
hsmInterval = 1
overflowMsgDump-oldest = 1
maxUsers = 10
recvPortLow = 0
recvPortHigh = 0
udpSendBufSize = 524288
udpRecvBufSize = 524288
nackDelayTime = 20
ackPackingRatio = 10
weightPPRetransSent = 1
weightPPRetransRcvd = 1
weightBCRRequestSent = 1
weightBCRRequestRcvd = 1
congestionHiWaterMark = 50
congestionLowWaterMark = 15
congestionEvaluationInterval = 5
sessionProps = (0x00000001)
congestionControl = enabled
switchReorderFix = disabled
multiThreadEngine = disabled
clock tick = 100
--------
And some statistics:
------- RRCP Statistics (IpAddr: 10.30.57.136 (169752968) source mode, device /tmp/.rrcp/source.1):
--- Fri Mar 8 11:12:26 2019
Total pkts sent: 14937491
Rxmt'd PP pkts sent: 2191
BC pkts sent: 14931213
Unack'd PP pkts: 283
PP pkts sent: 6278
RXMTREQPP pkts rcvd: 185
Total pkts rcvd: 920570
RXMTREQPP pkts sent: 25
BC pkts rcvd: 912267
Msgs from users: 14496073
PP pkts rcvd: 8303
BC msgs from users: 14493088
BC DATA pkts sent: 14493088
PP msgs from users: 2985
PP DATA pkts sent: 2985
Msgs to users: 433024
BC DATA pkts rcvd: 430193
DATA msgs to users: 432534
PP DATA pkts rcvd: 2341
BC DATA msgs to users: 430193
ACK pkts sent: 1077
PP DATA msgs to users: 2341
ACKs sent: 2341
STATUS msgs to users: 490
ACK pkts rcvd: 843
Bad pkts/from user: 0
ACKs rcvd: 2672
Bad pkts/from net: 0
PP DATA pkts ackd by wndw: 30
Discards/bad opcode: 0
RXMTREQs staged: 0
Discards/old BC: 0
RXMTREQs canceled: 0
Discards/old PP: 0
RXMTREQ pkts sent: 0
Discards/rxmt'd PP: 0
RXMTREQs sent: 0
Msgs filtered out: 0
Rxmt'd RXMTREQ pkts sent: 0
Loop Msg filtered out: 0
Rxmt'd RXMTREQs sent: 0
BC msgs misordered: 0
Rxmt'd BC pkts rcvd: 0
PP msgs misordered: 0
DISCARD pkts rcvd: 0
Lost data/BC gaps msgs: 0
RXMTREQ pkts rcvd: 1279
Lost data/BC packets: 0
RXMTREQs rcvd: 1819
Lost data/PP gaps msgs: 25
Rxmt'd RXMTREQ pkts rcvd: 3655
Lost data/PP packets: 26
Rxmt'd RXMTREQs rcvd: 5457
Lost data/node resync: 0
Rxmt'd BC pkts sent: 2137
Lost data/msg dscrd'd: 0
DISCARD pkts sent: 0
Lost data/incmplt msg: 0
NULL pkts sent: 435988
Lost data/user Q overflow: 0
NULL pkts rcvd: 482074
Pkt buffers in use: 6
Heartbeats sent: 0
Msg buffers in use: 5
Heartbeats rcvd: 0
Total bytes sent: 2156837849
Rxmt'd PP pkts rcvd: 0
Total bytes recv: 45516739
-----
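To put rough numbers on the retransmissions, here is a small Python sketch that divides the retransmitted packet counters by the corresponding sent counters from the statistics above. The ratio itself is my own back-of-the-envelope reading, not an official RRCP metric, and I am assuming the "sent" counters are a sensible denominator; `ratio_percent` is just a helper name I made up.

```python
# Counters copied from the rrdump statistics in this post.
stats = {
    "BC pkts sent": 14931213,
    "Rxmt'd BC pkts sent": 2137,
    "PP pkts sent": 6278,
    "Rxmt'd PP pkts sent": 2191,
}


def ratio_percent(part: int, total: int) -> float:
    """Return part/total as a percentage, or 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0


bc = ratio_percent(stats["Rxmt'd BC pkts sent"], stats["BC pkts sent"])
pp = ratio_percent(stats["Rxmt'd PP pkts sent"], stats["PP pkts sent"])

print(f"BC retransmit ratio: {bc:.4f}%")
print(f"PP retransmit ratio: {pp:.1f}%")
```

With these counters the broadcast ratio comes out at roughly 0.014% and the point-to-point ratio at roughly 35%, so under this reading the retransmissions weigh far more heavily on the point-to-point traffic than on the broadcast traffic.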
We checked the server (cables, CRC errors, soft errors) and found nothing wrong. Can you please help me understand why this is failing?
Even though there is very little traffic on this server pair, RRCPd appears to lose a lot of packets, and it does so regularly. I am not sure whether this is related to the fact that the ATS "listens" to the ADH hot standby state, which may generate traffic; I really don't know.
Any help will be appreciated! Thanks, Julien
(PS: the statistics may not display as cleanly here as they do in the support ticket.)