What is a Failure?
Dec 19, 2020
We’ve discussed the basics of assembling, maintaining and aligning VSAT antennas in our prior articles, The Care and Feeding of VSAT Antennas and Line It Up. At this point we assume the system is installed and operational, so in this article we would like to review some basic troubleshooting steps for connectivity issues. This discussion is primarily directed to our partners. As part of our relationship, we expect our local partners to perform some basic troubleshooting and to provide some basic information; otherwise, problem resolution may take longer and their clients may become frustrated.
Using a recent service ticket and the NOC response as an example, we have a situation where the partner opened a service ticket on the portal, reporting: “My client has experienced outages of approximately 30 minutes, has experienced the failure approximately two weeks ago.”
The engineer opened the site operating details, looked at the charts, and reported, “…looking back at the last week of this site’s performance, I don’t really see any issues there. The modem constantly stays online with solid signal levels.” He then produced the charts to confirm this. He also reported no packet loss on the link and asked for a better description of the “failures” so that he could perform additional troubleshooting.
This is a typical scenario for the NOC. Here we discuss how we can work together with our partners to improve communication and, hopefully, reach faster problem resolution. End customers have options and will tend to go with whoever provides the best and most responsive support.
Typically, the NOC receives a call or a trouble ticket through the portal simply stating that the site is down or has experienced a “failure,” as in the example above. This does not give the engineers much to work with. It means unnecessary delays ensue because they must gather basic information before performing any useful troubleshooting. Of course, they will check to see if the site is in the network with good RF numbers, which is something the partner can and should do themselves through the portal, at the first mention of any problems.
Status of Lights
The first thing the NOC wants to know, if possible, is the status of the RX, TX, and NET LEDs on the modem at the exact time that the connectivity issue occurs. Partners should ask their clients to provide that information if there is no engineer at the site. For example, if the LEDs are normally green but change when users report a problem, it really helps the NOC to know what the lights look like during the outage. If they remain unchanged, that is also useful information, because it helps isolate whether the problem is on the RF (satellite) side or the IP side.
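As a rough illustration of how LED observations can narrow things down, here is a minimal Python sketch. The `LedReading` class and `likely_side` function are hypothetical names invented for this example, and the rule it applies (changed LEDs point toward the RF side, unchanged LEDs toward the IP side) is a simplifying assumption, not a NOC procedure.

```python
from dataclasses import dataclass


@dataclass
class LedReading:
    """State of the three modem LEDs as reported by the person on site."""
    rx: str
    tx: str
    net: str


def likely_side(normal: LedReading, during_outage: LedReading) -> str:
    """Compare normal LED states with those seen during the outage.

    Assumption for illustration only: any LED change suggests the RF
    (satellite) side; no change suggests the IP side.
    """
    changed = [
        name for name in ("rx", "tx", "net")
        if getattr(normal, name) != getattr(during_outage, name)
    ]
    if changed:
        return "RF side suspected (changed: " + ", ".join(changed).upper() + ")"
    return "IP side suspected (LEDs unchanged)"
```

For example, comparing an all-green normal state with a reading where only NET has gone off would flag the RF side and name the NET LED, which is the kind of detail worth putting in the ticket.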
Next comes a technical description of the outage. This will help identify root causes. If, for example, web sites are timing out, our engineers need to know which web sites are unreachable or timing out. Is it all of them or just a certain few? If the problem is associated with sending or receiving emails, the site should be asked what protocols are being used (POP, SMTP, IMAP, etc.). If voice or video has poor quality, we need to know what VoIP service is being used or what kind of video conferencing is being attempted. Our partners’ technicians should try to gather as many technical and informational details as they can before opening a ticket. This is all especially important if the site does not have BusinessCom Networks’ Sentinel server deployed, as this device gives us a good look at the IP side of the service. If the site has router hardware with good diagnostics, such as some MikroTik devices, it may be useful to include that diagnostic information in the service ticket.
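To answer the “all of them or just a certain few?” question, a technician could run a quick check like the sketch below from a computer at the site. It uses only the Python standard library; the function names and the default port are choices made for this example, and a successful TCP connection is only a rough stand-in for “the web site works.”

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def summarize(results: dict[str, bool]) -> str:
    """Turn per-site results into a one-line statement for the ticket."""
    if not results:
        return "no sites tested"
    failed = sorted(host for host, ok in results.items() if not ok)
    if not failed:
        return "all sites reachable"
    if len(failed) == len(results):
        return "all sites unreachable"
    return "unreachable: " + ", ".join(failed)
```

A technician might build the results with something like `{h: can_connect(h) for h in hosts}`, where `hosts` is the list of sites the client says are failing plus a few that are known to work, and paste the summary line into the ticket.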
Periodicity and Duration
Technicians should work with their clients to gain an understanding of the periodicity and duration of the outage. Does it happen every day around 3PM? Every 2 hours? Randomly? If the end customer can record the exact time of a particular occurrence, that can be especially useful to the NOC, allowing them to go back in the records to the time of the event, and see what was going on at that time.
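If the client keeps a simple log of when each outage started, a few lines of Python can show whether the events follow a regular interval. This is a minimal sketch: `describe_pattern` and its 10% tolerance are assumptions made for illustration, and a real log would likely need more care (for example, daily patterns with gaps on weekends).

```python
from datetime import datetime
from statistics import mean, pstdev


def describe_pattern(times: list[datetime], tolerance: float = 0.1) -> str:
    """Report whether outage start times are roughly evenly spaced.

    The gaps count as regular when their standard deviation is within
    `tolerance` (a fraction) of the average gap.
    """
    ordered = sorted(times)
    if len(ordered) < 3:
        return "not enough events"
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average > 0 and pstdev(gaps) <= tolerance * average:
        return f"roughly every {average / 3600:.1f} hours"
    return "no regular interval"
```

Three outages logged at 3 PM on consecutive days would come back as a 24-hour pattern, while scattered times would come back as having no regular interval. Either result is worth stating in the ticket, together with the exact timestamps.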
Finally, provide the status of the outage or problem at the time it is being reported. Is it happening right now? Was this something the client reported to have happened yesterday? Is it working well right now?
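One way to make sure none of these details are forgotten is to collect them in a fixed structure before opening the ticket. The sketch below is only an illustration: the `OutageReport` class and its field names are hypothetical and do not reflect the actual format of the portal.

```python
from dataclasses import dataclass


@dataclass
class OutageReport:
    """Details a partner could gather before opening a ticket (illustrative)."""
    site: str
    led_status: str      # RX, TX, and NET LEDs during the outage
    symptoms: str        # what fails: which web sites, email protocols, VoIP service
    pattern: str         # periodicity and duration
    last_seen: str       # exact time of the most recent occurrence
    happening_now: bool  # is the problem present at the time of reporting?

    def to_text(self) -> str:
        """Render the report as plain text suitable for pasting into a ticket."""
        lines = [
            f"Site: {self.site}",
            f"LED status during outage: {self.led_status}",
            f"Symptoms: {self.symptoms}",
            f"Periodicity and duration: {self.pattern}",
            f"Last occurrence: {self.last_seen}",
            f"Happening now: {'yes' if self.happening_now else 'no'}",
        ]
        return "\n".join(lines)
```

Filling in every field, even with “unknown,” tells the NOC what has already been checked and what has not.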
Providing this information up front cuts out a lot of wasted time spent on back-and-forth questions, usually over slow email conversations. Avoiding that wasted time means a faster resolution for the end client and a happier customer.