People generally rely on common industry tools to report link utilization and use this insight to estimate one of the most important aspects of a network: its bandwidth requirement. Let’s revisit this approach and close its blind spots so you can make a better-informed decision.
I often come across the following thought:
The links in our network are under 40% utilized. We do not need more bandwidth.
Common but misleading
Even though this statement sounds correct, there is more here than meets the eye. Is it possible that you need more bandwidth even though the reported link utilization is under 40%? Certainly, yes. In fact, with the deep penetration of all-flash arrays, this phenomenon is becoming more and more prevalent. I have seen this behavior across multiple deployments, ranging from small to large storage fabrics. This is a vast subject, but today I will focus on the not-so-obvious difference between link utilization and bandwidth requirement.
It depends on how and where link utilization is calculated and reported.
What you see on an NMS
Ingress and egress bytes on a network port are counted by the switching ASICs. Specific hardware counters are incremented for every packet (or byte or bit, depending upon the architecture). This happens at microsecond granularity. Multiple layers get involved before these counters are made available to an end user. The ASICs export the counters to the switch operating system every few seconds (10 seconds is a common interval), and the switch operating system exports them to a remote NMS every few minutes (an SNMP walk every 5 minutes is a common deployment).
An NMS therefore reports link utilization at a granularity of minutes. It has no visibility into any traffic burst that might have happened between polls.
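To make the loss of detail concrete, here is a minimal sketch in Python of how many data points each layer actually gets over a 10-minute window. The intervals are the hypothetical ones mentioned above; real values vary by platform.

```python
# A minimal sketch, using the hypothetical intervals above, of how many
# data points each layer gets over a 10-minute window. Every coarser
# layer works only from the cumulative counter, so all the detail in
# between its samples is averaged away.

WINDOW_S = 600  # 10-minute observation window

for layer, interval_s in [
    ("ASIC counter updates", 1e-6),   # per packet, ~microsecond granularity
    ("switch OS samples", 10),        # counter export to the switch OS
    ("NMS polls (SNMP walk)", 300),   # 5-minute polling cycle
]:
    print(f"{layer}: ~{WINDOW_S / interval_s:,.0f} data points")
# ASIC counter updates: ~600,000,000 data points
# switch OS samples: ~60 data points
# NMS polls (SNMP walk): ~2 data points
```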
Let’s understand this with the example of an 8G Fibre Channel port (tip: an 8G FC port delivers about 6.7 Gbps of usable throughput).
Consider three polling cycles at time 0, 5, and 10 minutes. Let’s assume the reported values are 0, 10 billion, and 60 billion bytes, respectively. The NMS takes the delta between the reported byte counts and divides it by the polling duration to calculate the link utilization. At time = 5 minutes, the delta is 10 billion bytes. This results in 267 Mbps.
(Multiply the bytes by 8 to convert to bits, then divide by the polling duration of 5 x 60 = 300 seconds.)
At time = 10 minutes, the delta is 50 billion bytes (60 minus 10). This results in 1,333 Mbps.
(Multiply the bytes by 8 to convert to bits, then divide by the polling duration of 5 x 60 = 300 seconds.)
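Here is the same delta math as a short Python sketch. The counter values are the hypothetical ones from this example, and the logic mirrors what a typical NMS does with cumulative SNMP byte counters.

```python
# A minimal sketch of the delta math an NMS typically applies to
# cumulative SNMP byte counters (values taken from the example above).

POLL_INTERVAL_S = 5 * 60  # 5-minute polling cycle

# Cumulative bytes reported at time 0, 5, and 10 minutes.
samples = [0, 10_000_000_000, 60_000_000_000]

for prev, curr in zip(samples, samples[1:]):
    delta_bytes = curr - prev
    mbps = delta_bytes * 8 / POLL_INTERVAL_S / 1e6  # bytes -> bits -> Mbps
    print(f"{mbps:,.0f} Mbps")
# 267 Mbps
# 1,333 Mbps
```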
It is common industry practice to use this output from an NMS to report link utilization. In this example, the peak link utilization is 1.3 Gbps, which is just under 20% of the 6.7 Gbps capacity of the 8G FC link. Looks good, right?
What really happens on a network port
A network port transmits and receives traffic at an extremely fine granularity. It is possible that the application traffic profile is bursty in nature: for a few microseconds, a port may be transmitting at full capacity, followed by a period of minimal activity.
Let’s continue with the previous example of the 8G Fibre Channel port operating at 6.7 Gbps. The traffic starts at time = 0 minutes at a constant throughput of 267 Mbps for the next 5 minutes. The network port exports the metric at the end of 5 minutes, and this shows up as 10 billion bytes in the network management application.
(Multiply the throughput by the duration of 5 x 60 = 300 seconds, then divide by 8 to convert bits to bytes.)
All is well so far.
But at time = 5 minutes, the application causes a traffic burst, resulting in line-rate utilization for the next 60 seconds. At 6.7 Gbps, a port can transmit approximately 50 billion bytes in 60 seconds.
(Multiply the throughput by 60 seconds, then divide by 8 to convert bits to bytes.)
Following this, the port remains idle for the next 4 minutes. At the end of 10 minutes, the NMS still sees 60 billion bytes in total, with no visibility into the traffic burst within the last polling interval.
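A short simulation makes the blind spot obvious. The per-second traffic profile below is the hypothetical one from this example; both counter readings match what the NMS saw, yet the 6.7 Gbps burst never appears in the reported numbers.

```python
# A minimal sketch of the burst scenario: constant 267 Mbps for 5 minutes,
# line rate for 60 seconds, then idle for 4 minutes. The NMS only ever
# sees the cumulative counter at its two 5-minute polls.

LINE_RATE_BPS = 6.7e9  # usable throughput of an 8G FC port

profile_bps = [267e6] * 300 + [LINE_RATE_BPS] * 60 + [0.0] * 240

counter_bytes = 0.0
polls = []
for second, bps in enumerate(profile_bps, start=1):
    counter_bytes += bps / 8          # accumulate bytes for this second
    if second % 300 == 0:             # 5-minute SNMP poll
        polls.append(counter_bytes)

nms_peak_gbps = (polls[1] - polls[0]) * 8 / 300 / 1e9
print(f"counter at 5 min:  {polls[0]:.2e} bytes")   # ~1.0e10 (10 billion)
print(f"counter at 10 min: {polls[1]:.2e} bytes")   # ~6.0e10 (60 billion)
print(f"NMS-reported peak: {nms_peak_gbps:.2f} Gbps")           # ~1.34 Gbps
print(f"actual peak:       {max(profile_bps) / 1e9:.2f} Gbps")  # 6.70 Gbps
```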
The link utilization reported by an NMS clearly missed the instantaneous high bandwidth requirement on the network port.
One can argue that measuring link utilization at the network ports themselves would give a more realistic picture. But first, the industry does not do it that way, in order to keep operations simple; and second, microseconds is far too fine a granularity at which to export and report a metric.
Summary
Link utilization is commonly reported from a Network Management System, where data is summarized over minutes. This may be far from the reality on a network port, where traffic bursts can exceed the available bandwidth of the link at microsecond granularity. This breaks the correlation between the reported link utilization and the actual bandwidth requirement. Be aware of this not-so-obvious difference.
What does it mean for you? Always get as much bandwidth as you can, because when you need more bandwidth, you need more bandwidth. There is no real alternative.