CCNP Course Institute in Delhi

Tuesday, December 14, 2010

Troubleshooting Methods Best Cisco CCSP Training Institute in Delhi Gurgaon

Network Bulls
www.networkbulls.com
Best Institute for CCNA CCNP CCSP CCIP CCIE Training in India
M-44, Old Dlf, Sector-14 Gurgaon, Haryana, India
Call: +91-9654672192

 Troubleshooting network issues is implicit in the responsibilities of a network administrator.
Such issues could arise as a result of human error (for example, a misconfiguration),
equipment failure, a software bug, or traffic patterns (for example, high utilization or a
network being under attack by malicious traffic).
Network issues can be successfully resolved using a variety of approaches. However, having
a formalized troubleshooting method can prove more efficient than a haphazard approach
to troubleshooting. A troubleshooter can select from several formalized
approaches. Because the effectiveness of each approach varies with the issue being
addressed, a troubleshooter should be familiar with more than one.
This section begins by introducing you to the basics of troubleshooting. Next, you learn
the benefits of having a structured troubleshooting model, and then you are introduced to
several popular troubleshooting models. Finally, this section provides guidance to help
you select an appropriate combination of approaches for a given troubleshooting issue.
Defining Troubleshooting
At its essence, troubleshooting is the process of responding to a problem report
(sometimes in the form of a trouble ticket), diagnosing the underlying cause of the
problem, and resolving the problem. Although you normally think of the troubleshooting
process beginning when a user reports an issue, realize that through effective network
monitoring, you might detect a situation that could become a troubleshooting issue and
resolve that situation before users are impacted.
After an issue is reported, the first step toward resolution is clearly defining the issue.
When you have a clearly defined troubleshooting target, you can begin gathering information
related to that issue. Based on the information collected, you might be able to better
define the issue. Then you hypothesize likely causes of the issue. Evaluation of these likely
causes leads to the identification of the suspected underlying root cause of the issue.
After you identify a suspected underlying cause, you next define approaches to resolving
the issue and select what you consider to be the best approach. Sometimes the best approach
to resolving an issue cannot be implemented immediately. For example, a piece of
equipment might need replacing, or a business’s workflow might be disrupted by implementing
such an approach during working hours. In such situations, a troubleshooter
might use a temporary fix until a permanent fix can be put in place.
As a personal example, when troubleshooting a connectivity issue for a resort hotel at a
major theme park, we discovered that the supervisor engine in a Cisco Catalyst switch had
an issue causing Spanning Tree Protocol (STP) to fail, resulting in a Layer 2 topological
loop. This loop flooded the network with traffic, preventing the hotel from issuing keycards
for guest rooms. The underlying cause was clear. Specifically, we had a bad supervisor
engine. However, the time was about 4:00 PM, a peak time for guest registration. So,
instead of immediately replacing the supervisor engine, we disconnected one of the redundant
links, thus breaking the Layer 2 loop. The logic was that it was better to have the
network function at this time without STP than for the network to experience an even
longer outage while the supervisor engine was replaced. Late that night, someone came
back to the switch and swapped out the supervisor engine, resolving the underlying cause
while minimizing user impact.
Figure 2-1 Simplified Troubleshooting Flow: Problem Report → Problem Diagnosis → Problem Resolution
Table 2-2 Steps to Diagnose a Problem
Step: Description
Collect information: Because a typical problem report lacks sufficient information
to give a troubleshooter insight into a problem’s underlying cause, the troubleshooter
should collect additional information, perhaps using network maintenance tools or by
interviewing impacted users.
Examine collected information: After collecting sufficient information about a problem,
the troubleshooter then examines that information, perhaps comparing the information
against previously collected baseline information.
Eliminate potential causes: Based on the troubleshooter’s knowledge of the network and
his interrogation of collected information, he can begin to eliminate potential causes
for the problem.
Hypothesize underlying cause: After the troubleshooter eliminates multiple potential
causes for the problem, he is left with one or more causes that are more likely to have
resulted in the problem. The troubleshooter hypothesizes what he considers to be the
most likely cause of the problem.
Consider Figure 2-1, which depicts a simplified model of the troubleshooting steps previously
described.
32 CCNP TSHOOT 642-832 Official Certification Guide
This simplified model consists of three steps:
Step 1. Problem report
Step 2. Problem diagnosis
Step 3. Problem resolution
Of these three steps, the majority of a troubleshooter’s efforts are spent in the problem diagnosis
step. Table 2-2 describes key components of this problem diagnosis step.

Table 2-2 Steps to Diagnose a Problem (Continued)
Step: Description
Verify hypothesis: The troubleshooter then tests his hypothesis to confirm or
refute his theory about the problem’s underlying cause.
Figure 2-2 Structured Troubleshooting Approach: (1) Problem Report → (2) Collect Information → (3) Examine Information → (4) Eliminate Potential Causes → (5) Hypothesize Underlying Cause → (6) Verify Hypothesis
Chapter 2: Introduction to Troubleshooting Processes 33
The Value of a Structured Troubleshooting Approach
Troubleshooting skills vary from administrator to administrator. Therefore, although most
troubleshooting approaches include collection and analysis of information, elimination of
potential causes, hypothesizing of likely causes, and testing of the suspected cause, troubleshooters
might spend different amounts of time performing these tasks.
If a troubleshooter does not follow a structured approach, the temptation is to move between
the previously listed troubleshooting tasks in a fairly random way, often based on
instinct. Although such an approach might lead to a problem resolution, it can become
confusing to remember what you have tried and what you have not. Also, if another administrator
comes to assist you, communicating to that other administrator the steps you
have already gone through could be a challenge. Therefore, following a structured troubleshooting
approach not only helps reduce the possibility of trying the same resolution
more than once or inadvertently skipping a task, but also aids in communicating to
someone else the possibilities you have already eliminated.
A structured troubleshooting method might look like the approach depicted in Figure 2-2.
Some experienced troubleshooters, however, might have seen similar issues before and
might be extremely familiar with the subtleties of the network they are working on. In
such instances, spending time methodically examining information and eliminating potential
causes might actually be less efficient than immediately hypothesizing a cause after
they collect information about the problem. This method, illustrated in Figure 2-3, is often
called the shoot from the hip method.

Figure 2-3 Shoot from the Hip Troubleshooting Approach: (1) Problem Report → (2) Collect Information → (3) Hypothesize Underlying Cause → (4) Verify Hypothesis. Examine Information and Eliminate Potential Causes appear in the figure without step numbers.
Notice that the major distinction between a structured approach and a shoot from the hip
approach is examining information and eliminating potential causes based on that information.
The danger with the shoot from the hip method is that if the troubleshooter’s instincts
are incorrect, valuable time is wasted. Therefore, a troubleshooter needs the
perceptual acuity to know when to revert to a structured approach.
Popular Troubleshooting Methods
As noted previously, the elimination of potential causes is a key step in a structured troubleshooting
approach. You can use several common approaches to narrow the field of potential
causes:
■ The Top-Down Method
■ The Bottom-Up Method
■ The Divide and Conquer Method
■ Following the Traffic Path
■ Comparing Configurations
■ Component Swapping
The Top-Down Method
The top-down troubleshooting method begins at the top layer of the Open Systems Interconnection
(OSI) seven-layer model, as shown in Figure 2-4. The top layer is numbered
Layer 7 and is named the application layer.
The top-down method first checks the application residing at the application layer and
moves down from there. The theory is, when the troubleshooter encounters a layer that is
functioning, the assumption can be made that all lower layers are also functioning. For example,
if you can ping a remote IP address, because ping uses Internet Control Message Protocol
(ICMP), which is a Layer 3 protocol, you can assume that Layers 1–3 are functioning
properly. Otherwise, your ping would have failed. A potential downside to this approach is
that the troubleshooter needs access to the specific application experiencing a problem to
test Layer 7.
Figure 2-4 Top-Down Troubleshooting Method: the OSI stack (Layer 7: Application, Layer 6: Presentation, Layer 5: Session, Layer 4: Transport, Layer 3: Network, Layer 2: Data Link, Layer 1: Physical), examined from Layer 7 downward
The Bottom-Up Method
The reciprocal of the top-down method is the bottom-up method, as illustrated in Figure
2-5. The bottom-up method seeks to narrow the field of potential causes by eliminating
OSI layers beginning at Layer 1, the physical layer.
Figure 2-5 Bottom-Up Troubleshooting Method: the same OSI stack, examined from Layer 1 upward
Although this is a highly effective method, the bottom-up approach might not be efficient
in larger networks because of the time required to fully test lower layers of the OSI
model. Therefore, the bottom-up method is often used after employing some other
method to narrow the scope of the problem.
The Divide and Conquer Method
After analyzing the information collected for a problem, you might not see a clear indication
as to whether the top-down or bottom-up approach would be most effective. In such
a situation, you might select the divide and conquer approach, which begins in the middle
of the OSI stack, as shown in Figure 2-6.
Figure 2-6 Divide and Conquer Troubleshooting Method: the same OSI stack, annotated with the command ping 10.1.2.3
In the example shown in Figure 2-6, the network administrator issued the ping 10.1.2.3
command. If the result was successful, the administrator could conclude that Layers 1–3
were operational, and a bottom-up approach could begin from that point. However, if the
ping failed, the administrator could begin a top-down approach at Layer 3.
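The three layer-based methods differ only in where they start in the OSI stack and in which direction they move. The following Python sketch shows that ordering under stated assumptions: the function name layers_to_check and its arguments are illustrative and not part of any Cisco tool, the ping result is passed in as a boolean rather than measured, and continuing upward from Layer 4 after a successful ping is one reading of "a bottom-up approach could begin from that point."

```python
# OSI layers keyed by number (Layer 7 is the top of the stack).
OSI_LAYERS = {
    7: "Application",
    6: "Presentation",
    5: "Session",
    4: "Transport",
    3: "Network",
    2: "Data Link",
    1: "Physical",
}


def layers_to_check(method, ping_ok=None):
    """Return layer numbers in the order a troubleshooter would examine them.

    method  -- "top-down", "bottom-up", or "divide-and-conquer"
    ping_ok -- result of a Layer 3 test such as a ping; used only by
               divide-and-conquer (hypothetical input, not measured here)
    """
    if method == "top-down":
        return [7, 6, 5, 4, 3, 2, 1]
    if method == "bottom-up":
        return [1, 2, 3, 4, 5, 6, 7]
    if method == "divide-and-conquer":
        if ping_ok is None:
            raise ValueError("divide-and-conquer needs a ping result")
        if ping_ok:
            # Layers 1-3 are assumed good; continue upward from Layer 4.
            return [4, 5, 6, 7]
        # The Layer 3 test failed; work downward starting at Layer 3.
        return [3, 2, 1]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    for layer in layers_to_check("divide-and-conquer", ping_ok=False):
        print(layer, OSI_LAYERS[layer])
```

In practice the order is a starting plan rather than a rule; a troubleshooter stops as soon as a layer reveals the fault.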
Following the Traffic Path
Another useful troubleshooting approach is to follow the path of the traffic experiencing
a problem. For example, if the client depicted in Figure 2-7 is unable to reach its server,
you could first check the link between the client and switch SW1. If everything looks
good on that link, you could then check the connection between the switch SW1 and
router R1. Next, you would check the link between router R1 and switch SW2 and finally
the link between switch SW2 and the server.
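The hop-by-hop check described above can be sketched in Python as a walk along an ordered list of devices. This is only an illustration: the function first_failing_link, the link_ok callback, and the sample results are assumptions made for the example, and a real check would inspect interface status or run a connectivity test on each link.

```python
def first_failing_link(path, link_ok):
    """Walk the path hop by hop and return the first link that fails.

    path    -- ordered device names from source to destination
    link_ok -- callable taking (near_end, far_end) and returning True
               when that link checks out (hypothetical check function)
    """
    for near_end, far_end in zip(path, path[1:]):
        if not link_ok(near_end, far_end):
            return (near_end, far_end)
    return None  # every link along the path passed


# Device names follow the client-to-server example in the text.
PATH = ["Client", "SW1", "R1", "SW2", "Server"]

# Made-up test results for illustration: the R1-SW2 link is down.
SAMPLE_RESULTS = {
    ("Client", "SW1"): True,
    ("SW1", "R1"): True,
    ("R1", "SW2"): False,
    ("SW2", "Server"): True,
}

if __name__ == "__main__":
    print(first_failing_link(PATH, lambda a, b: SAMPLE_RESULTS[(a, b)]))
```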
Figure 2-7 Following the Path of Traffic: Client → Switch SW1 → Router R1 → Switch SW2 → Server, with Step 1 through Step 4 marking the links in order
Comparing Configurations
Did you ever go to the dentist as a kid and find yourself looking through a Highlights
magazine? This magazine often featured two similar pictures, and you were asked to spot
the differences. This childhood skill can also prove valuable when troubleshooting some
network issues.
For example, imagine you have multiple remote offices, each running the same model of
Cisco router. Clients at one of those remote offices cannot obtain an IP address via DHCP.
One troubleshooting approach is to compare that site’s router configuration with the
router configuration of another remote site that is working properly. This methodology is
often an appropriate approach for a less experienced troubleshooter not well versed in the
specifics of the network. However, the problem might be resolved without a thorough understanding
of what caused the problem. Therefore, the problem is more likely to reoccur.
Component Swapping
Yet another approach to narrowing the field of potential causes of a problem is to physically
swap out components. If a problem’s symptoms disappear after swapping out a particular
component (for example, a cable or a switch), you can conclude that the old
component was faulty (either in its hardware or its configuration).
Figure 2-8 Component Swapping (four panels: Swap Cable, Swap Switch Port, Swap Laptop, Swap Switch; devices shown are Laptop A, Laptop B, Switch SW1, and Switch SW2)
As an example, consider Figure 2-8. A problem report states that the connection between
laptop A and switch SW1 is not bringing up a link light on either the laptop or the switch.

As a first step, you might swap out the cable interconnecting these two devices with a
known working cable.
If the problem persists, you will want to undo the change you made and then move the cable
from switch port 1 to switch port 2. As a next step, you could connect a different laptop
to switch SW1. If the problem goes away, you could conclude that the issue is with
laptop A. However, if the problem continues, you could swap out switch SW1 with another
switch: SW2 in this example. As you test each component and find it is not the
problem, undo the change.
Although swapping out components in this fashion might not provide great insight into
the specific problem, it could help focus your troubleshooting efforts. For example, if
swapping out the switch resolved the issue, you could start to investigate the configuration
of the original switch, checking for configuration or hardware issues.
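The swap-test-undo cycle can be sketched in Python as a loop that applies one change at a time. The names isolate_by_swapping and problem_persists are hypothetical, and the scenario in the usage lines is invented for illustration; the point of the sketch is only that each test starts again from the original setup.

```python
def isolate_by_swapping(swaps, problem_persists):
    """Try one swap at a time and return the first that clears the symptom.

    swaps            -- ordered list of swap descriptions
    problem_persists -- callable taking the set of swaps currently in
                        place and returning True while the symptom
                        remains (hypothetical test function)
    """
    for swap in swaps:
        in_place = {swap}  # apply exactly one change
        if not problem_persists(in_place):
            return swap  # symptom gone: the original component is suspect
        # Symptom remains: the change is undone before the next attempt,
        # which is why in_place is rebuilt on every iteration.
    return None


SWAPS = ["cable", "switch port", "laptop", "switch"]

if __name__ == "__main__":
    # Made-up scenario: only replacing the switch clears the symptom.
    print(isolate_by_swapping(SWAPS, lambda in_place: "switch" not in in_place))
```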
Practice Exercise: Selecting a Troubleshooting
Approach
As a troubleshooter, you might use one of the previously discussed troubleshooting methods
or perhaps a combination of methods. To illustrate how you might select an appropriate
troubleshooting approach, consider the following problem report:
A computer lab at a university contains 48 PCs. Currently, 24 of the PCs cannot access
the Internet, whereas the other 24 PCs can. The 24 PCs that cannot currently access
the Internet were able to access the Internet yesterday.
Consider which of the previously discussed troubleshooting models might be appropriate
for an issue such as the one reported. After you reach your own conclusions regarding
which method or methods would be most appropriate, consider the following rationale:
■ Top-down: Because the application is working on some PCs in the same location,
starting at the application layer will probably not be effective. Although it is possible
that 24 of the PCs have some setting in their Internet browser (for example, a proxy
configuration) that prevents them from accessing the Internet, these PCs were working
yesterday. Therefore, it is unlikely that these 24 PCs were all recently reconfigured
with an incorrect application configuration.
■ Bottom-up: Based on the symptom reported, it is reasonable to guess that there
might be an issue with an Ethernet switch (perhaps with a port density of 24). Therefore,
a bottom-up approach stands a good chance of isolating the problem quickly.
■ Divide and conquer: The problem seems to be related to a block of PCs, and the
problem is probably not application related. Therefore, a divide and conquer approach
could be useful. Starting at Layer 3 (that is, the network layer), you could issue a series
of pings to determine if a next-hop gateway is reachable. If the next-hop gateway
is not reachable, you could start to troubleshoot Layer 2, checking the Cisco Catalyst
switch to which these 24 PCs are attached.
■ Following the traffic path: The symptom seems to indicate that these 24 PCs might
share a common switch. Therefore, following the traffic path to the other end of the
cabling (that is, to a switch) could prove useful. Perhaps the switch has lost power, resulting
in this connectivity issue for the 24 PCs.
■ Comparing configurations: If a previous troubleshooting method (for example,
bottom-up, divide and conquer, or following the traffic path) reveals that the 24 PCs
that are not working are connected to one Cisco Catalyst switch, and the 24 PCs that
are working are connected to another Cisco Catalyst switch, comparing the configuration
of those two switches could be helpful.
■ Component swapping: Because the 24 PCs are experiencing the same problem
within a short time frame (since yesterday), it is unlikely that swapping cables would
be useful. However, if these 24 PCs connect to the same Cisco Catalyst switch, swapping
out the switch could help isolate the problem.
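Several of the rationales above rest on the guess that the 24 failing PCs share one switch. If a mapping of PCs to switches is available, that guess can be checked directly. The Python sketch below assumes a hypothetical lab layout (PC1 through PC24 on a switch called SW-A, PC25 through PC48 on SW-B); the names and the layout are invented for illustration and do not come from the problem report.

```python
from collections import Counter


def failures_by_switch(pc_to_switch, failing_pcs):
    """Count failing PCs per upstream switch."""
    return Counter(pc_to_switch[pc] for pc in failing_pcs)


def switches_fully_down(pc_to_switch, failing_pcs):
    """Return the switches whose attached PCs are all failing."""
    attached = Counter(pc_to_switch.values())
    failed = failures_by_switch(pc_to_switch, failing_pcs)
    return sorted(sw for sw, count in failed.items() if count == attached[sw])


# Hypothetical lab layout: PCs 1-24 on SW-A, PCs 25-48 on SW-B.
PC_TO_SWITCH = {f"PC{n}": ("SW-A" if n <= 24 else "SW-B") for n in range(1, 49)}
FAILING = [f"PC{n}" for n in range(25, 49)]

if __name__ == "__main__":
    print(switches_fully_down(PC_TO_SWITCH, FAILING))
```

If every failing PC maps to a single switch, the bottom-up and traffic-path approaches become the natural starting points.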
Using Troubleshooting Procedures
No single collection of troubleshooting procedures is capable of addressing all conceivable
network issues, because too many variables (for example, user actions) are in play.
However, having a structured troubleshooting approach can help ensure that an organization’s
troubleshooting efforts are following a similar flow, thus allowing one troubleshooter
to more efficiently take over for or assist another troubleshooter.
The previous section, “Troubleshooting Methods,” introduced a three-step troubleshooting
flow consisting of the following:
Step 1. Problem report
Step 2. Problem diagnosis
Step 3. Problem resolution
As mentioned, most troubleshooting efforts occur in the problem diagnosis step, which
can again be dissected into its subcomponents:
A. Collect information
B. Examine collected information
C. Eliminate potential causes
D. Hypothesize underlying cause
E. Verify hypothesis
By combining these components, you get the following listing of all subprocesses in the
structured troubleshooting procedure:
Step 1. Problem report
Step 2. Collect information
Step 3. Examine collected information
Step 4. Eliminate potential causes
Step 5. Hypothesize underlying cause
Step 6. Verify hypothesis
Step 7. Problem resolution
This section examines each of these subprocesses in more detail.
Problem Report
A problem report from a user often lacks sufficient detail for you to take that problem report
and move on to the next troubleshooting process (that is, collect information). For
example, a user might report, “The network is broken.” If you receive such a vague report,
you probably need to contact the user and ask him exactly what aspect of the network is
not functioning correctly.
After your interview with the user, you should be able to construct a more detailed problem
report that includes statements such as, when the user does X, she observes Y. For example,
“When the user attempts to connect to a website on the Internet, her browser
reports a 404 error. However, the user can successfully navigate to websites on her company’s
intranet.”
After you have a clear articulation of the issue, you might need to determine who is responsible
for working on the hardware or software associated with that issue. For example,
perhaps your organization has one IT group tasked with managing switches and
another IT group charged with managing routers. Therefore, as the initial point of contact,
you might need to decide whether this issue is one you are authorized to address or if you
need to forward the issue to someone else.
Collect Information
When you are in possession of a clear problem report, the next step is gathering relevant
information pertaining to the problem. Efficiently and effectively gathering information
involves focusing information gathering efforts on appropriate network entities (for example,
routers, servers, switches, or clients) from which information should be collected.
Otherwise, the troubleshooter could waste time wading through reams of irrelevant data.
Perhaps a troubleshooter is using a troubleshooting model where he follows the path of
the affected traffic, and information needs to be collected from a network device over
which he has no access. At that point, the troubleshooter might need to work with appropriate
personnel who do have access to that device. Alternatively, the troubleshooter
might switch troubleshooting models. For example, instead of following the traffic’s path,
he might swap components or use a bottom-up troubleshooting model.
Examine Collected Information
After collecting information regarding the problem report (for example, collecting output
from show or debug commands, or performing packet captures), the next structured troubleshooting
process is the analysis of the collected information.

A troubleshooter has two primary goals while examining the collected information:
■ Identify indicators pointing to the underlying cause of the problem.
■ Find evidence that can be used to eliminate potential causes.
To achieve these two goals, the troubleshooter attempts to find a balance between two
questions:
■ What is occurring on the network?
■ What should be occurring on the network?
The delta between the responses to these questions might give the troubleshooter insight
into the underlying cause of a reported problem. A challenge, however, is for the troubleshooter
to know what currently should be occurring on the network.
If the troubleshooter is experienced with the applications and protocols being examined,
she might be able to determine what is occurring on the network and how that differs
from what should be occurring. However, if the troubleshooter lacks knowledge of specific
protocol behavior, she still might be able to effectively examine her collected information
by contrasting that information with baseline data.
Baseline data might contain, for example, the output of show and debug commands issued
on routers when the network was functioning properly. By contrasting this baseline
data with data collected after a problem occurred, even an inexperienced troubleshooter
might be able to see the difference between the data sets, thus giving him a clue as to the
underlying cause of the problem under investigation. This implies that as part of a routine
network maintenance plan, baseline data should periodically be collected when the network
is functioning properly.
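Contrasting collected data with a baseline can be as simple as a text diff of saved command output. The sketch below uses Python's standard difflib module; the function name diff_against_baseline is an assumption, and the sample text is made up for illustration and is not real device output.

```python
import difflib


def diff_against_baseline(baseline, current):
    """Return the changed lines between two saved command outputs."""
    diff = list(
        difflib.unified_diff(
            baseline.splitlines(),
            current.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
            n=0,  # no context lines, changes only
        )
    )
    # The first two entries are the "---"/"+++" file headers; lines
    # starting with "@@" are hunk markers. Keep only the changed lines.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Made-up interface summaries for illustration; not real device output.
BASELINE = """Gi0/1 up up
Gi0/2 up up
Gi0/3 administratively down down"""

CURRENT = """Gi0/1 up up
Gi0/2 down down
Gi0/3 administratively down down"""

if __name__ == "__main__":
    for line in diff_against_baseline(BASELINE, CURRENT):
        print(line)
```

Lines prefixed with "-" appear only in the baseline and lines prefixed with "+" appear only in the current data, so even a reader unfamiliar with the protocol can see what changed.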
Eliminate Potential Causes
Following an examination of collected data, a troubleshooter can start to form conclusions
based on that data. Some conclusions might suggest a potential cause for the problem,
whereas other conclusions eliminate certain causes from consideration.
A caution to be observed when drawing conclusions is not to read more into the data than
what is actually there. As an example, a troubleshooter might reach a faulty conclusion
based on the following scenario:
A problem report indicates that PC A cannot communicate with server A, as shown
in Figure 2-9. The troubleshooter is using a troubleshooting method where she follows
the path of traffic through the network. The troubleshooter examines output
from the show cdp neighbor command on routers R1 and R2. Because those routers
recognize each other as Cisco Discovery Protocol (CDP) neighbors, the troubleshooter
leaps to the conclusion that these two routers see each other as OSPF
neighbors and have mutually formed OSPF adjacencies. However, the show cdp
neighbor output is insufficient to conclude that OSPF adjacencies have been formed
between routers R1 and R2.

document can serve as a rollback plan if the implemented solution fails to resolve the
problem.
If the problem is not resolved after the troubleshooter implements the plan, or if the execution
of the plan resulted in one or more additional problems, the troubleshooter should
execute the rollback plan. After the network is returned to its previous state (that is, the
state prior to deploying the proposed solution), the troubleshooter can then reevaluate her
hypothesis.
Perhaps the troubleshooter still believes the underlying cause has been identified, even
though the original solution failed to resolve that cause. In that case, the troubleshooter
could create a different plan to address that cause. Alternatively, if the troubleshooter still
knows of other potential causes that have not yet been ruled out, she can identify which of
those causes is most likely causing the problem and create an action plan to resolve
that cause.
This process can repeat itself until the troubleshooter has exhausted the list of potential
causes. At that point, she might need to gather additional information or enlist the aid of a
coworker or the Cisco Technical Assistance Center (TAC).
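The apply, verify, and roll back cycle described above can be sketched in Python as a loop over suspected causes. All four callables are hypothetical hooks that a caller would supply, and the sketch simplifies the text in one respect: it rolls back only when the symptom persists, without separately detecting new problems introduced by a fix.

```python
def resolve(candidate_causes, apply_fix, rollback, problem_resolved):
    """Work through suspected causes until one fix resolves the problem.

    candidate_causes -- causes ordered from most to least likely
    apply_fix        -- callable that implements the plan for a cause
    rollback         -- callable that restores the previous state
    problem_resolved -- callable returning True when symptoms are gone
    All four callables are hypothetical hooks supplied by the caller.
    """
    for cause in candidate_causes:
        apply_fix(cause)
        if problem_resolved():
            return cause
        # The fix did not help: return the network to its prior state
        # before moving on to the next hypothesis.
        rollback(cause)
    # List exhausted: gather more information or escalate.
    return None


if __name__ == "__main__":
    # Simulated state: a fix "works" only if it targets the actual cause.
    applied = []
    actual_cause = "duplex mismatch"
    found = resolve(
        ["bad cable", "duplex mismatch", "ACL"],
        apply_fix=applied.append,
        rollback=applied.remove,
        problem_resolved=lambda: actual_cause in applied,
    )
    print(found, applied)
```

A return value of None corresponds to the point where the list of potential causes is exhausted and more information or outside help is needed.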
Problem Resolution
After the reported problem is resolved, the troubleshooter should make sure the solution
becomes a documented part of the network. This implies that routine network maintenance
will maintain the implemented solution. For example, if the solution involved reconfiguring
a Cisco IOS router, a backup of that new configuration should be made part of
routine network maintenance practices.
As a final task, the troubleshooter should report the problem resolution to the appropriate
party or parties. Beyond simply notifying a user that a problem has been resolved, however,
the troubleshooter should get user confirmation that the observed symptoms are
now gone. This task confirms that the troubleshooter resolved the specific issue reported
in the problem report, rather than a tangential issue.
