Troubleshooting and Debugging Techniques https://www.coursera.org/learn/troubleshooting-debugging-techniques
Main areas of computer troubleshooting:
Hardware:
RAM, hard disk, CPU, network
Software:
OS bugs: check the log files to see what is happening.
Application bugs.
Memory (RAM) leaks.
Network bandwidth problems: use traffic shaping to give priorities.
Some Linux commands and Windows tools
$ top shows how memory (and CPU) is being used by the processes on the system.
On Windows:
1. "Event Viewer" to see the log files
2. Performance Monitor
3. Resource Monitor
$ lsof lists open files.
$ sudo lsof | grep deleted lists open files that are marked as (deleted).
$ nice sets the priority of an application when it is started.
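The lsof | grep deleted step above can also be done in Python once the lsof output has been captured. This is only a minimal sketch: the function name find_deleted_open_files and the SAMPLE text are made up for illustration (the sample is not real output), and it assumes deleted files carry the "(deleted)" marker at the end of the line.

```python
def find_deleted_open_files(lsof_output):
    """Return the lsof lines that refer to files deleted while still open."""
    return [line for line in lsof_output.splitlines() if "(deleted)" in line]


# Hypothetical sample in the general shape of lsof output (not real data).
SAMPLE = """\
COMMAND  PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
python3  101 alice 3w  REG  8,1    1048576  42   /tmp/app.log (deleted)
bash     202 bob   cwd DIR  8,1    4096     7    /home/bob
"""

if __name__ == "__main__":
    for line in find_deleted_open_files(SAMPLE):
        print(line)
```

A deleted file that is still open keeps using disk space until the process closes it, which is why this check is useful when a disk looks full.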
Module 3
Crashes in Complex Systems
A complex system contains many servers and many services. For example, your company may have an e-commerce website that runs on many servers.
Check the log files of:
1. The specific services
2. The general system
Find out what changed between the time the system was in a good state and the time it went bad:
1. Did we upgrade any software?
2. Did we add any hardware?
3. Did we update a service, such as the authentication server, a database service, or the database version?
4. Were changes made to other back-end servers, such as the billing, inventory, and procurement systems?
In this example, you find that a change was made to the load balancer between the front-end and back-end services.
The solution is to roll back that change.
To deal with this type of complex system, you must:
1. Check the log files
2. Have a good monitoring system
3. Use a VCS like Git (for example, hosted on GitHub.com), so you can quickly roll back when needed
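Finding what changed between the good state and the bad state is basically a diff. The sketch below compares two configuration snapshots with Python's difflib. The config text, the setting names, and the function name config_changes are all hypothetical; a VCS like Git gives you the same information with git diff.

```python
import difflib


def config_changes(good, bad):
    """Return only the removed ('- ') and added ('+ ') lines between two snapshots."""
    diff = difflib.ndiff(good.splitlines(), bad.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical load balancer settings before and after the incident.
GOOD = "backend_pool = 4\ntimeout = 30\nalgorithm = round_robin"
BAD = "backend_pool = 4\ntimeout = 5\nalgorithm = round_robin"

if __name__ == "__main__":
    for change in config_changes(GOOD, BAD):
        print(change)
```

Lines starting with "- " exist only in the good snapshot and lines starting with "+ " exist only in the bad one, so they are the first candidates to roll back.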
Module 3
Writing Effective Postmortems
A postmortem is documentation of the mistakes we made while troubleshooting, while creating the problem/bug, or in any other mishap.
A postmortem documents what we learned from our mistakes,
to prevent the same issue from occurring again. In this example, the problem was solved by rolling back to the previous state.
Module 3
Practice Quiz: Handling Bigger Incidents
1.
Which of the following would be effective in resolving a large issue if it happens again in the future?
Keep it up! A postmortem is a detailed document of an issue which includes the root cause and remediation. It is effective on large, complex issues.
2.
During peak hours, users have reported issues connecting to a website. The website is hosted by two load balancing servers in the cloud and are connected to an external SQL database. Logs on both servers show an increase in CPU and RAM usage. What may be the most effective way to resolve this issue with a complex set of servers?
You got it! Automatically deploying additional servers to handle the loads of requests during peak hours can resolve issues with a complex set of servers.
3.
It has become increasingly common to use cloud services and virtualization. Which kind of fix, in particular, does virtual cloud deployment speed up and simplify?
Right on! Virtualization makes deployment of VM servers in the cloud a fast and relatively simple process.
4.
What should we include in our postmortem? (Check all that apply)
Sweet! In order to learn about the problem and how it happens in general, we should include what caused it this time.
Awesome! By clarifying how we identified the problem, it can be more easily identified in the future.
Excellent! In order to share with reviewers how the issue was resolved, it's important to include what we did to solve it this time.
5.
In general, what is the goal of a postmortem? (Check all that apply)
Way to go! By describing the cause of the problem, we can learn to avoid the same circumstances in the future.
Woohoo! By describing in detail how we fixed the problem, we can help others or ourselves fix the same problem more quickly in the future.
Module 4
Getting to the Important Tasks
Time optimization is a hard task.
Split tasks along two axes:
1. Urgent / Not urgent
2. Important / Not important
This gives four combinations:
1. Example of important and urgent: the company's internet connection is down. You have to restore the connection from the backup as soon as possible.
2. Example of important but not urgent: long-term planning, such as researching new technologies, planning rollback systems (not sure), migrating the whole codebase from FoxPro to Python, and implementing a DBMS like MySQL instead of .dbf files.
3. Examples of urgent but not important:
a. Answering emails
b. Phone calls
c.
4. Examples of not urgent and not important:
a. Pointless meetings
b.
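The urgent/important split above can be written as a tiny function. This is only an illustrative sketch: the function name eisenhower_quadrant and the suggested actions ("schedule it", "delegate or batch it", and so on) are my own assumptions based on the usual description of the Eisenhower matrix, not something stated in the course notes.

```python
def eisenhower_quadrant(urgent, important):
    """Map the two yes/no answers to a suggested action (labels are assumptions)."""
    if urgent and important:
        return "do it now"
    if important:
        return "schedule it"
    if urgent:
        return "delegate or batch it"
    return "drop or postpone it"


# Example tasks taken from the notes above: (name, urgent, important).
TASKS = [
    ("Company internet is down", True, True),
    ("Research new technologies", False, True),
    ("Answer emails", True, False),
    ("Pointless meeting", False, False),
]

if __name__ == "__main__":
    for name, urgent, important in TASKS:
        print(f"{name}: {eisenhower_quadrant(urgent, important)}")
```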
Technical debt (like a loan):
Technical debt is every solution we put in place urgently, temporarily, or in an emergency on an ad hoc basis. It is a workaround, not the best solution, and it still has to be replaced by a permanent, long-term remediation (not sure).
It is also technical debt when a new version of some software is released but you have not upgraded yet, because you do not have time right now or because the current online users cannot be disturbed.
Module 4 Prioritizing Tasks
Module 4
Estimating the Time Tasks Will Take
Module 4
Communicating Expectations
Examples: replacing a faulty keyboard, or preparing a new computer for a new employee.
We must tell users how long it will take to solve their problem. If the problem is solved sooner than they expect, they will be very happy; otherwise they will be frustrated.
Also prioritize the work. For example, a problem in the database affects the whole company, so it gets higher priority than a problem that affects only one or two people.
Problems must be received as written reports (tickets) instead of phone calls or chats. That way you can see the list of issues, and users do not disturb you in the middle of a task.
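A list of reported issues ordered by impact can be sketched with a priority queue. Everything here is hypothetical (the TicketQueue class, its method names, and the user counts); it only illustrates the idea that the issue affecting the most people is handled first, and that issues with equal impact are handled in the order they were reported.

```python
import heapq
import itertools


class TicketQueue:
    """Hypothetical ticket queue: most affected users first, then oldest first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def report(self, description, affected_users):
        # heapq is a min-heap, so the user count is negated to pop the largest first.
        # The counter breaks ties in favour of the ticket that was reported earlier.
        entry = (-affected_users, next(self._counter), description)
        heapq.heappush(self._heap, entry)

    def next_ticket(self):
        """Remove and return the description of the highest-priority ticket."""
        _, _, description = heapq.heappop(self._heap)
        return description


if __name__ == "__main__":
    queue = TicketQueue()
    queue.report("Replace faulty keyboard", affected_users=1)
    queue.report("Database is down", affected_users=200)
    queue.report("Reset a password", affected_users=1)
    print(queue.next_ticket())
```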
To avoid frustration and save time, the company should keep a pool of new keyboards and mice. Trusting the users, let them come and swap their mouse or keyboard themselves if the old one is not working properly.
In the same way, there should be some spare computers, so that when a hardware fault occurs we can replace the faulty computer with a new one immediately, saving time and frustration.
Then we can repair/troubleshoot the faulty computer.
Module 4
More About Making the Best Use of Our Time
Check out the following link for more information:
Practice Quiz: Managing Our Time
1.
Using the Eisenhower Decision Matrix, which of the following is an example of an event or task that is both Important, and Urgent?
Great work! It’s important for users to have Internet to work, and it must be resolved right away.
2.
You’re working on a web server issue that’s preventing all users from accessing the site. You then receive a call from user to reset their user account password. Which appropriate action should you take when prioritizing your tasks?
Nice job! Ask the user to open a support ticket so that the request can be placed into the queue while you work on the most urgent issue at hand.
3.
What is it called when we make more work for ourselves later by taking shortcuts now?
Right on! Technical debt is defined as the implied cost of additional rework caused by choosing an easy (limited) solution now instead of using a better, but more difficult, solution.
4.
What is the first step of prioritizing our time properly?
Awesome! Before we can even decide which task to do first, we need to make a list of our tasks.
5.
If an issue isn't solved within the time estimate that you provided, what should you do? (Select all that apply)
Nice work! Communication is key, and it’s best to keep everyone informed.
Great work! If your original estimate turned out to be overly optimistic, it’s appropriate to re-estimate.
Module 4
Dealing with Hard Problems
Quotation from Brian Kernighan, a contributor to the UNIX OS and co-author of the C programming book
Meaning:
Writing a new program is easier than debugging an old program.
First advice when writing code:
Another piece of advice when writing code:
Before writing, document the final goal of the program/application.
Advice no. 3:
Before writing the code, try to write the test code.
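As a small illustration of writing the tests first, the sketch below uses Python's unittest. The function percent_used and its behaviour are hypothetical, chosen only for the example: the two test methods describe what we want, and the function body is the minimum needed to make them pass.

```python
import unittest


def percent_used(used, total):
    """Return used/total as a percentage (written after the tests below)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * used / total


class PercentUsedTest(unittest.TestCase):
    """Tests written first; they state the goal before any code exists."""

    def test_half_full(self):
        self.assertEqual(percent_used(50, 100), 50)

    def test_zero_total_is_rejected(self):
        with self.assertRaises(ValueError):
            percent_used(1, 0)


if __name__ == "__main__":
    unittest.main()
```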
If the problem is still not solved, explain it to a rubber duck. This is known as rubber duck debugging.
Module 4
Proactive Practices
Before implementing any software/application in the production environment, we must run
1. Automatic tests and
2. Manual tests
in a testing environment.
Also, when deploying new software, do not deploy it to all the computers at once. Deploy it to one or a few computers first.
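Deploying to a few computers first (a canary) can be sketched as splitting the host list into two groups. The function name split_canary, the 10% default, and the host names are assumptions made for this example, not part of the course material.

```python
def split_canary(hosts, fraction=0.1):
    """Split hosts into (canary, rest); at least one host becomes a canary."""
    if not hosts:
        return [], []
    count = max(1, int(len(hosts) * fraction))
    return hosts[:count], hosts[count:]


if __name__ == "__main__":
    # Hypothetical host names.
    all_hosts = [f"host{i}" for i in range(20)]
    canary, rest = split_canary(all_hosts)
    print("Deploy first to:", canary)
    print("Deploy later to:", len(rest), "hosts")
```

If the canary hosts run without problems for a while, the same deployment can go to the rest of the hosts.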
Collect the logs centrally, so that we do not need to look at the log files on each computer but can see them in one place on one computer.
It can be super helpful.
We must keep documentation in one easily accessible place, like in Google PlayBooks.
Module 4
Planning Future Resource Usage
Module 4
Preventing Future Problems
Implement a monitoring system for:
RAM
CPU
Disk storage
Network bandwidth
E.g. if any of them is 85% full, the monitoring system must generate alerts/warnings.
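The 85% rule can be sketched in a few lines of Python. This is not a real monitoring system: the names ALERT_THRESHOLD, over_threshold, and disk_percent are made up for the example, and only the disk reading is real (it comes from shutil.disk_usage in the standard library). RAM, CPU, and network readings would need other tools.

```python
import shutil

ALERT_THRESHOLD = 85  # percent, taken from the example in the notes


def over_threshold(usage_percent, threshold=ALERT_THRESHOLD):
    """Return the sorted names of resources at or above the threshold."""
    return sorted(
        name for name, percent in usage_percent.items() if percent >= threshold
    )


def disk_percent(path="/"):
    """Return how full the disk holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


if __name__ == "__main__":
    readings = {"disk": disk_percent("/")}
    for name in over_threshold(readings):
        print(f"ALERT: {name} usage is at or above {ALERT_THRESHOLD}%")
```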
Also report to the developers what you did as a workaround.
Write a full report: what the bug was, when it occurred, and so on. Include everything you know about the problem and its temporary/ad hoc solutions.
Check out some more info here:
https://simpleprogrammer.com/understanding-the-problem-domain-is-the-hardest-part-of-programming/
https://deploy.equinix.com/blog/explaining-failure-domains-sre-lifeblood/
https://landing.google.com/sre/sre-book/chapters/effective-troubleshooting/
Practice Quiz: Making Our Future Lives Easier
1.
Which proactive practice can you implement to make troubleshooting issues in a program easier when they happen again, or face other similar issues?
You got it! Documentation that includes good instructions on how to resolve an issue can assist in resolving the same, or similar issue in the future.
2.
Which of the following is a good example of mixing and matching resources on a single server so that the running services make the best possible use of all resources?
Great work! An application that uses a lot of RAM can still run while CPU is mostly used by another application on the same server.
3.
One strategy for debugging involves explaining the problem to yourself out loud. What is this technique known as?
Right on! Rubber ducking is the process of explaining a problem to a "rubber duck", or rather yourself, to better understand the problem.
4.
When deploying software, what is a canary?
Nice job! Reminiscent of the old term "canary in a coal mine", a canary is a test deployment of our software, just to see what happens.
5.
It is advisable to collect monitoring information into a central location. Given the importance of the server handling the centralized collecting, when assessing risks from outages, this server could be described as what?
Awesome! A failure domain is a logical or physical component of a system that might fail.
Module 4 Wrap Up: Managing Resources
Discussed what we learned in this course. Listed Topics.
Qwiklabs Assessment: Debugging and Solving Software Problems
===Existing Code===
import csv
import datetime
import requests

URL_FILE = 'https://storage.googleapis.com/gwg-content/gic215/employees-with-date.csv'

def get_start_date():
    """Ask the user for a start date; an empty answer falls back to 2019-01-01."""
    year = int(input('Enter Year====> ') or 2019)
    month = int(input('Enter Month===> ') or 1)
    day = int(input('Enter day=====> ') or 1)
    # The class lives in the datetime module, so it is datetime.datetime.
    return datetime.datetime(year, month, day)
Module 4
Congratulations!
Module 4
Sneak Peek of Next Course