Troubleshooting Linux Scenarios – Part 1

Issue 1: Unable to Start a Service

🛠️ Approach / Solution:

├── Check if the service is installed
├── Verify the service configuration file
├── Check the service status using systemctl or an equivalent command
├── Inspect the service logs for any errors
├── Ensure there are no port conflicts
├── Review firewall rules and SELinux settings
├── Restart the service and check for error messages
├── Inspect system resource usage with tools like top or htop
└── ...
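
The port-conflict step above can be sketched in Python. This is only a minimal sketch, not a replacement for tools such as ss or lsof: the helper name `port_is_free` and the example port 8080 are assumptions made for illustration, and the check covers TCP on a single address only.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port).

    Hypothetical helper for illustration: a failed bind usually means
    another process already holds the port, or the port is privileged.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


# 8080 is an assumed example; use the port your service listens on.
print("port 8080 free:", port_is_free(8080))
```

If the function returns False for the port the service is configured to use, a conflicting listener is a likely reason the service fails to start.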

Issue 2: High CPU Usage

🛠️ Approach / Solution:

├── Identify the process causing high CPU usage using top or htop
├── Check if the issue is intermittent or continuous
├── Review logs for any error messages or known issues
├── Inspect running processes and their resource consumption
├── Investigate potential malware or unauthorized processes
├── Consider optimizing or scaling the application
├── Monitor system metrics over time to identify patterns
├── Apply performance tuning based on the specific application
└── ...
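
One rough way to judge whether the load is intermittent or continuous is to compare the 1-, 5-, and 15-minute load averages against the number of cores. The sketch below uses a hypothetical `classify_load` helper and an arbitrary threshold of 1.0 load per core; both are assumptions of this example. On Linux the load average also counts tasks waiting on I/O, so treat the result as a hint rather than a diagnosis.

```python
import os


def classify_load(load1: float, load5: float, load15: float, cores: int) -> str:
    """Classify load averages as 'sustained', 'spike', 'recovering', or 'normal'.

    Assumption for illustration: a per-core load above 1.0 counts as high.
    """
    high = [value / cores > 1.0 for value in (load1, load5, load15)]
    if all(high):
        return "sustained"    # high across all three windows
    if high[0]:
        return "spike"        # high now, but not over the full 15 minutes
    if any(high):
        return "recovering"   # was high earlier, last minute is fine
    return "normal"


def current_load_status() -> str:
    """Apply classify_load to the live system (Unix only)."""
    load1, load5, load15 = os.getloadavg()
    return classify_load(load1, load5, load15, os.cpu_count() or 1)
```

A "sustained" result points toward the tuning and scaling steps, while a "spike" suggests looking at logs and scheduled jobs around the time of the peak.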

Issue 3: Network Connectivity Issues Between Servers

🛠️ Approach / Solution:

├── Check if other servers on the same network are accessible
├── Verify firewall rules on both source and destination servers
├── Inspect routing tables to ensure correct routes are set
├── Use tools like traceroute or mtr to trace the network path
├── Check for any network hardware failures or misconfigurations
├── Review system logs for network-related errors
├── Test connectivity using tools like telnet or nc
├── Investigate potential DNS or hostname resolution problems
├── Consider network segmentation or VLAN configurations
└── ...
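
The connectivity and DNS checks above can be combined in a small script when telnet or nc is not installed. This is a sketch under stated assumptions: `tcp_check` is a hypothetical helper, and the three result strings are invented for this example.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open', 'dns-failure', or 'unreachable' for host:port."""
    try:
        # getaddrinfo performs the hostname lookup, so DNS problems
        # surface separately from connection problems.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return "unreachable"
```

Separating the lookup from the connection attempt helps decide whether to investigate name resolution or the firewall and routing steps first.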

Issue 4: Unable to Mount a Filesystem

🛠️ Approach / Solution:

├── Check if the filesystem is specified in /etc/fstab
├── Verify the device path and UUID in /etc/fstab
├── Ensure the filesystem type is correct
├── Check for errors in /var/log/messages or dmesg
├── Confirm that the device is accessible and not failing
├── Use the mount command manually to check for errors
├── Investigate if the filesystem needs repair (fsck)
├── Inspect disk space on the target mount point
├── Check for any SELinux or AppArmor restrictions
└── ...
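
The /etc/fstab checks above can be partly automated. The following is a minimal sketch: `FstabEntry` and `parse_fstab` are hypothetical names, and requiring the first four fields is an assumption of this example. It only validates the shape of each line, not whether the device or UUID actually exists.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: str


def parse_fstab(text: str) -> List[FstabEntry]:
    """Parse fstab-formatted text, raising ValueError on malformed lines."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        # This sketch requires the first four fields; dump and pass may be omitted.
        if not 4 <= len(fields) <= 6:
            raise ValueError(f"line {number}: expected 4-6 fields, got {len(fields)}")
        entries.append(FstabEntry(*fields[:4]))
    return entries
```

On a real system the input would come from reading /etc/fstab, and the parsed device values can then be compared by hand against the output of blkid.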

Issue 5: Filesystem corrupted

🛠️ Approach / Solution:

├── Filesystem corruption is one of the errors that can leave the system unable to boot
├── Check /var/log/messages, dmesg, and other log files
├── If the logs report bad sectors, run fsck
│ ├── If so:
│ │ ├── Reboot the system into rescue mode by booting from the installation ISO (CD-ROM)
│ │ ├── Proceed with option 1, which mounts the original root filesystem under /mnt/sysimage
│ │ ├── Edit the fstab entries, or create a new fstab file with the help of blkid, and reboot
└── ...

Issue 6: Can't cd into a Directory Even Though the User Has sudo Privileges

🛠️ Approach / Solution:

├── Reasons and Resolution
│ ├── Directory does not exist
│ ├── Pathname conflict: relative vs absolute path
│ ├── Parent directory permission/ownership
│ ├── User lacks execute (x) permission on the target directory
│ ├── Hidden directory (name starts with a dot)
└── ...
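
Several of the reasons above (missing directory, parent permissions, missing execute bit) can be checked in one pass by walking the path from the root. This is a sketch only: `first_blocking_component` is a hypothetical helper, and it reports permissions for the user running the script, so run it as the affected user.

```python
import os
from typing import Optional


def first_blocking_component(path: str) -> Optional[str]:
    """Return the first path component that would stop `cd`, or None.

    A component blocks `cd` when it does not exist, is not a directory,
    or the current user lacks execute (search) permission on it.
    """
    current = os.sep
    # abspath resolves a relative path against the current directory,
    # which also exposes relative-vs-absolute path mix-ups.
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        if not os.path.isdir(current) or not os.access(current, os.X_OK):
            return current
    return None
```

When the function returns a path, inspecting that component with ls -ld shows whether it is missing or whether its ownership and mode are the problem.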

Issue 7: Running Out of Memory

🛠️ Approach / Solution:

├── Types
│ ├── Cache (L1, L2, L3)
│ ├── RAM
│ │ ├── Usage
│ │ │ ├── # free -h
│ │ │ │ ├── Total (Total installed memory)
│ │ │ │ ├── Used (Memory actually in use)
│ │ │ │ ├── Free (Memory not used at all)
│ │ │ │ ├── Shared (Shared memory)
│ │ │ │ ├── Buff/Cache (Buffers and page cache)
│ │ │ │ ├── Available (Memory available to new applications without swapping)
│ │ │ ├── /proc/meminfo
│ │ │ │ ├── Active(file)
│ │ │ │ ├── Inactive(file)
│ │ │ │ ├── Active(anon)
│ │ │ │ ├── Inactive(anon)
│ ├── Swap (Virtual Memory)
├── Resolution
│ ├── Identify the processes that are using the most memory with top, htop, ps, etc.
│ ├── Check the logs for OOM killer events, and check whether memory overcommit settings are configured in sysctl.conf
│ ├── Kill or restart the process/service
│ ├── Prioritize the process using nice
│ ├── Add/Extend the swap space
│ ├── Add more physical RAM
└── ...
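
The /proc/meminfo fields listed above are easy to read programmatically. The sketch below assumes hypothetical helpers `parse_meminfo` and `available_percent`; the sample text is illustrative, not a real measurement.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field name to its value (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_percent(info: Dict[str, int]) -> float:
    """Percentage of total memory the kernel estimates is available."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


# Illustrative sample, not real measurements.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    2000000 kB
Active(anon):    3000000 kB
Inactive(file):  1200000 kB
"""

print(f"{available_percent(parse_meminfo(SAMPLE)):.1f}% available")
```

On a live Linux system, pass the contents of /proc/meminfo instead of SAMPLE. A low available percentage together with a large Active(anon) value suggests application memory, rather than page cache, is what is filling RAM.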

Issue 8: Add/Extend the Swap Space

🛠️ Approach / Solution:

├── When the system is running out of memory, we may need to add more swap space
│ ├── Create a file with dd, which reserves the disk blocks for the swap file
│ ├── Set permission 600 and give root ownership
│ ├── Format the file as swap with mkswap
│ ├── Turn swap on with swapon
│ ├── Add an fstab entry for persistence across reboots
└── ...
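
The steps above map onto a short command sequence. The sketch below only builds and prints the commands as a dry run; it executes nothing. The helper name `swap_file_plan`, the path /swapfile, and the 2048 MiB size are assumptions chosen for illustration.

```python
from typing import List


def swap_file_plan(path: str, size_mib: int) -> List[str]:
    """Return the shell commands for creating a swap file (dry run only).

    Nothing is executed here; review the commands and run them as root.
    """
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return [
        # Reserve disk blocks for the swap file.
        f"dd if=/dev/zero of={path} bs=1M count={size_mib}",
        # Restrict access: root ownership, mode 600.
        f"chown root:root {path}",
        f"chmod 600 {path}",
        # Format the file as swap, then enable it.
        f"mkswap {path}",
        f"swapon {path}",
        # Persist across reboots with an fstab entry.
        f"echo '{path} none swap sw 0 0' >> /etc/fstab",
    ]


# /swapfile and 2048 MiB are assumed example values.
for command in swap_file_plan("/swapfile", 2048):
    print(command)
```

Printing the plan first makes it easier to confirm the target path and size before anything touches the disk.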

Issue 9: Unable to Run Certain Commands

🛠️ Approach / Solution:

├── Troubleshooting and Resolution
│ ├── Command
│ │ ├── Could be a system command that a non-root user does not have access to
│ │ ├── Could be a user-defined script/command
│ ├── Troubleshooting
│ │ ├── Permission/ownership of the command/script
│ │ ├── sudo permission
│ │ ├── Absolute/relative path of the command/script
│ │ ├── Command location is not defined in the user's $PATH variable
│ │ ├── Command is not installed
│ │ ├── A library the command depends on is missing or deleted
└── ...
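
The path, $PATH, and permission checks above can be sketched as a single function. `diagnose_command` and its result strings are hypothetical, introduced only for this example; missing libraries and sudo rules are outside what it checks.

```python
import os
import shutil


def diagnose_command(command: str) -> str:
    """Return a short reason why `command` may fail to run, or 'ok'."""
    if os.sep in command:
        # A path was given, so $PATH is not consulted.
        if not os.path.exists(command):
            return "path does not exist"
        if not os.access(command, os.X_OK):
            return "not executable by current user"
        return "ok"
    # shutil.which only returns files the current user can execute.
    if shutil.which(command) is None:
        return "no executable with that name in $PATH"
    return "ok"
```

For example, `diagnose_command("./deploy.sh")` (a made-up script name) would distinguish a wrong relative path from a missing execute bit, two cases that can look alike at the shell prompt.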

Issue 10: System Unexpectedly Reboots and Processes Restart

🛠️ Approach / Solution:

├── Troubleshooting and Resolution
│ ├── System reboot/crash reasons
│ │ ├── CPU stress
│ │ ├── RAM stress
│ │ ├── Kernel fault
│ │ ├── Hardware fault
│ ├── Process restart
│ │ ├── System reboot
│ │ ├── Process restarts itself
│ │ ├── Watchdog application
│ │ │ ├── Prevents high stress on system resources
│ │ │ ├── If the application is causing stress, the watchdog restarts or terminates it
│ ├── Troubleshooting
│ │ ├── After logging in, check the status with commands like uptime, top, dmesg, journalctl, iostat -xz 1
│ │ ├── Review log files: syslog.log, boot.log, dmesg, messages.log, etc.
│ │ ├── Check the application's custom log path
│ │ ├── If the system is not accessible at all, use a virtual console such as iLO, iDRAC, etc.
│ │ ├── Open a case and reach out to the vendor
└── ...
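
As a companion to the uptime check above, the boot time can be estimated from /proc/uptime and then compared with error timestamps in the logs. This is a minimal sketch; `parse_uptime` and `boot_time` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def parse_uptime(text: str) -> timedelta:
    """Convert the first field of /proc/uptime (seconds) to a timedelta."""
    return timedelta(seconds=float(text.split()[0]))


def boot_time(uptime: timedelta, now: datetime) -> datetime:
    """Estimate when the system booted, given the current time."""
    return now - uptime
```

On Linux, feeding the contents of /proc/uptime and `datetime.now()` into these functions gives a boot timestamp to line up against journalctl or dmesg entries just before the reboot.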

Support Prasad Suman Mohan by becoming a sponsor. Any amount is appreciated!

Β