How to analyze a crash dump in RHEL?

Today one of our servers panicked a couple of times. Crash dumps (kdump) were configured, so the server generated a dump each time. Normally we open a case with Red Hat and provide them the dump file.

I did the same this time, but I also tried to analyze the crash dump myself.

I am just sharing my experience here.

1) First, check the dump file location, which is set in /etc/kdump.conf. For me it was /var/crash/.
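A minimal sketch of step 1 in bash, assuming a local dump with no separate dump target: the `path` directive in /etc/kdump.conf names the dump directory, and /var/crash is used when that directive is absent. `kdump_path` and `latest_vmcore` are hypothetical helper names, not part of kdump.

```shell
# Print the dump directory configured in a kdump.conf-style file ($1,
# default /etc/kdump.conf). Assumption: no "path" line means /var/crash.
kdump_path() {
    local conf="${1:-/etc/kdump.conf}" p
    # Commented lines ("#path ...") do not match because their $1 is "#path".
    p=$(awk '$1 == "path" { print $2; exit }' "$conf" 2>/dev/null)
    echo "${p:-/var/crash}"
}

# Print the most recently modified vmcore below a dump directory.
# Each panic gets its own subdirectory, e.g. 127.0.0.1-2014-10-03-10:08:16.
latest_vmcore() {
    local dir="${1:-$(kdump_path)}"
    ls -1t "$dir"/*/vmcore 2>/dev/null | head -n 1
}
```

Running `latest_vmcore` with no argument reads /etc/kdump.conf and prints the newest dump, or nothing if no dump exists yet.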

2) Once we know where the dump is, we start the analysis with the crash utility, passing it the debug vmlinux (from the kernel-debuginfo package matching the kernel that crashed) and the vmcore file:
crash /usr/lib/debug/lib/modules/2.6.18-371.9.1.el5/vmlinux /var/crash/127.0.0.1-2014-10-03-10\:08\:16/vmcore
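The debug vmlinux and the vmcore must come from the same kernel release. A hedged sketch of a wrapper that checks both files before starting crash; `open_dump` is a hypothetical name, and `DEBUGINFO_DIR` and `CRASH_BIN` are hypothetical overrides added only so the function can be tried without a real dump. Defaulting the release to `uname -r` is only correct if the machine is still running the kernel that crashed.

```shell
# Open a vmcore in crash with the debug vmlinux for a given kernel release.
# Assumes kernel-debuginfo is installed under
# /usr/lib/debug/lib/modules/<release>/vmlinux (override: DEBUGINFO_DIR).
open_dump() {
    local release="${1:-$(uname -r)}" vmcore="$2"
    local vmlinux="${DEBUGINFO_DIR:-/usr/lib/debug/lib/modules}/${release}/vmlinux"
    if [ ! -r "$vmlinux" ]; then
        echo "missing $vmlinux: install kernel-debuginfo for $release" >&2
        return 1
    fi
    if [ ! -r "$vmcore" ]; then
        echo "cannot read vmcore: $vmcore" >&2
        return 1
    fi
    # CRASH_BIN defaults to the real crash utility.
    "${CRASH_BIN:-crash}" "$vmlinux" "$vmcore"
}
```

For the dump in this post that would be `open_dump 2.6.18-371.9.1.el5 /var/crash/127.0.0.1-2014-10-03-10:08:16/vmcore`.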
crash prints some startup messages and then a summary of the system at the time of the panic:

KERNEL: /usr/lib/debug/lib/modules/2.6.18-371.9.1.el5/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2014-10-03-10:08:16/vmcore
CPUS: 1
DATE: Fri Oct 3 06:08:10 2014
UPTIME: 79 days, 07:04:57
LOAD AVERAGE: 114.77, 84.16, 40.47
TASKS: 269
NODENAME: Node1
RELEASE: 2.6.18-371.9.1.el5
VERSION: #1 SMP Tue May 13 06:52:49 EDT 2014
MACHINE: x86_64 (2400 Mhz)
MEMORY: 2 GB
PANIC: "Kernel panic - not syncing: out of memory. panic_on_oom is selected"
PID: 3644
COMMAND: "net_traffic"
TASK: ffff81006564e830 [THREAD_INFO: ffff81006566e000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
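The PANIC line makes it clear that the kernel ran out of memory (OOM), and because panic_on_oom was set, it panicked instead of killing a process. The task shown (net_traffic) is the one whose allocation hit the OOM condition; it is not necessarily the process using the most memory.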
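Since the panic came from the vm.panic_on_oom setting, it is worth confirming how the server is configured. A small sketch that reads /proc/sys/vm/panic_on_oom (or a file given as `$1`, which makes it easy to test); `oom_policy` is a hypothetical name, the value meanings follow the kernel's sysctl documentation, and older kernels such as RHEL 5's may not support every value.

```shell
# Describe the vm.panic_on_oom setting.
oom_policy() {
    local f="${1:-/proc/sys/vm/panic_on_oom}" v
    v=$(cat "$f" 2>/dev/null) || { echo "unknown"; return 1; }
    case "$v" in
        0) echo "oom-killer" ;;    # OOM killer picks a process to kill; no panic
        1) echo "panic" ;;         # panic, unless the OOM is confined to a cpuset/mempolicy
        2) echo "panic-always" ;;  # panic on every OOM
        *) echo "unknown" ;;
    esac
}
```

`sysctl -n vm.panic_on_oom` shows the same raw value.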

3) We are now at the crash> prompt, where we can run many commands to analyze the dump. Since the panic was caused by OOM, the first thing to check is memory utilization at the time of the panic:

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   481923       1.8 GB         ----
         FREE     2153       8.4 MB    0% of TOTAL MEM
         USED   479770       1.8 GB   99% of TOTAL MEM
       SHARED     9242      36.1 MB    1% of TOTAL MEM
      BUFFERS       27       108 KB    0% of TOTAL MEM
       CACHED     1151       4.5 MB    0% of TOTAL MEM
         SLAB     7286      28.5 MB    1% of TOTAL MEM

   TOTAL HIGH        0            0    0% of TOTAL MEM
    FREE HIGH        0            0    0% of TOTAL HIGH
    TOTAL LOW   481923       1.8 GB  100% of TOTAL MEM
     FREE LOW     2153       8.4 MB    0% of TOTAL LOW

   TOTAL SWAP   262142      1024 MB         ----
    SWAP USED   262142      1024 MB  100% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

Memory is essentially exhausted (99% used, only 8.4 MB free) and all 1024 MB of swap is in use.
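kmem -i counts memory in pages, and the TOTAL column is simply pages multiplied by the page size. A quick way to check the figures, assuming the 4096-byte page size used on x86_64 (`pages_to_mib` is a hypothetical helper):

```shell
# Convert a page count from `kmem -i` into MiB (1 MiB = 1048576 bytes).
# Page size defaults to 4096 bytes; pass another size as $2 if needed.
pages_to_mib() {
    awk -v p="$1" -v ps="${2:-4096}" 'BEGIN { printf "%.1f\n", p * ps / 1048576 }'
}
```

With the numbers above, `pages_to_mib 481923` prints 1882.5 (about 1.8 GB, TOTAL MEM), `pages_to_mib 2153` prints 8.4 (FREE), and `pages_to_mib 262142` prints 1024.0 (TOTAL SWAP).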

4) Next, find the culprit process that was using the most memory on the server.

crash> ps

This lists the processes that were running at the time of the panic. By scanning the output (the RSS column is resident memory in KB) we can spot the process that was using the most memory.
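Instead of scanning by eye, the ps output can be saved from inside crash (crash can redirect command output to a file, e.g. `crash> ps > /tmp/ps.txt`) and summed per command. A minimal sketch, assuming the column layout PID PPID CPU TASK ST %MEM VSZ RSS COMM with RSS in KB, and a one-character marker column (">" or a space) that `cut -b2-` strips, the same approach the httpd one-liner in this step relies on. `rss_by_command` is a hypothetical name. Summing RSS counts shared pages once per process, so the totals are an upper bound.

```shell
# Rank commands by total RSS from saved crash `ps` output ($1),
# printing the top N ($2, default 10) in MB.
rss_by_command() {
    local file="$1" top="${2:-10}"
    tail -n +2 "$file" | cut -b2- |
        awk '{ rss[$9] += $8 } END { for (c in rss) printf "%.1f MB %s\n", rss[c] / 1024, c }' |
        LC_ALL=C sort -rn | head -n "$top"
}
```

Usage: `rss_by_command /tmp/ps.txt 5` shows the five heaviest commands.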

In my case it was httpd, which had a lot of instances running. I used the command below to sum up their memory usage (RSS is in KB, so dividing by 1048576 gives GB):

crash> ps -Gu httpd | tail -n +2 | cut -b2- | gawk '{mem += $8} END {print "total " mem/1048576 " GB"}'
total 1.8444 GB
Out of 2 GB of RAM, the httpd processes were using about 1.8 GB, so I recommended that the customer increase the memory on the server.

Tip: we can provide the dump to Red Hat quickly using the method below. The server needs internet connectivity for this. It saves us from copying the file off the server with WinSCP and then uploading it.

curl -T /var/crash/127.0.0.1-2014-10-03-10\:08\:16/vmcore ftp://dropbox.redhat.com/incoming/<casenumber>-vmcore
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  471M    0     0  100  471M      0  21.1M  0:00:22  0:00:22 --:--:-- 29.0M
