Category Archives: Solaris

How to check physical disk location on Solaris 10 x86 X4500/X4540 hardware ?

In this article I am going to show you how to use the SUNWhd tool to get disk status. It is a built-in tool available on X4500/X4540 hardware; I have used it on an X4540.

Step 1 : I went to the path below to issue the commands.
bash-3.2# pwd
/opt/SUNWhd/hd/bin

bash-3.2# ls
hd hd.html hdadm hdadm.html read_cache write_cache

Step 2 : Issue the command below to check the location of the failed disk.

bash-3.2# /opt/SUNWhd/hd/bin/hd -c

platform = Sun Fire X4540

ScsiIo failed, Check Condition, Key = 2, ASC/ASCQ = 4Ch/00h

Device Serial Vendor Model Rev Temperature
------ ------ ------ ----- ---- -----------
c0t1d0p0 5QE5T3L8 ATA SEAGATE ST32500N 3AZQ 34 C (93 F)
c0t6d0p0 5QE5T4HH ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c5t3d0p0 5QE5T4SN ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c5t4d0p0 5QE5T4ML ATA SEAGATE ST32500N 3AZQ 31 C (87 F)
c2t0d0p0 5QE5T52S ATA SEAGATE ST32500N 3AZQ 29 C (84 F)
c2t7d0p0 5QE5T3M0 ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c1t1d0p0 5QE5T4VQ ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c1t6d0p0 5QE5T3N6 ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c4t3d0p0 5QE5T4QE ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c4t4d0p0 5QE5T3PG ATA SEAGATE ST32500N 3AZQ 31 C (87 F)
c3t0d0p0 5QE5T57S ATA SEAGATE ST32500N 3AZQ 30 C (86 F)
c3t7d0p0 5QE5T4W5 ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c2t6d0p0 5QE5T4Z4 ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c2t1d0p0 5QE5T4LK ATA SEAGATE ST32500N 3AZQ 31 C (87 F)
c5t2d0p0 5QE5T54W ATA SEAGATE ST32500N 3AZQ 34 C (93 F)
c0t7d0p0 5QE5T3VN ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c0t0d0p0 5QE5T3N0 ATA SEAGATE ST32500N 3AZQ 31 C (87 F)
c3t6d0p0 5QE5T3RD ATA SEAGATE ST32500N 3AZQ 32 C (89 F)
c3t1d0p0 5QE5T58R ATA SEAGATE ST32500N 3AZQ 31 C (87 F)
c4t5d0p0 5QE5RZL5 ATA SEAGATE ST32500N 3AZQ 32 C (89 F)
c4t2d0p0 5QE5T4VB ATA SEAGATE ST32500N 3AZQ 32 C (89 F)
c1t7d0p0 5QE5T4R5 ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c1t0d0p0 5QE5T3WT ATA SEAGATE ST32500N 3AZQ 30 C (86 F)
c3t2d0p0 5QE5T4R3 ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c3t5d0p0 5QE5T4ZK ATA SEAGATE ST32500N 3AZQ 31 C (87 F)
c4t1d0p0 5QE5T53R ATA SEAGATE ST32500N 3AZQ 31 C (87 F)
c4t6d0p0 5QE5T4XT ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c1t3d0p0 5QE5T4XD ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c1t4d0p0 5QE5T4RD ATA SEAGATE ST32500N 3AZQ 29 C (84 F)
c2t2d0p0 5QE5T4HD ATA SEAGATE ST32500N 3AZQ 32 C (89 F)
c2t5d0p0 5QE5T3SX ATA SEAGATE ST32500N 3AZQ 30 C (86 F)
c5t1d0p0 5QE5T3RW ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c5t6d0p0 5QE5T4X0 ATA SEAGATE ST32500N 3AZQ 34 C (93 F)
c0t3d0p0 5QE5T4RW ATA SEAGATE ST32500N 3AZQ 36 C (96 F)
c0t4d0p0 5QE5T3Y5 ATA SEAGATE ST32500N 3AZQ 30 C (86 F)
c1t2d0p0 5QE5T3Y9 ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c4t7d0p0 5QE5T4XS ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c4t0d0p0 5QE5T536 ATA SEAGATE ST32500N 3AZQ 29 C (84 F)
c3t4d0p0 5QE5T3SB ATA SEAGATE ST32500N 3AZQ 28 C (82 F)
c3t3d0p0 5QE5T3TJ ATA SEAGATE ST32500N 3AZQ 34 C (93 F)
c0t5d0p0 5QE5T3XN ATA SEAGATE ST32500N 3AZQ 33 C (91 F)
c0t2d0p0 5QE5T4L1 ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c5t7d0p0 5QE5T4ZB ATA SEAGATE ST32500N 3AZQ 35 C (95 F)
c5t0d0p0 5QE5T4XB ATA SEAGATE ST32500N 3AZQ 30 C (86 F)
c2t4d0p0 5QE5T3X5 ATA SEAGATE ST32500N 3AZQ 29 C (84 F)
c2t3d0p0 5QE5T3W3 ATA SEAGATE ST32500N 3AZQ 34 C (93 F)

-----------------------------SunFire X4540-------Rear-----------------
3: 7: 11: 15: 19: 23: 27: 31: 35: 39: 43: 47:
c0t3 c0t7 c1t3 c1t7 c2t3 c2t7 c3t3 c3t7 c4t3 c4t7 c5t3 c5t7
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
2: 6: 10: 14: 18: 22: 26: 30: 34: 38: 42: 46:
c0t2 c0t6 c1t2 c1t6 c2t2 c2t6 c3t2 c3t6 c4t2 c4t6 c5t2 c5t6
^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
1: 5: 9: 13: 17: 21: 25: 29: 33: 37: 41: 45:
c0t1 c0t5 c1t1 c1t5 c2t1 c2t5 c3t1 c3t5 c4t1 c4t5 c5t1 c5t5
^b+ ^++ ^b+ ^-- ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^--
0: 4: 8: 12: 16: 20: 24: 28: 32: 36: 40: 44:
c0t0 c0t4 c1t0 c1t4 c2t0 c2t4 c3t0 c3t4 c4t0 c4t4 c5t0 c5t4
^b+ ^++ ^b+ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++ ^++
-------*---------*-----------SunFire X4540---*---Front-----*-------*---

In the plain-text output above you cannot easily tell failed disks from healthy ones, although you may notice that c1t5 and c5t5 are absent from the temperature list and show up as ^-- in the map. When you run the command on a live terminal, the failed disk is shown in red, as in the screenshot below.

[Screenshot: failed disk highlighted in red]

Step 3 : Just to show you more of the power of this utility, let's extract the information for a particular working disk.

bash-3.2# ./hd -e c1t1
ScsiIo failed, Check Condition, Key = 2, ASC/ASCQ = 4Ch/00h
Revision: 10
Offline status 130
Selftest status 0
Seconds to collect 430
Time in minutes to run short selftest 1
Time in minutes to run extended selftest 92
Offline capability 91
SMART capability 3
Error logging capability 1
Checksum 0x29
Identification Status Current Worst Raw data
1 Raw read error rate 0xf 100 253 0
3 Spin up time 0x3 96 96 0
4 Start/Stop count 0x32 100 100 62
5 Reallocated sector count 0x33 100 100 0
7 Seek error rate 0xf 84 60 270825655
9 Power on hours count 0x32 48 48 46269
10 Spin retry count 0x13 100 100 0
12 Device power cycle count 0x32 100 100 62
187 Uncorrectable Errors for Host 0x32 100 100 0
189 High Fly Writes 0x3a 100 100 0
190 Airflow Temperature (WDC) 0x22 67 56 605421601
194 Temperature 0x22 33 44 33/ 0/ 21 (degrees C cur/min/max)
195 Hardware ECC Recovered 0x1a 69 60 141013833
197 Current pending sector count 0x12 100 100 0
198 Scan uncorrected sector count 0x10 100 100 0
199 Ultra DMA CRC error count 0x3e 200 200 0
200 Write/Multi-Zone Error Rate 0x0 100 253 0
202 Data Address Mark errors 0x32 100 253 0

But we will not be able to retrieve this information for the problematic disk.

bash-3.2# ./hd -e c1t5
ScsiIo failed, Check Condition, Key = 2, ASC/ASCQ = 4Ch/00h
can’t access c1t5 [/dev/rdsk/c1t5d0p0]

After the disk replacement, the disks are shown in a healthy state again.

[Screenshot: disks shown healthy after replacement]

Reference : Oracle Doc ID 1565521.1 (Need Oracle portal credentials to access it)

A really amazing, wonderful utility, isn't it? 🙂


How to investigate services stuck in the offline* (transition) state in Solaris 10 ?

Today I got an issue in which a customer was not able to log in to one of the Solaris non-global zones.

I logged in to the non-global zone, checked, and found that the ssh service was not running there.

-bash-3.2# svcs ssh
STATE STIME FMRI
offline Oct_07 svc:/network/ssh:default

I thought it would be a cakewalk, so I tried to start the service.

-bash-3.2# svcadm enable ssh

But it was still in the offline state.

-bash-3.2# svcs ssh
STATE STIME FMRI
offline Oct_07 svc:/network/ssh:default

I checked the status of the services that ssh depends on and found that the utmp service was in the offline state. I tried to bring that one up, but it was not coming up.

-bash-3.2# svcs -d /network/ssh
STATE STIME FMRI
disabled Oct_07 svc:/system/filesystem/autofs:default
online Oct_07 svc:/network/loopback:default
online Oct_07 svc:/network/physical:default
online Oct_07 svc:/system/cryptosvc:default
online Oct_07 svc:/system/filesystem/local:default
offline Oct_07 svc:/system/utmp:default

I checked the dependencies of the utmp service and found that its dependency was also offline and also not coming up.

-bash-3.2# svcs -d /system/utmp
STATE STIME FMRI
offline Oct_07 svc:/milestone/sysconfig:default

I checked the dependencies of the sysconfig service and finally found the culprit: a service stuck in the transitioning (offline*) state.

-bash-3.2# svcs -d /milestone/sysconfig
STATE STIME FMRI
online Oct_07 svc:/milestone/single-user:default
offline Oct_07 svc:/system/sysidtool:system
offline* Oct_07 svc:/system/sysidtool:net

You cannot simply bring a service in the offline* state online; the asterisk means the service is in transition, i.e. its start method is still running.
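
As a shortcut to walking the dependency tree by hand, svcs -x can often point at the root-cause service directly; it explains why each affected service is not running and names its log file (just a quick hint; the exact explanation text varies by release):

-bash-3.2# svcs -xv ssh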

I checked the processes associated with that service.

-bash-3.2# svcs -p /system/sysidtool:net
STATE STIME FMRI
offline* Oct_07 svc:/system/sysidtool:net
Oct_07 3421 sysidtool-net
Oct_07 3454 sysidnet
Oct_07 3648 sysidnet

After that I took the console of the non-global zone. It was showing the system identification wizard, and I answered it.

[root@Node1 /]# zlogin -C non-globalzone1
[Connected to zone ‘non-globalzone1’ console]
You did not enter a selection.
What type of terminal are you using?
1) ANSI Standard CRT
2) DEC VT52
3) DEC VT100
4) Heathkit 19
5) Lear Siegler ADM31
6) PC Console
7) Sun Command Tool
8) Sun Workstation
9) Televideo 910
10) Televideo 925
11) Wyse Model 50
12) X Terminal Emulator (xterms)
13) CDE Terminal Emulator (dtterm)
14) Other
Type the number of your choice and press Return: 13

I came out of the non-global zone console using ~.
non-globalzone1 console login: ~.
[Connection to zone ‘non-globalzone1’ console closed]

I logged in to the non-global zone, checked the status of the service, and found it in the online state. And if ssh is online, the services it depends on must also be healthy 🙂

-bash-3.2# svcs ssh
STATE STIME FMRI
online 23:38:17 svc:/network/ssh:default

How to get rid of the "snapshot has dependent clones" error while deleting ZFS snapshots ?

Today I was working on an issue where my ZFS file systems were full and I was supposed to free some space. I found that snapshots had been created for a file system and had not been deleted for a long time.
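
If you hit the same situation, old and space-hungry snapshots can be spotted quickly with something like the command below (a generic sketch; run it against your own pool):

bash-3.2# zfs list -t snapshot -o name,used,creation -s used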

I have simulated the same case in my Lab environment.

My Lab Set up :

  • Solaris 10 64-bit
  • pool name: pool1
  • file systems: zfs1, zfs2

Step 1 : I created pool1 on a single disk.

bash-3.2# zpool status pool1
pool: pool1
state: ONLINE
scan: none requested
config:

NAME STATE READ WRITE CKSUM
pool1 ONLINE 0 0 0
c1t2d0 ONLINE 0 0 0

Step 2 : Created a file system in pool1.

bash-3.2# zfs create pool1/zfs1

bash-3.2# zfs list -r pool1/zfs1
NAME USED AVAIL REFER MOUNTPOINT
pool1/zfs1 31K 976M 31K /pool1/zfs1

bash-3.2# cd /pool1/zfs1

bash-3.2# touch file1 file2

bash-3.2# ls
file1 file2

Step 3 : I created a snapshot of the file system (zfs1) and named the snapshot snap1.

bash-3.2# zfs snapshot pool1/zfs1@snap1

bash-3.2# zfs list -r pool1
NAME USED AVAIL REFER MOUNTPOINT
pool1 1.14M 975M 32K /pool1
pool1/zfs1 1.04M 975M 1.04M /pool1/zfs1
pool1/zfs1@snap1 0 - 1.04M -

Step 4 : I created another file system, named zfs2, by cloning the snapshot created in the previous step.

bash-3.2# zfs clone pool1/zfs1@snap1 pool1/zfs2

I issued the command below to check the origin of the file systems. Here I can see that zfs1 was created manually, hence it shows no origin, whereas zfs2 was created from the snapshot, hence its origin is shown as that snapshot.

[Screenshot: origin property of the pool1 file systems]
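
A reconstruction of roughly what that screenshot showed (a sketch, assuming the standard zfs get output layout):

bash-3.2# zfs get -r origin pool1
NAME              PROPERTY  VALUE             SOURCE
pool1             origin    -                 -
pool1/zfs1        origin    -                 -
pool1/zfs1@snap1  origin    -                 -
pool1/zfs2        origin    pool1/zfs1@snap1  -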

Step 5 : I went to the file system created in the previous step and created two more files. Now, along with file3 and file4, the files file1 and file2 created in Step 2 will also be present in that file system. I hope you get the point.

bash-3.2# cd /pool1/zfs2

bash-3.2# touch file3

bash-3.2# touch file4

bash-3.2# ls

file1 file2 file3 file4

Step 6 : Now if I try to delete the snapshot, it gives an error, because we have created a clone file system from that snapshot.

bash-3.2# zfs destroy pool1/zfs1@snap1
cannot destroy 'pool1/zfs1@snap1': snapshot has dependent clones
use '-R' to destroy the following datasets:
pool1/zfs2

To get rid of the above error we need to promote the clone. Compare the output below with the Step 4 output.

bash-3.2# zfs promote pool1/zfs2

[Screenshot: origin property after the promote]
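
After the promote the origin relationship is reversed: the snapshot now belongs to zfs2 and zfs1 has become the clone (again a sketch of the expected output):

bash-3.2# zfs get -r origin pool1
NAME              PROPERTY  VALUE             SOURCE
pool1             origin    -                 -
pool1/zfs1        origin    pool1/zfs2@snap1  -
pool1/zfs2        origin    -                 -
pool1/zfs2@snap1  origin    -                 -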

Step 7 : Trying to delete the snapshot again still gives an error. We have to delete one file system: either the clone (zfs2) which we created from the snapshot, or the original file system (zfs1) which we created manually.

bash-3.2# zfs destroy pool1/zfs2@snap1
cannot destroy 'pool1/zfs2@snap1': snapshot has dependent clones
use '-R' to destroy the following datasets:
pool1/zfs1

I used the command below to delete the snapshot along with zfs1.

bash-3.2# zfs destroy -R pool1/zfs2@snap1

Now we are left with zfs2 only.

[Screenshot: remaining datasets in pool1]
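
Roughly what the screenshot showed, which you can confirm with a plain listing (a sketch; sizes omitted):

bash-3.2# zfs list -r -o name pool1
NAME
pool1
pool1/zfs2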

What will happen to your data? Your data will remain in the zfs2 file system. However, if you changed anything in zfs1 after creating the snapshot in Step 3, you will lose those changes.

bash-3.2# pwd
/pool1/zfs2
bash-3.2# ls
file1 file2 file3 file4

Reference : Oracle Doc ID 1552965.1 (Need Oracle portal credentials to access it)

Learning Oracle ZFS Storage using Appliance simulator

In OVM environments, most customers use ZFS storage.

If you are not familiar with the GUI of the ZFS Storage Appliance, no worries: you can download a simulator which provides the same kind of experience as the real ZFS storage interface.

Below is the download link.

http://www.oracle.com/webapps/dialogue/ns/dlgwelcome.jsp?p_ext=Y&p_dlg_id=10521841&src=7299332&Act=45

You may need Oracle portal credentials to download it.

After importing this template into Oracle VirtualBox you can assign an IP address to it, then enter the same address in a browser to start working with it.
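
For example, if the downloaded template comes as an OVA file, it can be imported from the command line as well (the file name here is just an illustration; you can also use File > Import Appliance in the VirtualBox GUI):

VBoxManage import OracleZFSStorageSimulator.ova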

How to schedule a consistency check in MegaRAID ?

I was supposed to run a consistency check on a volume in MegaRAID. Be aware that a running consistency check can lead to high I/O waits.

Step 1 : Before doing anything, I checked the current settings for the consistency check.

[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -AdpCcSched -Info -aALL

Adapter #0

Operation Mode: Concurrent
Execution Delay: 168
Next start time: 08/23/2014, 03:00:00
Current State: Stopped
Number of iterations: 174
Number of VD completed: 0
Excluded VDs : None
Exit Code: 0x00 

Step 2 : Before changing the schedule, I confirmed that the MegaRAID adapter and my Solaris server are on the same time zone.

[root@Node1 /opt/MegaRAID/CLI]# date
Fri Aug 22 03:14:23 EDT 2014 

[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -AdpGetTime -aALL

Adapter 0:
Date: 08/22/2014
Time: 03:14:14

Exit Code: 0x00 

Step 3 : After that I scheduled it to run at 4:00 AM EDT. The date format is YYYYMMDD and the hour uses a 24-hour clock, so to schedule it for 5:00 PM EDT you would pass 17, as in the sketch below.
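
A sketch following the same syntax (not a command I actually ran):

[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -AdpCcSched -SetStartTime 20140822 17 -aALL

The actual command I ran to schedule the 4:00 AM check: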

[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -AdpCcSched -SetStartTime 20140822 04 -aALL

Adapter 0: Scheduled CC start time is set.

Exit Code: 0x00
[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -AdpCcSched -Info -aALL

Adapter #0

Operation Mode: Concurrent
Execution Delay: 168
Next start time: 08/22/2014, 04:00:00
Current State: Stopped
Number of iterations: 174
Number of VD completed: 0
Excluded VDs : None
Exit Code: 0x00 

Step 4 : At the scheduled time I verified that it had started running.

During the running phase:

[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -AdpCcSched -Info -aALL

Adapter #0

Operation Mode: Concurrent
Execution Delay: 168
Next start time: 08/22/2014, 04:00:00
Current State: Active
Number of iterations: 174
Number of VD completed: 0
Excluded VDs : None
Exit Code: 0x00 
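
While the check is active, its progress can also be queried directly (a hedged sketch; option support varies across MegaCli versions):

[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -LDCC -ShowProg -LALL -aALL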

The check completed successfully.

[root@Node1 /opt/MegaRAID/CLI]# ./MegaCli -AdpCcSched -Info -aALL

Adapter #0

Operation Mode: Concurrent
Execution Delay: 168
Next start time: 08/29/2014, 04:00:00
Current State: Stopped
Number of iterations: 175
Number of VD completed: 1
Excluded VDs : None
Exit Code: 0x00 

The output above shows that it has checked one virtual drive; as per my configuration, that single virtual drive was created on 2 physical drives.

I verified the status of the virtual drives using "./MegaCli -LDInfo -Lall -aALL | grep -i State"; all are showing Optimal 🙂

One more check, for the details of any corrections made, can be done with the commands below.

cd /opt/MegaRAID/CLI
./MegaCli -fwtermlog -dsply -a0 -nolog > /tmp/lsi-fwterm.log

Snippet of the content of /tmp/lsi-fwterm.log

08/22/14 5:16:01: EVT#615932-08/22/14 5:16:01: 59=Consistency Check done with corrections on VD 00/0, (corrections=65535)
08/22/14 5:16:01: ccScheduleSetNextStartTime: RTC_TimeStamp=1b898e91, nextStartTime=1b92b740
08/22/14 5:16:01: Next cc scheduled to start at 08/29/14 4:00:00
08/22/14 5:16:01: CC Schedule cycle complete

Adding a resource to a zone cluster

In my previous post https://ervikrant06.wordpress.com/2014/08/08/creating-zone-cluster-inside-sun-cluster/ I configured the zone cluster. Now the time has come to add a resource to the cluster and verify that failover works.

Step 1 : I created a pool on shared storage (shared between SolNode1 and SolNode2) and will present that pool to the zone cluster as a resource, to check whether failover works fine. This needs to be done on one node only.

SolNode2:> zpool create zoneSHpool1 c2t3d0

Step 2 : Added the dataset zoneSHpool1 as a resource in the zone cluster configuration. This needs to be done on one node only.

clzc configure ZC1
clzc:ZC1> add dataset
clzc:ZC1:dataset> set name=zoneSHpool1
clzc:ZC1:dataset> end
clzc:ZC1> verify
clzc:ZC1> commit
clzc:ZC1> exit

Step 3 : Adding a resource group to the zone cluster. This needs to be done on one node only.

SolNode2:> clrg create -Z ZC1 -n node1zone1,node2zone1 zoneRG1

Step 4 : Adding a resource to the zone cluster. This needs to be done on one node only.

SolNode2:> clrs create -g zoneRG1 -t SUNW.HAStoragePlus -p zpools=zoneSHpool1 -Z ZC1 zoneRS1

Step 5 : After adding the resource we can check its status. It shows as Offline; let's bring it into the managed and online state.

SolNode2:> clrs status -Z ZC1

=== Cluster Resources ===

Resource Name Node Name State Status Message
------------- --------- ----- --------------
zoneRS1 node1zone1 Offline Offline
node2zone1 Offline Offline

SolNode2:> clrg manage -Z ZC1 zoneRG1

SolNode2:> clrg online -Z ZC1 zoneRG1

SolNode2:> clrg status -Z ZC1

=== Cluster Resource Groups ===

Group Name Node Name Suspended Status
---------- --------- --------- ------
ZC1:zoneRG1 node1zone1 No Online
node2zone1 No Offline

SolNode2:> clrs status -Z ZC1

=== Cluster Resources ===

Resource Name Node Name State Status Message
------------- --------- ----- --------------
zoneRS1 node1zone1 Online Online
node2zone1 Offline Offline

Step 6 : After that I rebooted the zone cluster to reflect the changes inside the zones. This needs to be done on one node only.

SolNode2:> clzc reboot ZC1
Waiting for zone reboot commands to complete on all the nodes of the zone cluster "ZC1"...

Step 7 : Logged in to the cluster zone to check the status of the assigned pool.

SolNode2:> zlogin ZC1
[Connected to zone ‘ZC1’ pts/2]
Last login: Wed Aug 6 14:55:57 on pts/2
Oracle Corporation SunOS 5.10 Generic Patch January 2005

# uname -a
SunOS node2zone1 5.10 Generic_147148-26 i86pc i386 i86pc

# zpool list
NAME SIZE ALLOC FREE CAP HEALTH ALTROOT
zoneSHpool1 1008M 120K 1008M 0% ONLINE /zonepool1/ZC1/root

Once again, the current status at the global zone level:

SolNode2:> clrs status -Z ZC1

=== Cluster Resources ===

Resource Name Node Name State Status Message
------------- --------- ----- --------------
zoneRS1 node1zone1 Offline Offline
node2zone1 Online Online

Step 8 : I issued the poweroff command inside the zone to check whether failover occurs successfully.
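
Alternatively, instead of powering off the zone, the same switchover can be triggered manually (a sketch using the standard clrg syntax; I did not run this here):

SolNode2:> clrg switch -Z ZC1 -n node1zone1 zoneRG1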

After the poweroff, the status of the resource in the zone cluster became as below.

SolNode2:> clrs status -Z ZC1

=== Cluster Resources ===

Resource Name Node Name State Status Message
------------- --------- ----- --------------
zoneRS1 node1zone1 Online Online
node2zone1 Offline Offline

I went to the other node, SolNode1, to check whether the shared pool had come over to it.

SolNode1:> zlogin ZC1
[Connected to zone ‘ZC1’ pts/2]
Last login: Wed Aug 6 15:13:26 on pts/2
Oracle Corporation SunOS 5.10 Generic Patch January 2005

# df -h /zoneSHpool1
Filesystem size used avail capacity Mounted on
zoneSHpool1 976M 31K 976M 1% /zoneSHpool1

Current zone cluster status. Now I can power the non-global zone on SolNode2 back on, because our verification was successful 🙂

SolNode1:> clzc status

=== Zone Clusters ===

— Zone Cluster Status —

Name Node Name Zone Host Name Status Zone Status
---- --------- -------------- ------ -----------
ZC1 SolNode1 node1zone1 Online Running
SolNode2 node2zone1 Offline Installed

Step 9 : Brought the zone on SolNode2 back up using the command below.

SolNode2:> clzc boot ZC1

SolNode2:> zlogin ZC1
[Connected to zone ‘ZC1’ pts/2]
Last login: Wed Aug 6 15:10:55 on pts/2
Oracle Corporation SunOS 5.10 Generic Patch January 2005
# uptime
3:16pm up 1 user, load average: 0.66, 0.18, 0.18

Creating a zone cluster inside Sun Cluster

Today I created a zone cluster in Sun Cluster. To configure a zone cluster we need to have a cluster configured in the global zone. In my previous post (https://ervikrant06.wordpress.com/2014/08/07/how-to-configure-resource-in-sun-cluster/) I already configured Sun Cluster and added a resource to the cluster to verify its functionality.

Step 1 : Creating a zpool on a disk that is local to node SolNode1. We need to do this on both nodes; this is not a shared disk, so a disk has to be presented to each node, as shown below.

SolNode1:> zpool create zonepool1 c2t2d0
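
The same command has to be run on SolNode2 as well (assuming the local disk has the same device name there):

SolNode2:> zpool create zonepool1 c2t2d0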

Step 2 : We need to configure the zone cluster using the commands below. The pool created in the previous step is used as the root for the zone. This needs to be done on one node only.

SolNode1:> clzc configure ZC1
ZC1: No such zone cluster configured
Use ‘create’ to begin configuring a new zone cluster.
clzc:ZC1> create
clzc:ZC1> set zonepath=/zonepool1/ZC1
clzc:ZC1> set autoboot=true
clzc:ZC1> add node
clzc:ZC1:node> set physical-host=SolNode1
clzc:ZC1:node> set hostname=node1zone1
clzc:ZC1:node> add net
clzc:ZC1:node:net> set address=192.168.56.20
clzc:ZC1:node:net> set physical=e1000g3
clzc:ZC1:node:net> end
clzc:ZC1:node> end
clzc:ZC1> add node
clzc:ZC1:node> set physical-host=SolNode2
clzc:ZC1:node> set hostname=node2zone1
clzc:ZC1:node> add net
clzc:ZC1:node:net> set address=192.168.56.21
clzc:ZC1:node:net> set physical=e1000g3
clzc:ZC1:node:net> end
clzc:ZC1:node> end
clzc:ZC1> add sysid
clzc:ZC1:sysid> set root_password=0vcY0fwbKta2U
clzc:ZC1:sysid> end
clzc:ZC1> verify
clzc:ZC1> commit
clzc:ZC1> exit

Step 3 : After the configuration is complete we need to start the installation of the zone cluster. I created a sparse-root zone. This needs to be done on one node only.

SolNode1:> clzc install ZC1
Waiting for zone install commands to complete on all the nodes of the zone cluster "ZC1"...

Step 4 : After the command in Step 3 completes, we can check the status of the zone cluster on both nodes.

SolNode2:> clzc status

=== Zone Clusters ===

— Zone Cluster Status —

Name Node Name Zone Host Name Status Zone Status
---- --------- -------------- ------ -----------
ZC1 SolNode1 node1zone1 Offline Incomplete
SolNode2 node2zone1 Offline Incomplete

Step 5 : Now try to boot the zone cluster. This needs to be done on one node only.

SolNode1:> clzc boot ZC1
Waiting for zone boot commands to complete on all the nodes of the zone cluster "ZC1"...

SolNode2:> clzc status

=== Zone Clusters ===

— Zone Cluster Status —

Name Node Name Zone Host Name Status Zone Status
---- --------- -------------- ------ -----------
ZC1 SolNode1 node1zone1 Offline Running
SolNode2 node2zone1 Offline Running

Step 6 : The zone cluster is running but its status is still Offline. To bring the nodes online we have to log in to each node of the cluster, connect to the console of the zone cluster, and complete the system identification so that the non-global zones get the hostnames we set in Step 2 during configuration (node1zone1 and node2zone1).

SolNode1:> zoneadm list -cv
ID NAME STATUS PATH BRAND IP
0 global running / native shared
1 ZC1 running /zonepool1/ZC1 cluster shared

SolNode1:> zlogin -C ZC1

Run this from each node and answer the prompts to configure the hostname of each zone.

After that the zones will come into the online state.

Step 7 : The current status is online and running, which is as expected 🙂

SolNode1:> clzc status

=== Zone Clusters ===

— Zone Cluster Status —

Name Node Name Zone Host Name Status Zone Status
---- --------- -------------- ------ -----------
ZC1 SolNode1 node1zone1 Online Running
SolNode2 node2zone1 Online Running

Step 8 : To verify the configuration I went to each node of the cluster, logged in using the zone cluster name, and checked the hostname of the non-global zone.

SolNode1:> zlogin ZC1
[Connected to zone ‘ZC1’ pts/3]
Last login: Tue Aug 5 17:49:40 on pts/2
Oracle Corporation SunOS 5.10 Generic Patch January 2005
# uname -a
SunOS node1zone1 5.10 Generic_147148-26 i86pc i386 i86pc

SolNode2:> zlogin ZC1
[Connected to zone ‘ZC1’ pts/3]
Last login: Tue Aug 5 19:36:23 on pts/3
Oracle Corporation SunOS 5.10 Generic Patch January 2005
# uname -a
SunOS node2zone1 5.10 Generic_147148-26 i86pc i386 i86pc

In the next post I will add a resource to the zone cluster.