ZFS pool disk failure but disk looks fine

I have a ZFS pool with 4 disks on Ubuntu (zfsonlinux). All the disks are connected via SATA cables to a secondary controller that I bought. The pool now hosts a Plex media server with my movie and photo collections.

Yesterday I found that one of the disks had failed and my pool is now "degraded".

    kiran@ub1:~$ zpool status
      pool: tank
     state: DEGRADED
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://zfsonlinux.org/msg/ZFS-8000-9P
      scan: scrub repaired 29.2M in 0 days 14:13:19 with 0 errors on Sun Jul 11 14:37:20 2021
    config:

    NAME                        STATE     READ WRITE CKSUM
    tank                        DEGRADED     0     0     0
      mirror-0                  DEGRADED     0     0     0
        wwn-0x5000c500c8022883  DEGRADED     4     0     0  too many errors
        wwn-0x50014ee2bbdb0dec  ONLINE       0     0     0
      mirror-1                  ONLINE       0     0     0
        wwn-0x50014ee26687691e  ONLINE       0     0     0
        wwn-0x50014ee21227fa29  ONLINE       0     0     0

The disk is less than 1 year old and is a WD Red NAS HDD.

I ran the scrub command; ZFS scrubbed the data but did not find anything to resilver.

Here are my questions:

  1. Does this mean the disk is okay now and I can just clear the errors, or should I replace the disk simply because there were errors?
  2. Are there any commands to check the integrity of the disk using the ZFS partition?

Please note that I am just a beginner in Linux and ZFS and implemented my ZFS pool by reading help online.

Can someone please help me figure out what should be my next steps?

Kiran

  1. Does this mean the disk is okay now and I can just clear the errors, or should I replace the disk simply because there were errors?

Disk read errors are a sign of possible impending hardware problems, but since you have the data mirrored to another drive it should be ok to clear the errors and wait for the drive to fail more completely before replacing it.

Let's call the disk with errors "A" and its error-free mirror "B". The only downside of waiting to replace A is that if B fails first, and A has been quietly corrupting data for a while, you will lose that data when B dies. So I recommend checking the disk regularly for signs of corruption (using the command below) to make sure that doesn't happen, and replacing it if the error rate gets worse.
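A minimal sketch of that clear-and-watch routine, assuming the pool name tank and the device ID shown in the question. The helper name devices_with_errors is made up for illustration; all it does is filter zpool status output for rows whose READ, WRITE or CKSUM counter is non-zero.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the rows
# of the config table whose READ, WRITE or CKSUM counter is not "0".
devices_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }   # header row of the config table
        in_config && NF == 0 { in_config = 0 }                   # a blank line ends the table
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage, assuming the pool and device from the question (zpool clear needs root):
#   sudo zpool clear tank wwn-0x5000c500c8022883   # reset the error counters
#   sudo zpool status tank | devices_with_errors   # prints nothing while every counter is 0
```

If the filter starts printing the same device again after the counters were cleared, that is the "worse error rates" signal described above.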

  2. Are there any commands to check the integrity of the disk using the ZFS partition?

Yes, zpool scrub reads through all data on the disk and validates it against checksums in the block pointers. It will report any errors in the zpool status screen that you posted. Generally speaking, it's a good idea to run this periodically (every few weeks is a reasonable cadence).
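For reference, starting a scrub and reading back its result might look like the sketch below. The pool name tank comes from the question; the helper name scrub_result is hypothetical and simply pulls the first line of the scan: section out of zpool status.

```shell
# Hypothetical helper: print the first line of the "scan:" section for a pool,
# which summarises the last (or currently running) scrub or resilver.
scrub_result() {
    zpool status "$1" | awk '$1 == "scan:" { sub(/^[ \t]+/, ""); print }'
}

# Usage (zpool scrub needs root; scrubbing a large pool can take many hours):
#   sudo zpool scrub tank    # start the scrub; the command returns immediately
#   scrub_result tank        # shows progress while running, the summary once finished
```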

Does this mean the disk is okay now and I can just clear the errors, or should I replace the disk simply because there were errors?

ZFS errors are real issues, but with the information provided we cannot be sure that the problem came from the disk.

Are there any commands to check the integrity of the disk using the ZFS partition?

Yes.

  • smartctl to diagnose the disks. IMHO it will probably find hardware errors in your situation.
  • A full zpool status -v could give more information (data corruption, persistent errors after a scrub, …).
  • A tool capable of diagnosing the "secondary controller that I bought", or connection/transport errors.
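As a hedged illustration of the first two bullets: smartctl ships in the smartmontools package, and its -A option prints the drive's attribute table. The helper below (the name smart_warning_signs is invented here) keeps only a few attributes that are commonly read as warning signs. UDMA_CRC_Error_Count in particular tends to point at cabling or controller trouble rather than the disk itself. Attribute names vary by vendor, so treat the list as an assumption.

```shell
# Hypothetical helper: from `smartctl -A` output on stdin, print the raw value
# of a few attributes that often indicate media or link problems.
smart_warning_signs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ {
            print $2 "=" $10     # column 10 is RAW_VALUE in the attribute table
        }
    '
}

# Usage, assuming the device ID from the question (smartctl needs root):
#   sudo smartctl -A /dev/disk/by-id/wwn-0x5000c500c8022883 | smart_warning_signs
#   sudo smartctl -t long /dev/disk/by-id/wwn-0x5000c500c8022883       # start an extended self-test
#   sudo smartctl -l selftest /dev/disk/by-id/wwn-0x5000c500c8022883   # read the result later
#   sudo zpool status -v tank                                          # verbose data error details
```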

By the way, note that ZFS is not only a filesystem but also a volume manager, so anything sitting between the disks and the motherboard can interfere with its volume management. If you cannot connect the disks without the controller, make sure that it does not interfere with ZFS.
