基于RMAN实现坏块介质恢复(blockrecover)

http://blog.csdn.net/leshami/article/details/10500997

对于物理损坏的数据块,我们可以通过RMAN块介质恢复(BLOCK MEDIA RECOVERY)功能来完成受损块的恢复,而不需要恢复整个数据库或所有文件来修复这些少量受损的数据块。恢复整个数据库或数据文件那不是大炮用来打蚊子,有点不值得!但前提条件是你得有一个可用的RMAN备份存在,因此,无论何时备份就是一切。本文演示了产生坏块即使用RMAN实现坏块恢复的全过程。

1、创建用于演示的data file  
SQL> create tablespace tbs datafile '/u01/app/oracle/oradata/DBdb/tbs.dbf' size 10m autoextend on;

Tablespace created.

--基于新的数据文件创建对象tb
SQL> create table tb tablespace tbs as select * from dba_objects;  

Table created.

SQL> col file_name format a60  
SQL> select file_id,file_name from dba_data_files where tablespace_name='TBS';

   FILE_ID FILE_NAME
---------- ------------------------------------------------------------
         7 /u01/app/oracle/oradata/DBdb/tbs.dbf

SQL> COL SEGMENT_NAME FOR A15
SQL> select segment_name , header_file , header_block,blocks from dba_segments where segment_name = 'TB' and owner='SYS';

SEGMENT_NAME    HEADER_FILE HEADER_BLOCK     BLOCKS
--------------- ----------- ------------ ----------
TB                        7          130       1280

 
--首先使用rman备份对应的数据文件  
[oracle@wang ~]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Mon Nov 27 22:39:49 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: DBDB (DBID=3282897732)

RMAN> backup datafile 7;

Starting backup at 27-NOV-17
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=44 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/u01/app/oracle/oradata/DBdb/tbs.dbf
channel ORA_DISK_1: starting piece 1 at 27-NOV-17
channel ORA_DISK_1: finished piece 1 at 27-NOV-17
piece handle=/u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp tag=TAG20171127T224014 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 27-NOV-17

RMAN>


2、单块数据块损坏的恢复处理
--下面使用了linux自带的dd命令来损坏单块数据块  
[oracle@wang ~]$ dd of=/u01/app/oracle/oradata/DBdb/tbs.dbf bs=8192 conv=notrunc seek=130 <> Corrupted block!
> EOF
0+1 records in
0+1 records out
17 bytes (17 B) copied, 0.000124852 s, 136 kB/s
[oracle@wang ~]$
 
--清空buffer cache
SQL> alter system flush buffer_cache;

System altered.

--查询表tb,收到ORA-01578
SQL> select count(*) from tb;
select count(*) from tb
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 130)
ORA-01110: data file 7: '/u01/app/oracle/oradata/DBdb/tbs.dbf'

--查询视图v$database_block_corruption,提示有坏块,注意该视图可能不会返回任何数据,如无返回,先执行backup validate
V$DATABASE_BLOCK_CORRUPTION:
V$DATABASE_BLOCK_CORRUPTION displays information about database blocks that were corrupted after the last backup.

SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         7        130          1                  0 CORRUPT

--下面使用blockrecover来恢复坏块
RMAN>  blockrecover datafile 7 block 130;

Starting recover at 27-NOV-17
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=50 device type=DISK

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00007
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp tag=TAG20171127T224014
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 27-NOV-17

RMAN>

--再次查询表tb正常
SQL> show user;
USER is "SYS"
SQL>
SQL>  select count(*) from tb;

  COUNT(*)
----------
     87046


3、多块数据块损坏的恢复处理
[oracle@wang ~]$ dd of=/u01/app/oracle/oradata/DBdb/tbs.dbf bs=8192 conv=notrunc seek=133 <
> New corrupted block!
> EOF
0+1 records in
0+1 records out
21 bytes (21 B) copied, 4.6398e-05 s, 453 kB/s
[oracle@wang ~]$
[oracle@wang ~]$ dd of=/u01/app/oracle/oradata/DBdb/tbs.dbf bs=8192 conv=notrunc seek=143 <
> New01 corrupted block!
> EOF
0+1 records in
0+1 records out
23 bytes (23 B) copied, 6.3948e-05 s, 360 kB/s
[oracle@wang ~]$
[oracle@wang ~]$ dd of=/u01/app/oracle/oradata/DBdb/tbs.dbf bs=8192 conv=notrunc seek=153 <
> New02 corrupted block!
> EOF
0+1 records in
0+1 records out
23 bytes (23 B) copied, 6.5705e-05 s, 350 kB/s
[oracle@wang ~]$


--刷新共享池
SQL> alter system flush buffer_cache;  

System altered.


--下面提示块133被损坏,注意我们损坏了多块数据块,但查询时,从块号最小的开始提示,如133被修复后还有坏块则继续提示133之后的坏块 
SQL> select count(*) from tb;
select count(*) from tb
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 133)
ORA-01110: data file 7: '/u01/app/oracle/oradata/DBdb/tbs.dbf'

--查询视图v$database_block_corruption无任何记录  
SQL> select * from v$database_block_corruption;  

no rows selected  

--下面使用backup validate来校验数据文件  
RMAN> backup validate datafile 7;

Starting backup at 27-NOV-17
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/u01/app/oracle/oradata/DBdb/tbs.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    FAILED 0              138          1536            3821836   
  File Name: /u01/app/oracle/oradata/DBdb/tbs.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1240            
  Index      0              0               
  Other      3              158                           --有3个Blocks Failing          

validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/dbdb/DBdb/trace/DBdb_ora_17497.trc for details
Finished backup at 27-NOV-17

RMAN>


--再次查询v$database_block_corruption,表明有3个损坏的块
SQL> select * from v$database_block_corruption;  

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         7        133          1                  0 CORRUPT
         7        143          1                  0 CORRUPT
         7        153          1                  0 CORRUPT

--下面直接使用blockrecover corruption list来恢复,如下所有刚刚被校验的坏块都会被恢复 :
blockrecover corruption list;

或者如下命令

run{
   allocate channel ch2 device type disk;
   blockrecover datafile 7 block 133;
   blockrecover datafile 7 block 143;
   blockrecover datafile 7 block 153;
   release channel ch2;}

执行如下:
RMAN> run{
2>    allocate channel ch2 device type disk;
3>    blockrecover datafile 7 block 133;
4>    blockrecover datafile 7 block 143;
5>    blockrecover datafile 7 block 153;
6>    release channel ch2;}

released channel: ORA_DISK_1
allocated channel: ch2
channel ch2: SID=50 device type=DISK

Starting recover at 27-NOV-17

channel ch2: restoring block(s)
channel ch2: specifying block(s) to restore from backup set
restoring blocks of datafile 00007
channel ch2: reading from backup piece /u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp
channel ch2: piece handle=/u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp tag=TAG20171127T224014
channel ch2: restored block(s) from backup piece 1
channel ch2: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 27-NOV-17

Starting recover at 27-NOV-17

channel ch2: restoring block(s)
channel ch2: specifying block(s) to restore from backup set
restoring blocks of datafile 00007
channel ch2: reading from backup piece /u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp
channel ch2: piece handle=/u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp tag=TAG20171127T224014
channel ch2: restored block(s) from backup piece 1
channel ch2: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 27-NOV-17

Starting recover at 27-NOV-17

channel ch2: restoring block(s)
channel ch2: specifying block(s) to restore from backup set
restoring blocks of datafile 00007
channel ch2: reading from backup piece /u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp
channel ch2: piece handle=/u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_TAG20171127T224014_f1r8ygdx_.bkp tag=TAG20171127T224014
channel ch2: restored block(s) from backup piece 1
channel ch2: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 27-NOV-17

released channel: ch2


--验证,检查:
SQL> select * from v$database_block_corruption;

no rows selected.

SQL>
SQL> select count(*) from tb;

  COUNT(*)
----------
     87046


4、坏块的对象定位与影响
SQL> col object_name for a25
SQL> select dbms_rowid.rowid_object(rowid) object_id,dbms_rowid.rowid_relative_fno(rowid) file_id,dbms_rowid.rowid_block_number(rowid) block_id,owner,object_name,object_id from tb where dbms_rowid.rowid_block_number(rowid)=163 and rownum<=2;

 OBJECT_ID    FILE_ID   BLOCK_ID OWNER                          OBJECT_NAME                OBJECT_ID
---------- ---------- ---------- ------------------------------ ------------------------- ----------
     89910         7        163PUBLIC                         GV$BACKUP_SET                   2364
     89910         7        163SYS                            GV_$BACKUP_PIECE                2365

--使用上面的方法,损块块163,173:
[oracle@wang ~]$ dd of=/u01/app/oracle/oradata/DBdb/tbs.dbf bs=8192 conv=notrunc seek=163 <
> New03 corrupted block!
> EOF
0+1 records in
0+1 records out
23 bytes (23 B) copied, 3.9521e-05 s, 582 kB/s
[oracle@wang ~]$
[oracle@wang ~]$ dd of=/u01/app/oracle/oradata/DBdb/tbs.dbf bs=8192 conv=notrunc seek=173 <
> New04 corrupted block!
> EOF
0+1 records in
0+1 records out
23 bytes (23 B) copied, 6.0101e-05 s, 383 kB/s
[oracle@wang ~]$

a、对于坏块对象无法进行聚合汇总等操作:
SQL> select count(*) from tb;
select count(*) from tb
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 163)
ORA-01110: data file 7: '/u01/app/oracle/oradata/DBdb/tbs.dbf'

b、对于坏块上的记录无法被查询  
--我们使用基于之前查询到的OBJECT_ID来查询
SQL> select owner,object_name,object_id from tb  where object_id in(2364,2365);
select owner,object_name,object_id from tb  where object_id in(2364,2365)
                                        *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 163)
ORA-01110: data file 7: '/u01/app/oracle/oradata/DBdb/tbs.dbf'

--如下面的查询,位于损坏块上(163的数据无法被查询到,但对于未损坏的依旧可以查询。下面的查询时块161上的对象 
SQL> select owner,object_name,object_id from tb where dbms_rowid.rowid_block_number(rowid)=161 and rownum<3;

OWNER                          OBJECT_NAME                OBJECT_ID
------------------------------ ------------------------- ----------
SYS                            GV_$LATCHNAME                   2203
PUBLIC                         GV$LATCHNAME                    2204


c、定位受损块所对应的对象  
select tablespace_name,segment_type,owner,segment_name,partition_name FROMdba_extentsWHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks -1;

--查询如下:
SQL> SELECT tablespace_name,segment_type,owner,segment_name,partition_name FROM dba_extents WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks -1;
Enter value for file_id: 7
Enter value for block_id: 163
old   1: SELECT tablespace_name,segment_type,owner,segment_name,partition_name FROM dba_extents WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks -1
new   1: SELECT tablespace_name,segment_type,owner,segment_name,partition_name FROM dba_extents WHERE file_id = 7 AND 163 BETWEEN block_id AND block_id + blocks -1

TABLESPACE_NAME                SEGMENT_TYPE       OWNER                          SEGMENT_NAME    PARTITION_NAME
------------------------------ ------------------ ------------------------------ --------------- ------------------------------
TBS                            TABLE              SYS                            TB


d、对于损坏的数据文件,缺省情况下,不能对其进行备份,如下 
RMAN> backup datafile 7 tag='corruption';

Starting backup at 27-NOV-17
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=44 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/u01/app/oracle/oradata/DBdb/tbs.dbf
channel ORA_DISK_1: starting piece 1 at 27-NOV-17
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 11/27/2017 23:23:49
ORA-19566: exceeded limit of 0 corrupt blocks for file /u01/app/oracle/oradata/DBdb/tbs.dbf

RMAN>

--查询对应数据文件坏块数:
SQL> select * from v$database_block_corruption;
     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         7        163          1                  0 CORRUPT
         7        173          1                  0 CORRUPT

--需要设定允许损坏块的数量之后才能进行备份  
run{  
   set maxcorrupt for datafile 7 to 2;
   backup datafile 7 tag='corruption';
   }
   
执行如下:
RMAN> run{  
2>    set maxcorrupt for datafile 7 to 2;
3>    backup datafile 7 tag='corruption';
4>    }

executing command: SET MAX CORRUPT

Starting backup at 27-NOV-17
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/u01/app/oracle/oradata/DBdb/tbs.dbf
channel ORA_DISK_1: starting piece 1 at 27-NOV-17
channel ORA_DISK_1: finished piece 1 at 27-NOV-17
piece handle=/u01/app/oracle/fast_recovery_area/DBDB/backupset/2017_11_27/o1_mf_nnndf_CORRUPTION_f1rcshjk_.bkp tag=CORRUPTION comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 27-NOV-17

RMAN>


--查看备份信息如下,应在修复坏块后重新备份以避免由于保留策略导致先前可用的备份被aged out  
RMAN> list backup summary;  


List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1       B  F  A DISK        27-NOV-17       1       1       NO         TAG20171127T224014
2       B  F  A DISK        27-NOV-17       1       1       NO         CORRUPTION

RMAN>

5、后记
a、对于受损的数据块,仅仅坏块上的数据无法被查询或读取,其余正常块的数据依旧可以使用。
b、对于受损的表对象进行聚合等相关运算时收到错误提示,因为坏块上的数据无法被统计。如果你聚合的是索引列,索引未损坏的情形则可正常返回。
c、可以基于RMAN可用的备份文件实现块介质恢复,其数据文件无需offline,开销最小,影响最小。
d、对于多个数据块的损坏,先执行backup validate校验数据库或相应的数据文件以便标记受损的坏块后,记录到视图v$database_block_corruption,然后后续恢复。
e、对于使用backup validate 校验后的情形,坏块恢复时可以直接使用blockrecover corruption list一次性恢复所有的坏块。
f、缺省情况下,存在坏块的数据文件无法成功备份,也会导致自动备份脚本失败。



网页名称:基于RMAN实现坏块介质恢复(blockrecover)
链接地址:http://hbruida.cn/article/jogdcg.html