客户有一套数据库,这周有例行停机维护的时间,于是我们趁这次停机例行维护的时间区间进行parallel_execution_message_size参数的修改,修改完成后在重启的过程中遇到了ora-00600[kgeade_is_0]的错误。首先来说一下为什么要修改parallel_execution_message_s
客户有一套数据库,这周有例行停机维护的时间,于是我们趁这次停机例行维护的时间区间进行parallel_execution_message_size参数的修改,修改完成后在重启的过程中遇到了ora-00600[kgeade_is_0]的错误。首先来说一下为什么要修改parallel_execution_message_size这个参数,根据oracle最佳实践的推荐,10g默认装完数据库该参数的值是2152,也有可能是2048,推荐将这个值设置成8192,而在11g中,这个值默认被设置成了16k,是可以满足大多数应用场景的。这个值的作用就是在并行执行中消息的大小。这个值越大,需要的shared pool也就越大。虽然能获得更好的性能,但是相应的内存也需要的更多了。还有:这个参数在并行恢复或者是standby recover情况下,增加它的大小到4096以上,也能提升至少20%恢复速度。
我们来看一下我们的报错的情况,我们修改一个节点该参数,然后直接重启。
sun jul 13 16:57:58 cst 2014errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_m000_21519.trc:ora-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []sun jul 13 16:57:59 cst 2014errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mmon_21339.trc:ora-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []sun jul 13 16:58:00 cst 2014errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_mmon_21339.trc:ora-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []sun jul 13 16:58:00 cst 2014trace dumping is performing id=[cdmp_20140713165800]sun jul 13 16:58:01 cst 2014trace dumping is performing id=[cdmp_20140713165801]sun jul 13 16:58:07 cst 2014errors in file /oracle/app/oracle/admin/racdb/bdump/racdb1_m000_21519.trc:ora-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []sun jul 13 16:58:07 cst 2014trace dumping is performing id=[cdmp_20140713165807]*** 2014-07-13 16:57:58.781ksedmp: internal or fatal errorora-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []current sql statement for this session:select tablespace_id, rfno, allocated_space, file_size, file_maxsize, changescn_base, changescn_wrap, flag from gv$filespace_usage where inst_id != :inst and (changescn_wrap >= :w or (changescn_wrap = :w and changescn_base >= :b))*** 2014-07-13 16:57:59.274ksedmp: internal or fatal errorora-00600: internal error code, arguments: [kgeade_is_0], [], [], [], [], [], [], []current sql statement for this session:select instance_name, host_name, nvl(gvi_startup_time, systimestamp) - interval '1' second as shutdown_time from (select rri.instance_name as instance_name, rri.host_name as host_name, from_tz(rri.startup_time, '+00:00') as rri_startup_time, dbms_ha_alerts_prvt.instance_startup_timestamp_tz(gvi.startup_time) as gvi_startup_time from recent_resource_incarnations$ rri left outer join gv$instance gvi on gvi.instance_name = rri.resource_name where rri.resource_type = 'instance' and :b2 = rri.db_unique_name and :b1 = rri.db_domain) where gvi_startup_time is null or gvi_startup_time > rri_startup_time group by instance_name, host_name, gvi_startup_time----- pl/sql call stack ----- object line object handle number name0x7de705a8 301 package body sys.dbms_ha_alerts_prvt0x7de64740 1 anonymous block
可以看到,都是在查询gv$视图的语句出现了这个错误。我们在来看一下它出错时候的堆栈信息。
ksedst()+31 call ksedst1() 000000000 ? 000000001 ? 7fff778810b0 ? 7fff77881110 ? 7fff77881050 ? 000000000 ?ksedmp()+610 call ksedst() 000000000 ? 000000001 ? 7fff778810b0 ? 7fff77881110 ? 7fff77881050 ? 000000000 ?ksfdmp()+63 call ksedmp() 000000003 ? 000000001 ? 7fff778810b0 ? 7fff77881110 ? 7fff77881050 ? 000000000 ?kgerinv()+161 call ksfdmp() 006ae9a20 ? 000000003 ? 7fff778810b0 ? 7fff77881110 ? 7fff77881050 ? 000000000 ?kgeasnmierr()+163 call kgerinv() 006ae9a20 ? 2b763e0b0040 ? 7fff77881110 ? 7fff77881050 ? 000000000 ? 000000000 ?kgeade()+501 call kgeasnmierr() 006ae9a20 ? 2b763e0b0040 ? 7fff77881110 ? 7fff77881050 ? 000000000 ? 000000000 ?kgerev()+58 call kgeade() 2b763e0b0040 ? 006ae9a20 ? 2b763e0b0040 ? 000000000 ? 000000000 ? 000000000 ?kserec0()+186 call kgerev() 006ae9a20 ? 2b763e0b0040 ? 000000000 ? 000000000 ? 7fff778821a0 ? 000000000 ?kxfpg1sg()+2014 call kserec0() 006ae9a20 ? 000000001 ? 000000029 ? 7fff77881f40 ? 000000000 ? 388b519840 ?kxfpgsg()+2098 call kxfpg1sg() 08364d278 ? 000000001 ? 7fff778822b0 ? 7fff77881f40 ? 08364cc48 ? 2b7600000001 ?kxfrallocslaves()+3 call kxfpgsg() 000000005 ? 000000001 ?51 000000001 ? 000000001 ? 3e0a254800000001 ? 2b763e0a2548 ?kxfrialo()+2111 call kxfrallocslaves() 00005322e ? 2b763e5726c0 ? 000000001 ? 7fff00000001 ? 7fff00000001 ? 000000001 ?kxfralo()+313 call kxfrialo() 00005322e ? 2b763e5726c0 ? 000000001 ? 07daa7230 ? 2b763e572768 ? 7fff77880000 ?qerpx_rowsrc_start( call kxfralo() 00005322e ? 2b763e5726c0 ?)+3892 000000001 ? 07daa7230 ? 2b763e572768 ? 000000000 ?qerpxstart()+234 call qerpx_rowsrc_start( 7fff77883280 ? 000000001 ? ) 000000001 ? 07daa8910 ? 100000001 ? 000000000 ?selexe()+667 call qerpxstart() 000000001 ? 000003f60 ? 000000001 ? 07daa8910 ? 100000001 ? 000000000 ?opiexe()+4687 call selexe() 07dacbb38 ? 7fff77883f60 ? 7fff77883f60 ? 07dacbb38 ? 100000001 ? 000000000 ?kpoal8()+2295 call opiexe() 000000049 ? 000000003 ? 7fff77884428 ? 000000003 ? 100000001 ? 000000000 ?opiodr()+1184 call kpoal8() 00000005e ? 000000000 ? 7fff77887ef8 ? 000000003 ? 83b7000000000001 ? 000000000 ?kpoodrc()+38 call opiodr() 00000005e ? 000000000 ? 7fff77887ef8 ? 000000000 ? 005bebdf0 ? 000000000 ?rpiswu2()+409 call kpoodrc() 7fff77885440 ? 000000000 ? 7fff77887ef8 ? 000000000 ? 005bebdf0 ? 000000000 ?kpoodr()+554 call rpiswu2() 083b7abf0 ? 000000000 ? 2b763e0f0cbc ? 000000002 ? 2b763e0f0cfc ? 000000000 ?upirtrc()+2101 call kpoodr() 2b763e342e20 ? 00000005e ? 7fff77887ef8 ? 000000000 ? 2b763e0f0cfc ? 000000000 ?kpurcsc()+125 call upirtrc() 2b763e342e20 ? 00000005e ? 7fff77887ef8 ? 7fff77888060 ? 7fff77888fd0 ? 003c558c6 ?kpuexecv8()+1705 call kpurcsc() 7fff778897d0 ? 00000005e ? 7fff77887ef8 ? 7fff77888060 ? 7fff77888fd0 ? 003c558c6 ?kpuexec()+2643 call kpuexecv8() 2b763e0fe958 ? 2b763e33f4c0 ? 2b763e33f540 ? 000000000 ? 000000000 ? 7fff7788a8c4 ?ocistmtexecute()+41 call kpuexec() 000000001 ? 2b763e33f4c0 ? 2b763e342db0 ? 000000001 ? 000000000 ? 000000000 ?ktte_aggregate_finf call ocistmtexecute() 000000001 ? 2b763e33f4c0 ?o()+3133 2b763e342db0 ? 000000001 ? 000000000 ? 000000000 ?ktte_monitor_tsth() call ktte_aggregate_finf 7fff7788b780 ? 000000001 ?+788 o() 000000009 ? 000000001 ? 000000000 ? 000000000 ?ktte_threshold_slav call ktte_monitor_tsth() 7fff7788b780 ? 000000001 ?e()+183 000000009 ? 000000001 ? 000000000 ? 000000000 ?kebm_slave_main()+2 call ktte_threshold_slav 07f63b200 ? 000000001 ?21 e() 000000000 ? 000000001 ? 000000000 ? 000000000 ?ksvrdp()+1159 call kebm_slave_main() 07f63b200 ? 07f63b200 ? 000000000 ? 000000001 ? 000000000 ? 000000000 ?opirip()+748 call ksvrdp() 07f63b200 ? 07f63b200 ? 000000000 ? 000000001 ? 000000000 ? 000000000 ?opidrv()+583 call opirip() 000000032 ? 000000004 ? 7fff7788d298 ? 000000001 ? 000000000 ? 000000000 ?sou2o()+114 call opidrv() 000000032 ? 000000004 ? 7fff7788d298 ? 000000001 ? 000000000 ? 000000000 ?opimai_real()+317 call sou2o() 7fff7788d270 ? 000000032 ? 000000004 ? 7fff7788d298 ? 000000000 ? 000000000 ?main()+116 call opimai_real() 000000003 ? 7fff7788d300 ? 000000004 ? 7fff7788d298 ? 000000000 ? 000000000 ?__libc_start_main() call main() 000000003 ? 7fff7788d300 ?+244 000000004 ? 7fff7788d298 ? 000000000 ? 000000000 ?_start()+41 call __libc_start_main() 00072d108 ? 000000001 ? 7fff7788d458 ? 000000000 ? 000000000 ? 000000003 ?
根据文档ora-600 [kgeade_is_0] in a real application cluster (rac) environment (文档 id 797182.1)里面的描述,凡是trace文件堆栈信息类似于“kxfpg1sg kxfpgsg kxfrallocslaves kxfrialo kxfralo qerpx_rowsrc_start”这样的,命中bug8592375。解决这个问题的办法也很简单,就是把两个库实例都停下来,修改成相同的参数,然后启动。像我们这样一个实例还在运行着,使用的是以前的参数,而新实例启动之后用的新的参数,就会导致这个问题的出现。还一个办法是安装补丁程序,但是感觉这个补丁是针对standby数据库的。8592375: phsb: readable standby reported ora-00700:[kgeade_is_0]。
参考文档:ora-600 [kgeade_is_0] in a real application cluster (rac) environment (文档 id 797182.1)
原文地址:修改并行参数引发ora-600 [kgeade_is_0]的问题处理, 感谢原作者分享。
