A very strange issue we have hit recently with a report which completion time could suddenly rise two or three times and then drop to some normal half an hour.
The report is a pretty complex query with lots of PL/SQL function calls, which in its turn calls another query with another PL/SQL function and so on. There were three levels of recursive pl/sql calls: sql(top)->pl/sql->sql(1)->pl/sql->sql(2)->pl/sql->sql & pl/sql in a jumble.
As it appeared, when the report is slow, it consumes CPU only, and all V$ views (like ASH or v$sql) show as active only the query from the level "sql(2)". Also v$sesstat didn't show any interesting anomalies (except some lob and data changing activities (db block changes), which did not give any new idea at that time).
As Tanel Poder once said, if your code constantly burns CPU, you have to trace Oracle kernel (it isn't the exact quote, just my attempt to express his message).
I started with truss, just to learn that it produced almost empty output in my case. No OS calls were observed.
pstack was not available, the last resort was oradebug.
I ended up with the following command:
oradebug PDUMP interval=1 ndumps=60 CALLSTACK 1
So, what? What can I get from that highly encrypted trace files?
The first thing I observe was "plsql_run" kernel function, which appeared exactly the same number of times as pl/sql recursive calls in my report. Actually, when I tested the report and made an oradebug trace, I run the report from anonymous PL/SQL block, so the trace below will show one additional "plsql_run" call in the top (actually bottom) of the stack.
Observation of "plsql_run" gave me some confidence that I can decrypt this trace somehow. To move further I had to find well known but unavailable on MOS Note 175982.1. And, thank God, it can be googled easily. Also, it was worth to refresh my knowledge about how to use all this stuff. Here it is.
Below is edited stack trace of Oracle kernel calls. At the right of some kernel function, I placed, what I believe, corresponds to my report's sql and pl/sql calls. It was not a rocket science after all. I gathered pl/sql source codes, queries, query plans and put it all together.
Received ORADEBUG command (#64) 'DUMP CALLSTACK 1 0x0 _seq=60 ndumps=60' from process 'Unix process pid: 6815810, image:'
Executing periodic dump 60 of 60
----- Call Stack Trace -----
calling
location
--------------------
skdstdst()+40
ksedst1()+112
ksedst_tracecb()+48
ksdxdmp()+544
ksdxfdmp()+132
ksdxcb()+3432
sspuser()+116
__sighandler() oradebug dump CALLSTACK
kcbinvbrhb()+312
kcbivbr()+596
kcblibr()+76 manages Oracle's buffer cache operation
kdlrsp()+924
kdl_destroy()+928
kdlt_freetemp()+148 lob and temporary lob management
kokltfr()+448
kokltfc()+16 lob access
koltitfr()+612
kolttfr()+136 support for object Lob buffering PKGPROV.get_last_ord() temporary_quote.tmp_quote (CLOB field operations)
pfrcvfc_format_conv
pevm_CNVMSC()+28
pfrinstr_CNVMSC()+6
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 PKGPROV.get_last_ord
peidxr_run()+280
peidxexe()+104
kkxdexe()+352
kkxmpexe()+244
kgmexwi()+584
kgmexec()+1388
evapls()+736
evaopn2()+1284
evaopn2()+4492
evaopn2()+4224
evaopn2()+4224
evaopn2()+4224
evaopn2()+4224
evaopn2()+3712
qerixGetKey()+4132 Index row source |* 68 | INDEX UNIQUE SCAN | XPKTAB_OBJ
qerixStart()+1832
qerjotFetch()+1096
qerjotFetch()+392
qerghFetch()+324
qervwFetch()+148
rwsfcd()+292
qeruaFetch()+404
opifch2()+2312 row source : union-All SQL_ID 7p6225u9gc7ka | 1 | UNION-ALL
opiall0()+2196
opikpr()+460
opiodr()+908
rpidrus()+216
skgmstack()+160
rpidru()+132
rpiswu2()+520
kprball()+1044
kkxsarfch()+568
kkxsfcn()+556
pevm_icd_call_commo
pfrinstr_ICAL()+140
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 table(CALCS.GET_PAR(ci.id, 113852902)) t
peicnt()+324
kkxuexe()+468
kkxmpsexe()+400
kgmexwi()+584
kgmexec()+1388
evapls()+736
evaopn2()+1284
qerocImageIterStart object collection iterator row source | 38 | COLLECTION ITERATOR PICKLER FETCH| GETCALCVAL
qerocStart()+504
qerjotFetch()+1096
qerjotFetch()+392
qerjotFetch()+392
qernsFetch()+624
rwsfcd()+100
qeruaFetch()+404
subsr1()+288
subsr3()+364
IPRA.$evaopn3()+697
evanvl()+956
evaopn2()+1284
opifcr()+5500
qervwRowProcedure()
qerghFetch()+1372
qervwFetch()+148 qervw - view row source SQL_ID bb8mfnvw4agxp | 41 | VIEW | VM_NWVW_3
opifch2()+2312
opiall0()+2196
opikpr()+460
opiodr()+908
rpidrus()+216
skgmstack()+160
rpidru()+132
rpiswu2()+520
kprball()+1044
kkxsarfch()+568
kkxsfcn()+556
pevm_icd_call_commo
pfrinstr_ICAL()+140
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 CALCS.GET_PAR(P_ID => A.OBJ_ID, P_A_ID => 313006642)
peicnt()+324
kkxuexe()+468
kkxmpsexe()+400
kgmexwi()+584
kgmexec()+1388
evapls()+736
evaopn2()+1284
qerocImageIterStart SQL_ID 3kp00z3rvsjf0 | 1 | COLLECTION ITERATOR PICKLER FETCH| GET_PAR | 1 |
qerocStart()+504 object collection iterator row source
subsr1()+208
subsr3()+364
IPRA.$evaopn3()+697
evaopn3()+92
opifcr()+5808
qertbFetchByRowID() tab by rowid SQL_ID 3kp00z3rvsjf0 |* 2 | TABLE ACCESS BY INDEX ROWID | TAB_OBJ
opifch2()+2312
opifch()+48
opipls()+4248
opiodr()+908
rpidrus()+216
skgmstack()+160
rpidru()+132
rpiswu2()+520
rpidrv()+1060 recurcive sql
psddr0()+448
psdnal()+472 binds
pevm_BFTCHC()+284
pfrinstr_BFTCHC()+1
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 top level anonymous block
peicnt()+324
kkxexe()+484
opiexe()+17356
kpoal8()+1872
opiodr()+908
ttcpip()+1028
opitsk()+1612
opiino()+940
opiodr()+908
opidrv()+1132
sou2o()+136
opimai_real()+560
ssthrdmain()+276
main()+204
__start()+112
The report is a pretty complex query with lots of PL/SQL function calls, which in its turn calls another query with another PL/SQL function and so on. There were three levels of recursive pl/sql calls: sql(top)->pl/sql->sql(1)->pl/sql->sql(2)->pl/sql->sql & pl/sql in a jumble.
As it appeared, when the report is slow, it consumes CPU only, and all V$ views (like ASH or v$sql) show as active only the query from the level "sql(2)". Also v$sesstat didn't show any interesting anomalies (except some lob and data changing activities (db block changes), which did not give any new idea at that time).
As Tanel Poder once said, if your code constantly burns CPU, you have to trace Oracle kernel (it isn't the exact quote, just my attempt to express his message).
I started with truss, just to learn that it produced almost empty output in my case. No OS calls were observed.
pstack was not available, the last resort was oradebug.
I ended up with the following command:
oradebug PDUMP interval=1 ndumps=60 CALLSTACK 1
So, what? What can I get from that highly encrypted trace files?
The first thing I observe was "plsql_run" kernel function, which appeared exactly the same number of times as pl/sql recursive calls in my report. Actually, when I tested the report and made an oradebug trace, I run the report from anonymous PL/SQL block, so the trace below will show one additional "plsql_run" call in the top (actually bottom) of the stack.
Observation of "plsql_run" gave me some confidence that I can decrypt this trace somehow. To move further I had to find well known but unavailable on MOS Note 175982.1. And, thank God, it can be googled easily. Also, it was worth to refresh my knowledge about how to use all this stuff. Here it is.
Below is edited stack trace of Oracle kernel calls. At the right of some kernel function, I placed, what I believe, corresponds to my report's sql and pl/sql calls. It was not a rocket science after all. I gathered pl/sql source codes, queries, query plans and put it all together.
Received ORADEBUG command (#64) 'DUMP CALLSTACK 1 0x0 _seq=60 ndumps=60' from process 'Unix process pid: 6815810, image:
Executing periodic dump 60 of 60
----- Call Stack Trace -----
calling
location
--------------------
skdstdst()+40
ksedst1()+112
ksedst_tracecb()+48
ksdxdmp()+544
ksdxfdmp()+132
ksdxcb()+3432
sspuser()+116
__sighandler() oradebug dump CALLSTACK
kcbinvbrhb()+312
kcbivbr()+596
kcblibr()+76 manages Oracle's buffer cache operation
kdlrsp()+924
kdl_destroy()+928
kdlt_freetemp()+148 lob and temporary lob management
kokltfr()+448
kokltfc()+16 lob access
koltitfr()+612
kolttfr()+136 support for object Lob buffering PKGPROV.get_last_ord() temporary_quote.tmp_quote (CLOB field operations)
pfrcvfc_format_conv
pevm_CNVMSC()+28
pfrinstr_CNVMSC()+6
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 PKGPROV.get_last_ord
peidxr_run()+280
peidxexe()+104
kkxdexe()+352
kkxmpexe()+244
kgmexwi()+584
kgmexec()+1388
evapls()+736
evaopn2()+1284
evaopn2()+4492
evaopn2()+4224
evaopn2()+4224
evaopn2()+4224
evaopn2()+4224
evaopn2()+3712
qerixGetKey()+4132 Index row source |* 68 | INDEX UNIQUE SCAN | XPKTAB_OBJ
qerixStart()+1832
qerjotFetch()+1096
qerjotFetch()+392
qerghFetch()+324
qervwFetch()+148
rwsfcd()+292
qeruaFetch()+404
opifch2()+2312 row source : union-All SQL_ID 7p6225u9gc7ka | 1 | UNION-ALL
opiall0()+2196
opikpr()+460
opiodr()+908
rpidrus()+216
skgmstack()+160
rpidru()+132
rpiswu2()+520
kprball()+1044
kkxsarfch()+568
kkxsfcn()+556
pevm_icd_call_commo
pfrinstr_ICAL()+140
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 table(CALCS.GET_PAR(ci.id, 113852902)) t
peicnt()+324
kkxuexe()+468
kkxmpsexe()+400
kgmexwi()+584
kgmexec()+1388
evapls()+736
evaopn2()+1284
qerocImageIterStart object collection iterator row source | 38 | COLLECTION ITERATOR PICKLER FETCH| GETCALCVAL
qerocStart()+504
qerjotFetch()+1096
qerjotFetch()+392
qerjotFetch()+392
qernsFetch()+624
rwsfcd()+100
qeruaFetch()+404
subsr1()+288
subsr3()+364
IPRA.$evaopn3()+697
evanvl()+956
evaopn2()+1284
opifcr()+5500
qervwRowProcedure()
qerghFetch()+1372
qervwFetch()+148 qervw - view row source SQL_ID bb8mfnvw4agxp | 41 | VIEW | VM_NWVW_3
opifch2()+2312
opiall0()+2196
opikpr()+460
opiodr()+908
rpidrus()+216
skgmstack()+160
rpidru()+132
rpiswu2()+520
kprball()+1044
kkxsarfch()+568
kkxsfcn()+556
pevm_icd_call_commo
pfrinstr_ICAL()+140
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 CALCS.GET_PAR(P_ID => A.OBJ_ID, P_A_ID => 313006642)
peicnt()+324
kkxuexe()+468
kkxmpsexe()+400
kgmexwi()+584
kgmexec()+1388
evapls()+736
evaopn2()+1284
qerocImageIterStart SQL_ID 3kp00z3rvsjf0 | 1 | COLLECTION ITERATOR PICKLER FETCH| GET_PAR | 1 |
qerocStart()+504 object collection iterator row source
subsr1()+208
subsr3()+364
IPRA.$evaopn3()+697
evaopn3()+92
opifcr()+5808
qertbFetchByRowID() tab by rowid SQL_ID 3kp00z3rvsjf0 |* 2 | TABLE ACCESS BY INDEX ROWID | TAB_OBJ
opifch2()+2312
opifch()+48
opipls()+4248
opiodr()+908
rpidrus()+216
skgmstack()+160
rpidru()+132
rpiswu2()+520
rpidrv()+1060 recurcive sql
psddr0()+448
psdnal()+472 binds
pevm_BFTCHC()+284
pfrinstr_BFTCHC()+1
pfrrun_no_tool()+96
pfrrun()+1032
plsql_run()+616 top level anonymous block
peicnt()+324
kkxexe()+484
opiexe()+17356
kpoal8()+1872
opiodr()+908
ttcpip()+1028
opitsk()+1612
opiino()+940
opiodr()+908
opidrv()+1132
sou2o()+136
opimai_real()+560
ssthrdmain()+276
main()+204
__start()+112
So, my report was constantly doing some LOB manipulation! Very soon I found some CLOB column which was fetched inside PKGPROV.get_last_ord(). And that CLOB was parsed using instr() and substr() functions. It was pretty good example of the "SQL Semantics" usage. It works, but how much does it cost? You can read more about this in the famous "Oracle PL/SQL Programming, Sixth Edition by Steven Feuerstein with Bill Pribyl" book, paragraph "SQL semantics may yield temporary LOBs"
Taking this, a quick workaround appeared in my mind. In my case CLOB column temporary_quote.tmp_quote contained not so long pieces of text, 6K at maximum. It was possible to declare a local variable with VARCHAR2(32768), assign CLOB column content to it, and just do the same parsing but with local VARCHAR2 variable.
See below Tomas Kyte's RUNSTATS utility style comparison for CLOB manipulation part of the report. Run1 is the original approach, Run2 is using a varchar2 local variable. The difference is huge!
Statistic Run1 Run2
---------------------------------- --------------- ----------------
Elapsed (santisec)...............: 124349 743
CPU (santisec)...................: 69551 404
One more thing. If you have noted that CPU time is much smaller than elapsed time, it is not due to some IO waiting time or similar. All these experiments were conducted on AIX Power 7 server. So, this article gives some explanation.