Ransomeware Attack

I got hit by a ransomware attack, that destroyed my server. Thankfully I have keep really good backups and was able to just wipe the server and restore my backups. Some configuration information was lost, so it’s taking me a big to get all the bugs worked out. The good news is I didn’t lose any data. Its just a pain in the ass to find the time to re-configure everything.

Aaron

RMAN table recovery error

I had a user delete the data in a table that they didn’t mean to. It took them a week to notify us, so no chance to use flashback to get the data back. I was not looking forward to restoring 100TB of datafiles and recovery logs just to restart a couple GBs of data. Thankfully the database is running 19c, and rman can now do a table recovery.

I tried to recover the table

run {
  set auxiliary instance parameter file to '/opt/tempam/pfile.ora';
  recover table amyers.mytable until time 'trunc(sysdate-12)'
  auxiliary destination '/opt/tempam'
  remap table 'amyers'.'mytable':'mytable_recovered';
}

I had 20TB of space provisioned to /opt/tempam. To do table recovery, the system, sysaux, undo and the source table tablespaces are restored. in my case this should come to about 3TB.

After waiting an hour for everything to restore and roll forward, I get the following error in the final step where it should be exporting the table with datapump

RMAN-06960: EXPDP> ORA-31626: job does not exist
ORA-31637: cannot create job TSPITR_EXP_enpf_BDAD for user SYS
ORA-39062: error creating master process DM00
ORA-31613: Master process DM00 failed during startup.
RMAN-05122: There is insufficient disk space 951901 MB to perform table recovery of 1921807 MB.
RMAN-05122: There is insufficient disk space 951901 MB to perform table recovery of 1058623 MB.

This is a lie, there was plenty of space.

Turns out this is a known bug, Bug 9109785 : ORA-1555 TRANSPORT TABLESPACE FAILS WITH ORA-01555 DURING THE EXPDP PHASE. 

The work around is to prevent the database from being dropped after the failure by adding the “keep auxiliary” statement to the restore, and manually performing the datapump export.

run{  
  set auxiliary instance parameter file to '/opt/tempam/pfile.ora';
  recover table amyers.mytable until time 'trunc(sysdate-12)'
  auxiliary destination '/opt/tempam' keep auxiliary
  remap table 'amyers'.'mytable':'mytable_recovered';
}

Getting and formatting index ddl

This is a little useful sql I used this morning for stripping out all the additional information that dbms_metadata.get_ddl outputs. This strips the create index statements down to their bare bones.

set linesize 32767
SET TRIMSPOOL ON
SET TRIMOUT ON
SET WRAP OFF
set pagesize 0
 
begin
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'STORAGE',false);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'TABLESPACE',false);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES', false);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'REF_CONSTRAINTS', TRUE);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'CONSTRAINTS', false);
end;
/
 
SELECT DBMS_METADATA.get_ddl ('INDEX', index_name, owner)
  FROM all_indexes
WHERE     table_owner = UPPER ('OWNER')
       AND table_name IN ('TABLE1', 'TABLE2');

wrong number or types of arguments in call to ‘CHECKDATAFILEFORSTANDBY’

I started seeing this today in one of our databases

RMAN> resync catalog;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 01/24/2020 11:45:49
RMAN-10015: error compiling PL/SQL program
RMAN-10014: PL/SQL error 0 on line 2620 column 18: Statement ignored
RMAN-10014: PL/SQL error 306 on line 2620 column 18: wrong number or types of arguments in call to 'CHECKDATAFILEFORSTANDBY'

I found metalink article 2360045.1 which states that it is a catalog version mismatch, however running upgrade catalog doesn’t resolve the problem. 

I remember hitting this error a long time ago, and although I can’f find the reason why, running upgrade catalog two times in a row seems to fix the problem (so typing upgrade catalog 4 times). 

TDE Wallet on RAC

So I have been having trouble getting the oracle wallet to auto open, and it’s always messed up in gv$encryption_wallet

select * from gv$encryption_wallet;

   INST_ID WRL_TYPE		WRL_PARAMETER STATUS   WALLET_TYPE   WALLET_OR  FULLY_BAC    CON_ID
---------- ------------ ---------------------- ------------- ---------- ------------ ------
         4 FILE 		OPEN                   AUTOLOGIN     SINGLE     NO           0
         1 FILE			OPEN                   AUTOLOGIN     SINGLE     NO           0
         3 FILE			NOT_AVAILABLE          UNKNOWN       SINGLE     UNDEFINED    0
         2 FILE			NOT_AVAILABLE          UNKNOWN       SINGLE     UNDEFINED    0

According to Oracle this is is because TNS_ADMIN is not set, so I set it and several other suggested variables.

$ srvctl setenv database -d oradb1 -T "ORACLE_UNQNAME=oradb1"
$ srvctl setenv database -d oradb1 -T "ORACLE_BASE=/u01/app/oracle"
$ srvctl setenv database -d oradb1 -T "ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/"
$ srvctl setenv database -d oradb1 -T "TNS_ADMIN=/u01/app/oracle/product/12.2.0.1/db/network/admin/"

$ srvctl getenv database -d oradb1
oradb1:
ORACLE_UNQNAME=oradb1
ORACLE_BASE=/u01/app/oracle
TNS_ADMIN=/u01/app/oracle/product/12.2.0.1/db/network/admin/
ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/

After restarting the database, the ACFS volume now shows 

select * from gv$encryption_wallet;

   INST_ID WRL_TYPE             WRL_PARAMETER                    STATUS   WALLET_TYPE          WALLET_OR FULLY_BAC     CON_ID
---------- -------------------- -------------------------------- -------- -------------------- --------- --------- ----------
         1 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0
         2 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0
         3 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0
         4 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0

DBSNMP long running query

I noticed a long running query in one of my databases this morning.

  SELECT TABLESPACE,
           ROUND ( ( (max_size_mb - used_space + free_space_mb) / max_size_mb),
                  :"SYS_B_00")
         * :"SYS_B_01"
             AS pct_free
    FROM (  SELECT ts.tablespace_name                             TABLESPACE,
                     ROUND (SUM (NVL (fs.bytes, :"SYS_B_02")))
                   / :"SYS_B_03"
                   / :"SYS_B_04"
                       free_space_mb,
                   ROUND (SUM (df.bytes)) / :"SYS_B_05" / :"SYS_B_06" used_space,
                     ROUND (
                         SUM (
                             CASE autoextensible
                                 WHEN :"SYS_B_07" THEN df.maxbytes
                                 ELSE df.bytes
                             END))
                   / :"SYS_B_08"
                   / :"SYS_B_09"
                       AS max_size_mb,
                   ts.block_size
                       AS block_size
              FROM dba_data_files df
                   LEFT OUTER JOIN (  SELECT file_id, SUM (bytes) bytes
                                        FROM dba_free_space fs
                                    GROUP BY file_id) fs
                       ON df.file_id = fs.file_id
                   INNER JOIN dba_tablespaces ts
                       ON df.tablespace_name = ts.tablespace_name
             WHERE ts.contents = :"SYS_B_10"
          GROUP BY ts.tablespace_name, ts.block_size)
ORDER BY pct_free DESC

Its computing tablespace size. The average runtime is 272.03 sec, or 4.5 minutes. This isn’t right, its running in a few seconds in other databases.

I started looking at the query in OEM, and noticed almost all the execution time was being spent on a FTS of X$KTFBUE. 

A little googling turned up the following article:

Fixed Table x$ktfbue has not statistics

My issue appears to be the same, there are not statistics on teh X$KTFBUE table.

 SYS pp1oim1> exec print_table('select * from dba_tab_statistics where  table_name=''X$KTFBUE'' ');
OOWNER                        : SYS
TABLE_NAME                    : X$KTFBUE
PARTITION_NAME                :
PARTITION_POSITION            :
SUBPARTITION_NAME             :
SUBPARTITION_POSITION         :
OBJECT_TYPE                   : FIXED TABLE
NUM_ROWS                      :
BLOCKS                        :
EMPTY_BLOCKS                  :
AVG_SPACE                     :
CHAIN_CNT                     :
AVG_ROW_LEN                   :
AVG_SPACE_FREELIST_BLOCKS     :
NUM_FREELIST_BLOCKS           :
AVG_CACHED_BLOCKS             :
AVG_CACHE_HIT_RATIO           :
IM_IMCU_COUNT                 :
IM_BLOCK_COUNT                :
IM_STAT_UPDATE_TIME           :
SCAN_RATE                     :
SAMPLE_SIZE                   :
LAST_ANALYZED                 :
GLOBAL_STATS                  :
USER_STATS                    :
STATTYPE_LOCKED               :
STALE_STATS                   :
SCOPE                         : SHARED
??????????????????

So gather stats on the table

 
 SYS pp1oim1> exec DBMS_STATS.GATHER_TABLE_STATS('SYS','X$KTFBUE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:17.24
 SYS pp1oim1> exec print_table('select * from dba_tab_statistics where  table_name=''X$KTFBUE'' ');
OOWNER                        : SYS
TABLE_NAME                    : X$KTFBUE
PARTITION_NAME                :
PARTITION_POSITION            :
SUBPARTITION_NAME             :
SUBPARTITION_POSITION         :
OBJECT_TYPE                   : FIXED TABLE
NUM_ROWS                      : 158077
BLOCKS                        :
EMPTY_BLOCKS                  :
AVG_SPACE                     :
CHAIN_CNT                     :
AVG_ROW_LEN                   : 66
AVG_SPACE_FREELIST_BLOCKS     :
NUM_FREELIST_BLOCKS           :
AVG_CACHED_BLOCKS             :
AVG_CACHE_HIT_RATIO           :
IM_IMCU_COUNT                 :
IM_BLOCK_COUNT                :
IM_STAT_UPDATE_TIME           :
SCAN_RATE                     :
SAMPLE_SIZE                   : 158077
LAST_ANALYZED                 : 09-sep-2019 10:32:52
GLOBAL_STATS                  : YES
USER_STATS                    : NO
STATTYPE_LOCKED               :
STALE_STATS                   :
SCOPE                         : SHARED
??????????????????

With the new statistics the query finishes in a few seconds.

Monitoring Online ASM Datafile Moves

I’m moving a ~7TB database online as the development group that uses it doesn’t want to take any down time. Thankfully the database isn’t horribly active right now so moving the data files online isn’t causing a huge problem. The only other times I have moved data files online was in an incredibly busy database and it could take half a day per data file.

I don’t want them to continually bother me so I gave them a view that they could monitor the process through. Here are the caveat, I know when the move started, that is hard coded, also my volume names are easy to tell if they are on flash storage or spinning media as they are two different asm pools, so extracting the second character of the file name yields a ‘D’ or an ‘F’ where D is spinning media and F is flash media.

create or replace view storage_migration_status
                 'D', 'Spinning Media',
                 'F', 'Flash Media')
             AS Metric,
         TO_CHAR (ROUND (SUM (bytes) / 1024 / 1024 / 1024, 0) || ' GB')
             AS VALUE
    FROM v$datafile
GROUP BY SUBSTR (name, 2, 1)
UNION ALL
SELECT 'Average Transfer Speed'
           AS metric,
       TO_CHAR (
              ROUND (
                    SUM (bytes / 1024 / 1024 / 1024)
                  / (  24
                     * (  SYSDATE
                        - TO_DATE ('2019/01/04 10:30', 'yy/mm/dd HH:MI'))),
                  0)
           || ' GB/hr')
           AS VALUE
  FROM v$datafile
 WHERE SUBSTR (name, 2, 1) = 'F'
UNION ALL
SELECT 'Estimated time remaining'
           AS metric,
       TO_CHAR (
              ROUND (
                    SUM (bytes / 1024 / 1024 / 1024)
                  / (  SUM (bytes / 1024 / 1024 / 1024)
                     / (  24
                        * (  SYSDATE
                           - TO_DATE ('2019/01/04 9:00', 'yy/mm/dd HH:MI')))),
                  2)
           || ' Days')
           AS VALUE
  FROM v$datafile
 WHERE SUBSTR (name, 2, 1) = 'D'NION ALL
SELECT 'Average Transfer Speed'
           AS metric,
       TO_CHAR (
              ROUND (
                    SUM (bytes / 1024 / 1024 / 1024)
                  / (  24
                     * (  SYSDATE
                        - TO_DATE ('2019/01/04 10:30', 'yy/mm/dd HH:MI'))),
                  0)
           || ' GB/hr')
           AS VALUE
  FROM v$datafile
 WHERE SUBSTR (name, 2, 1) = 'F'
UNION ALL
SELECT 'Estimated time remaining'
           AS metric,
       TO_CHAR (
                SUM (bytes / 1024 / 1024 / 1024)
              / (ROUND (
                       SUM (bytes / 1024 / 1024 / 1024)
                     / (  24
                        * (  SYSDATE
                           - TO_DATE ('2019/01/04 9:00', 'yy/mm/dd HH:MI'))),
                     0))
           || ' Days')
           AS VALUE
  FROM v$datafile
 WHERE SUBSTR (name, 2, 1) = 'D'
/

Then they can check on the stats for themselves

SYS oradb1> select * from storage_migration_status;

METRIC                   VALUE
------------------------ ----------------------------------------------
Spinning Media           6451 GB
Flash Media              128 GB
Average Transfer Speed   99 GB/hr
Estimated time remaining 2.8 Days

It’s not pretty but it gets the job done, leaving me with less contact with people, which is great 🙂

Windows 10 – When You Can’t Be Trusted

Unfortunately some things require windows. Old interfaces that ensure you are running IE6, old proprietary administration software, and other things I don’t use on a daily basis. Windows 10 is one of the most frustrating versions of windows I have ever run. It pretty much assumes you are too stupid to have any idea what you are doing. I just spend an hour fighting with it to let me download a file that builds boot disks. It decided it wasn’t safe. I don’t know what that means since I have been using this product for years! In fact Edge assumes you are so stupid, I can’t get on most websites because it doesn’t assume they are safe! It also doesn’t think it is safe for me to install chrome! How anyone could use the pile of crap for day to day work baffles me. I am glad my work laptop is still Windows 7, where I can get a virus if I am stupid, but doesn’t try to baby proof everything!

Rushing = Stupid Mistakes

I am currently being asked to refresh a database. The end users are pissed because they asked for the database to be refreshed at noon yesterday, this was the first they had asked for it, and they were doing “really important things” and by 3pm were pissed that I hadn’t responded to them, let alone completed the work. The thing was, I took a vacation. I took a whole two days off to spend some time with my girlfriend and wasn’t checking my email. Never mind that we have a whole team of DBAs that can refresh a database, they had emailed me. What they should have done was open a proper ticket, and someone would have helped out even though they hadn’t planned and were now in a panic.

So After escalating it to their boss because I wasn’t providing them with “good service”, who in turn talked to my boss, who informed him how they had dropped the ball, and I would deal with it today, that they were being completely unprofessional. (My boss is a good guy, and sticks up for me).

So here is the technical part of this post. Ever DBA has cloned a database, thousands and thousands of databases, but when you are rushing you can make stupid errors. I just fought this error for 20 minutes

Oracle instance shut down
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 06/08/2016 14:23:04
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
ORA-27191: sbtinfo2 returned error
Additional information: 3480

This should have been a no brainier, but in my rush I wasn’t fully paying attention to the error.

I was just issuing the command

duplicate target database to testdb from active database

My defaults for RMAN are using sbt ie tape, so for some reason the tape channels were being loaded. This causes an error. The solution is simple and should have been fairly immediate, just allocate disk channels.

run
{
ALLOCATE CHANNEL d1 TYPE DISK ;
ALLOCATE CHANNEL d2 TYPE DISK ;
ALLOCATE CHANNEL d3 TYPE DISK ;
ALLOCATE AUXILIARY CHANNEL a1 TYPE DISK ;
ALLOCATE AUXILIARY CHANNEL a2 TYPE DISK ;
ALLOCATE AUXILIARY CHANNEL a3 TYPE DISK ;
duplicate target database to testdb from active database;
}

Even simple things become very hard when you are rushing.