Versions Compared

    Key

    • This line was added.
    • This line was removed.
    • Formatting was changed.
    Comment: Published by Scroll Versions from this space and version 7.5-0

    The pfx_dw schema The dwh schema is a fairly textbook "textbook star schema", with things like jobs and worker usage in in fact tables, and the various things that you might use in a SQL SQL WHERE clause in  clause in dimension tables tables.

    The job

    ...

    _fact table, a typical "fact" table.

    For example, to get reports about various jobs over time, you'll be querying the pfx_dwthe dwh.job_fact table table:

    Code Block
    languagesql
    mysql> SHOW CREATE TABLE pfx_dwpfx=> \dS+ dwh.job_fact\G
    *************************** 1. row ***************************
    Table: job_fact
    Create Table: CREATE TABLE `job_fact` (
    `time_sk` int(11) NOT NULL,
    `job_id` int(11) DEFAULT NULL,
    `job_name` varchar(255) DEFAULT NULL,
    `job_priority` int(11) DEFAULT NULL,
    `job_timesubmit` int(11) DEFAULT NULL,
    `job_timestart` int(11) DEFAULT NULL,
    `job_timecomplete` int(11) DEFAULT NULL,
    `jobstatus_sk` int(11) DEFAULT NULL,
    `user_sk` smallint(5) unsigned DEFAULT NULL,
    `cluster_sk` smallint(5) unsigned DEFAULT NULL,
    `account_sk` smallint(5) unsigned DEFAULT NULL,
    `kind_sk` smallint(5) unsigned DEFAULT NULL,
    `prototype_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_show_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_shot_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_seq_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_client_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_dept_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_custom1_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_custom2_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_custom3_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_custom4_sk` smallint(5) unsigned DEFAULT NULL,
    `prod_custom5_sk` smallint(5) unsigned DEFAULT NULL,
    `cpu_count` int(10) unsigned NOT NULL,
    `cpu_seconds` int(10) unsigned NOT NULL,
    `work_count` int(10) unsigned NOT NULL,
    `worktime_min` int(10) unsigned NOT NULL,
    `worktime_max` int(10) unsigned NOT NULL,
    `worktime_avg` int(10) unsigned NOT NULL,
    `mem_max` bigint(20) unsigned DEFAULT NULL,
    `mem_avg` bigint(20) unsigned DEFAULT NULL,
    `mem_stddev` bigint(20) unsigned DEFAULT NULL,
    UNIQUE KEY `job_id` (`job_id`),
    KEY `time_sk` (`time_sk`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1
    1 row in set (0.00 sec                                   Table "dwh.job_fact"
          Column      |           Type           | Modifiers | Storage  | Stats target | Description 
    ------------------+--------------------------+-----------+----------+--------------+-------------
     time_sk          | integer                  | not null  | plain    |              | 
     job_id           | integer                  |           | plain    |              | 
     job_pgrp         | integer                  |           | plain    |              | 
     job_name         | character varying(255)   |           | extended |              | 
     job_priority     | integer                  |           | plain    |              | 
     job_timesubmit   | timestamp with time zone |           | plain    |              | 
     job_timestart    | timestamp with time zone |           | plain    |              | 
     job_timecomplete | timestamp with time zone |           | plain    |              | 
     jobstatus_sk     | integer                  |           | plain    |              | 
     user_sk          | smallint                 |           | plain    |              | 
     cluster_sk       | smallint                 |           | plain    |              | 
     account_sk       | smallint                 |           | plain    |              | 
     kind_sk          | smallint                 |           | plain    |              | 
     prototype_sk     | smallint                 |           | plain    |              | 
     prod_show_sk     | smallint                 |           | plain    |              | 
     prod_shot_sk     | smallint                 |           | plain    |              | 
     prod_seq_sk      | smallint                 |           | plain    |              | 
     prod_client_sk   | smallint                 |           | plain    |              | 
     prod_dept_sk     | smallint                 |           | plain    |              | 
     prod_custom1_sk  | smallint                 |           | plain    |              | 
     prod_custom2_sk  | smallint                 |           | plain    |              | 
     prod_custom3_sk  | smallint                 |           | plain    |              | 
     prod_custom4_sk  | smallint                 |           | plain    |              | 
     prod_custom5_sk  | smallint                 |           | plain    |              | 
     cpu_count        | integer                  | not null  | plain    |              | 
     cpu_seconds      | integer                  | not null  | plain    |              | 
     work_count       | integer                  | not null  | plain    |              | 
     worktime_min     | integer                  | not null  | plain    |              | 
     worktime_max     | integer                  | not null  | plain    |              | 
     worktime_avg     | real                     | not null  | plain    |              | 
    Indexes:
        "job_id_unique" UNIQUE CONSTRAINT, btree (job_id)
        "idx_time" btree (time_sk)
    Note
    titlejob_time* are stored as timestamp types

    With Qube 7.x which uses PostgreSQL as the backbone, job_time* are stored as timestamp types, so, for example, you can fetch time data for a specific job as in:

     

     

    pfx=> SELECT job_id, job_timesubmit, job_timestart,job_timecomplete FROM dwh.job_fact WHERE job_id=57 ;
     job_id |     job_timesubmit     |     job_timestart      |    job_timecomplete    
    --------+------------------------+------------------------+------------------------
         57 | 2020-08-07 23:49:57-10 | 2020-08-07 23:49:57-10 | 2020-08-07 23:50:03-10
    (1 row)

     

    "*_sk" columns can be used to do INNER JOINs to a similarly named dimension table

    Any column that is named with an an _sk suffix is a Synthetic ynthetic Key that points to a corresponding dimension table, named with the part of the column before the the _sk; the dimension table will have a _dim suffix  suffix in the name. This way, it's easy to write the the JOIN's, the column name is a clue to the dimension table, which will have a column of the same name. Almost every dimension table will consist of a *_sk PRIMARY  PRIMARY KEY and a name column column.

     

    A typical dimension table, the "user_dim" table

    For example, the the user_sk column can be used to do a SQL SQL INNER JOIN to the the user_dim table table.

    Code Block
    languagesql
    mysql> SHOW CREATE TABLE pfx_dwpfx=> \dS+ dwh.user_dim\G
    *************************** 1. row ***************************
    Table: user_dim
    Create Table: CREATE TABLE `user_dim` (
    `user_sk` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(255) NOT NULL,
    PRIMARY KEY (`user_sk`),
    UNIQUE KEY `name` (`name`)
    ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)
     
    mysql> SELECT name FROM user_dim;
                                                          Table "dwh.user_dim"
     Column  |          Type          |                         Modifiers                          | Storage  | Stats target | Description 
    ---------+---------------+
    | name          |
    ---------+------------------------------------------------------------+----------+--------------+
    | Administrator-------------
     user_sk | |integer bar               | not null default nextval('user_dim_user_sk_seq'::regclass) | fooplain    |       | | foobar     | 
     name    | character varying(255) | fubarnot null        | | jburk         | | perforce      | | root                      | extended | shinya        |
    +     | 
    Indexes:
        "user_dim_pkey" PRIMARY KEY, btree (user_sk)
        "user_dim_name_key" UNIQUE CONSTRAINT, btree (name)
    
    pfx=> SELECT name FROM user_dim;
      name  
    ---------------+

     

    
     shinya
     root
     administrator
     jburk
     joe
     kmac
    (6 rows)
    
    

    Get a count of all jobs for a particular user:
    Code Block
    languagesql
    mysql>pfx=> SELECT 	COUNT(*) 
    FROM 
    	job_fact AS fact
    INNER JOIN 	user_dim AS user
    ON 	job_fact.user_sk = user_dim.user_sk WHERE 	user_dim.name = 'jburkshinya'
    ;
    +----------+
    | COUNT(*) |
    + count 
    ----------+
    | 5868  370
    (1 row)
    |
    +----------+
    
    

    The time dimension table

    The pfx_dwdwh.time_dim table  table is provided so that you don't have to perform date/time operations on every row in a fact table (since they can run into the 100's of millions of rows), instead you do a SQL INNER JOIN to  to it and use the values in the the time_dim table  table in your your WHERE clause. The The time_sk column in every fact table has an identical value in the the time_dim table which  table which has a single row with a primary key time_sk.  The  The time_sk value  value is actually the unix epoch time in seconds:

    Code Block
    languagesql
    mysql>pfx=> SELECT SELECT 	*  FROM FROM 	time_dim dim ORDER BY BY 	time_sk sk DESC LIMIT 1
    ;
    +------------+---------------------+------+------------+-----+------------+-------+------+ | time_sk    |       date_time           | hour | date   date    | dow | month_name | month | year |
    +------------+------------------------+------+------------+-----+------------+-------+------+
    | 13923396001607570700 | 20142020-0212-1309 17:0025:00-10 | 17  17 | 20142020-0212-1309 | 5  3 | FebruaryDecember   | 2   12  | 2014 |
    +------------+---------------------+------+------------+-----+------------+-------+------+
    1 row in set (0.00 sec)
    
    mysql> SELECT 
    	FROM_UNIXTIME2020
    (1 row)
    
    pfx=> SELECT to_timestamp(time_sk)
    	, date_time time FROM FROM 	time_dim dim ORDER BY BY 	time_sk sk DESC LIMIT 1
    ;
    +------------------------+---------------------+ | FROM_UNIXTIME(time_sk) |     to_timestamp      |       date_time        
      |
    +------------------------+---------------------+---
    | 20142020-0212-1309 17:0030:00-10    | 20142020-0212-1309 17:0030:00-10
    |
    +------------------------+---------------------+(1 row)
    
    

    The "job status" dimension table

    The pfx_dwdwh.jobstatus_dim table is one of the few exceptions to the normal dimension table structure; it provides a mapping between the integer and human-readable status values.

    Code Block
    languagesql
    mysql>pfx=> SELECT * FROM jobstatus_dim;
    +--------------+------------+-------------+----------------+-------------+ | jobstatus_sk | status_int | status_char | effective_date | expiry_date |
    +--------------+------------+-------------+----------------+-------------+
    | 1           1 | 16        16 | complete    | 1999-12-31     | 9999-12-31
     | | 2         2   | 32        32 | failed      | 1999-12-31     | 9999-12-31
     | | 3         3   | 48        48 | killed      | 1999-12-31     | 9999-12-31
     | | 4         4   | 272       261 | blockeddying       | 1999-12-31     | 9999-12-31
     | | 5         5   | 288       262 | waitingexiting     | 1999-12-31     | 9999-12-31
     | | 6         6   | 304       265 | suspendedregistering   | 1999-12-31     | 9999-12-31
     | | 7         7   | 320       272 | pendingblocked     | 1999-12-31     | 9999-12-31
     | | 8         8   | 323       288 | waiting     | 1999-12-31     | 9999-12-31
     | | 9         9   | 325       304 | badloginsuspended    | 1999-12-31     | 9999-12-31
     | | 10        10   | 336       320 | runningpending     | 1999-12-31     | 9999-12-31
               11 |        323 | 11waiting     | 1999-12-31     | 2619999-12-31
               12 | dying       325 | badlogin    | 1999-12-31     | 9999-12-31
     |
    +--------------+------------+-------------+----------------+-------------+

     

              13 |        336 | running     | 1999-12-31     | 9999-12-31
    (13 rows)
    
    
    Get a count of all jobs for a particular user for

    ...

    August,

    ...

    2020:
    Code Block
    languagesql
    mysql>pfx=> SELECT 	COUNT(*) 
    FROM 
    	job_fact AS fact
    INNER JOIN 	user_dim AS user
    ON 	job_fact.user_sk = user_dim.user_sk INNER JOIN 	time_dim AS time
    ON 	job_fact.time_sk=time_dim.time_sk WHERE 	user_dim.name = 'jburkshinya' AND 	time_dim.month = 18 AND 
    	time_dim.year = 2014
    2020;
    +----------+
    | COUNT(*) |
    + count 
    ----------+
    | 83   96
    (1 row)
    
    |
    +----------+
    Get a count of all jobs for each user for all of

    ...

    2020:
    Code Block
    languagesql
    mysql>pfx=> SELECT 	user_dim.name
    	, time_dim.month_name
    	, COUNT(*) as "job count" 
    FROM 
    	job_fact AS fact
    INNER JOIN 	user_dim AS user
    ON 	job_fact.user_sk = user_dim.user_sk INNER JOIN 	time_dim AS time
    ON 	job_fact.time_sk=time_dim.time_sk WHERE 	time_dim.year = 20132020 GROUP BY 	user_dim.name
    	, time_dim.month,time_dim.month_name ORDER BY 	user_dim.name
    	, time_dim.month
    ;
    +---------------+------------+-----------+ | name          | month_name | job count |
    +---------------+------------+-----------+
    | Administratorroot | January    | 4         |
    | bar           | July 		 | 1  	 	 | | foo       2
     root  | July 		 | 2August 	 	 | | foobar        | July 		 | 1 		 | | foobar 3
     root     | September  | 2 		 | | fubar   13
     root    | July		October | 1 		 | | jburk       8
     |root March		 | 1 		 | |November jburk         | May		 | 123 		 | | jburk  4
     shinya     | June		July | 24 		 | | jburk         | July		 | 220 		 | | jburk  9
     shinya     | August	 | 140 		 | | jburk         | September	 | 68 		 |
    | jburk         | October	 | 560 		 |
    | jburk         | November	 | 59 		 |
    | jburk         | December	 | 4467 	 |
    | perforce		| January	 | 128 		 |
    | perforce		| February	 | 4 		 |
    | root 			| January	 | 31
    		 |
    | root 			| February	 | 37 		 |
    | root 			| March		 | 7 		 |
    | root 			| April		 | 17 		 |
    | root 			| May		 | 7 		 |
    | root 			| June		 | 1 		 |
    | root 			| July		 | 11 		 |
    | root 			| September	 | 2 		 |
    | root 			| December	 | 2 		 | 
    | shinya 		| January	 | 23 		 |
    | shinya 		| February	 | 69 		 |
    | shinya 		| March		 | 7 		 |
    | shinya 		| April		 | 20 		 |
    | shinya 		| May		 | 2 		 |
    +---------------+------------+-----------+shinya | September  |       157
     shinya | October    |        50
     shinya | November   |        75
    (10 rows)
    
    
    Get a count of all jobs for each user for all of

    ...

    2020, broken down by month and the job's final status:
    Code Block
    languagesql
    mysql>pfx=> SELECT
    	  user_dim.name
    	  , time_dim.year
    	  , time_dim.month_name
    	  , statusjobstatus_dim.status_char
    	  , COUNT(*) as "job count"
    
    FROM
     	 job_fact
    AS fact
    INNER JOIN
    	  user_dim
    ASON
    user ON 	job_fact.user_sk=user_dim.user_sk
    INNER JOIN
    	  time_dim
    ASON
    time ON 	job_fact.time_sk=time_dim.time_sk
    INNER JOIN
    	  jobstatus_dim
    ASON
    status ON 
    	job_fact.jobstatus_sk=statusjobstatus_dim.jobstatus_sk
    WHERE
    	  time_dim.year = 20132020
    GROUP BY
    	  user_dim.name
    	,time_dim.year,time_dim.month_name,jobstatus_dim.status_char
      , time_dim.month
    	  , statusjobstatus_dim.status_int
    ORDER BY
    	  user_dim.name
    	  , time_dim.month
    	,status  ,jobstatus_dim.status_char
    ;
    +---------------+------+------------+-------------+-----------+ | name 			 | year | month_name | status_char | job count |
    +---------------+------+------------+-------------+-----------+
    | Administratorroot | 2013 | January2020 	| failed	July  | 4		  |  | barcomplete 			| 2013  | July 		| complete	  | 1		  | |2
    foo 			|root 2013  | July2020 		| complete	August  | 2		  | |complete foobar 		| 2013 | July 		| complete	  | 1 		  |3
    | foobarroot 		| 2013 | September2020 	| complete	September  | 2complete 		  | | fubar 		| 2013 | July 		| complete	 11
    | 1		root  | | jburk2020 		| 2013September | March 		| complete	failed  | 1		  | | jburk 		| 2013 | May 		| complete	  |2
    100		 root | | jburk 		| 20132020 | MayOctober 		| failed	  | 11		complete  |  | jburk 		| 2013 | May 		| killed	  |8
    12		 root |  | jburk2020 		| 2013November | June 		| complete	  | 4		  | | jburk 		| 2013 | June 		| failed	 4
    | 20		shinya  | |2020 jburk 		| 2013 | July 		| complete	  | 134		  | |complete jburk 		| 2013 | July 		| failed	  | 33		  | |6
    jburk 		shinya | 20132020 | July 		| killed	  | 53		  | |failed jburk 		| 2013 | August 	| complete	  | 67		  | | jburk 		|1
    2013 shinya | August2020 	| failed	July  | 25		  |  | jburkkilled 		| 2013 | August 	 | killed	  | 48		  | | jburk 		|2
    2013 shinya | September2020 	| complete	August  | 38		  | |complete jburk 		| 2013 | September 	| failed	  | 17  	25
     |shinya | jburk2020 		| 2013August | September 	  | killed	  | 13  	  | | jburk 		| 2013 | October 	| complete	 6
    | 450		shinya  | 2020 | jburkSeptember 		| 2013 | Octobercomplete 	| failed	  | 61		  | | jburk			| 2013 |139
    October 	shinya | killed	2020  | 49		September  | |failed jburk 		| 2013 | November 	| complete	  | 38		  | | jburk11
    		| 2013shinya | November2020 	| failed	September  | 12		killed  | | jburk 		 | 2013 | November 	| killed	  | 9		 7
    | |shinya jburk 		| 20132020 | DecemberOctober 	| complete	  | 3812	complete  | | jburk 		| 2013 | December 	| failed	  | 627		47
     |shinya | jburk2020 		| 2013October | December 	 | killed	failed  | 28		  | | perforce 		| 2013 | January 	| failed	  |2
    46		 shinya | 2020 | perforceOctober 		| 2013 | January 	| killed	  | 82		  | | perforce 		| 2013 | February 	| complete	  |1
    3		 shinya | 2020 | perforceNovember 		| 2013 | Februarycomplete 	| killed	  | 1		  | | root 			| 2013 64
     shinya | January2020 	| complete	November   | 24		failed  | | root 			 | 2013 | January 	| failed	  | 6		 8
    | shinya | root2020 			| 2013November | January 	| killed	  | 1		  | | root 			| 2013 | February 	| complete	  |3
    34 		  |
    | root 			| 2013 | February 	| killed	  | 3 		  |
    | root 			| 2013 | March 		| complete	  | 7		  |
    | root 			| 2013 | April 		| complete	  | 9		  |
    | root 			| 2013 | April 		| failed	  | 4		  |
    | root 			| 2013 | April 		| killed	  | 4		  |
    | root 			| 2013 | May 		| complete	  | 6		  |
    | root 			| 2013 | May 		| killed	  | 1		  |
    | root 			| 2013 | June 		| complete	  | 1		  |
    | root 			| 2013 | July 		| complete	  | 7		  |
    | root 			| 2013 | July 		| failed	  | 1		  |
    | root 			| 2013 | July 		| killed	  | 3		  |
    | root 			| 2013 | September 	| complete	  | 1		  |
    | root 			| 2013 | September 	| failed	  | 1		  |
    | root 			| 2013 | December 	| complete	  | 2		  |
    | shinya 		| 2013 | January 	| complete	  | 4		  |
    | shinya 		| 2013 | January 	| failed	  | 7 		  |
    | shinya 		| 2013 | January 	| killed	  | 12		  |
    | shinya 		| 2013 | February 	| complete	  | 60		  |
    | shinya 		| 2013 | February 	| failed	  | 2		  |
    | shinya 		| 2013 | February 	| killed	  | 7		  |
    | shinya 		| 2013 | March 		| complete	  | 2		  |
    | shinya 		| 2013 | March 		| failed	  | 2		  |
    | shinya 		| 2013 | March 		| killed	  | 3		  |
    | shinya 		| 2013 | April 		| complete	  | 16		  |
    | shinya 		| 2013 | April 		| killed	  | 4		  |
    | shinya 		| 2013 | May 		| killed	  | 2		  |
    +---------------+------+------------+-------------+-----------+

     

     
    (20 rows)
    
    
    Get the sum total of cpu_seconds used for each user for the last 7 days, broken down by user, date, and the job's final status:
    Code Block
    languagesql
    pfx=> SELECT
        user_dim.name
        , time_dim.date
        , jobstatus_dim.status_char
        , SUM(job_fact.cpu_seconds) as "cpu_time"
    FROM
        job_fact
    INNER JOIN
        user_dim
    ON
        job_fact.user_sk=user_dim.user_sk
    INNER JOIN
        time_dim
    ON
        job_fact.time_sk=time_dim.time_sk
    INNER JOIN
        jobstatus_dim
    ON
        job_fact.jobstatus_sk=jobstatus_dim.jobstatus_sk
    WHERE
        DATE_PART('day', CURRENT_DATE - time_dim.date_time) < 7
    GROUP BY
        user_dim.name,jobstatus_dim.status_char
        , time_dim.date
        , jobstatus_dim.status_int
    ORDER BY
        time_dim.date
        , cpu_time DESC
        , jobstatus_dim.status_char
    ;
      name  |    date    | status_char | cpu_time 
    --------+------------+-------------+----------
     shinya | 2020-11-13 | complete    |      144
     shinya | 2020-11-16 | complete    |       97
     shinya | 2020-11-17 | failed      |      906
     shinya | 2020-11-17 | killed      |      677
     shinya | 2020-11-17 | complete    |      102
     shinya | 2020-11-18 | failed      |    18695
     root   | 2020-11-18 | complete    |     1199
     shinya | 2020-11-18 | complete    |      760
     shinya | 2020-11-19 | failed      |     8022
     shinya | 2020-11-19 | complete    |      606
     root   | 2020-11-19 | complete    |       46
     shinya | 2020-11-19 | killed      |        0
    (12 rows)