The dwh schema is a fairly textbook star schema, with things like jobs and worker usage in fact tables, and the various things that you might use in a SQL WHERE clause in dimension tables.
The job_fact table, a typical "fact" table.
For example, to get reports about various jobs over time, you'll be querying the dwh.job_fact table:
Code Block |
---|
|
pfx=> \dS+ dwh.job_fact
Table "dwh.job_fact"
Column | Type | Modifiers | Storage | Stats target | Description
------------------+--------------------------+-----------+----------+--------------+-------------
time_sk | integer | not null | plain | |
job_id | integer | | plain | |
job_pgrp | integer | | plain | |
job_name | character varying(255) | | extended | |
job_priority | integer | | plain | |
job_timesubmit | timestamp with time zone | | plain | |
job_timestart | timestamp with time zone | | plain | |
job_timecomplete | timestamp with time zone | | plain | |
jobstatus_sk | integer | | plain | |
user_sk | smallint | | plain | |
cluster_sk | smallint | | plain | |
account_sk | smallint | | plain | |
kind_sk | smallint | | plain | |
prototype_sk | smallint | | plain | |
prod_show_sk | smallint | | plain | |
prod_shot_sk | smallint | | plain | |
prod_seq_sk | smallint | | plain | |
prod_client_sk | smallint | | plain | |
prod_dept_sk | smallint | | plain | |
prod_custom1_sk | smallint | | plain | |
prod_custom2_sk | smallint | | plain | |
prod_custom3_sk | smallint | | plain | |
prod_custom4_sk | smallint | | plain | |
prod_custom5_sk | smallint | | plain | |
cpu_count | integer | not null | plain | |
cpu_seconds | integer | not null | plain | |
work_count | integer | not null | plain | |
worktime_min | integer | not null | plain | |
worktime_max | integer | not null | plain | |
worktime_avg | real | not null | plain | |
Indexes:
"job_id_unique" UNIQUE CONSTRAINT, btree (job_id)
"idx_time" btree (time_sk) |
Note |
---|
title | job_time* are stored as timestamp types |
---|
|
With Qube 7.x, which uses PostgreSQL as its backbone, the job_time* columns are stored as timestamp types, so, for example, you can fetch time data for a specific job as in:
pfx=> SELECT job_id, job_timesubmit, job_timestart, job_timecomplete FROM dwh.job_fact WHERE job_id=57;
 job_id |     job_timesubmit     |     job_timestart      |    job_timecomplete
--------+------------------------+------------------------+------------------------
     57 | 2020-08-07 23:49:57-10 | 2020-08-07 23:49:57-10 | 2020-08-07 23:50:03-10
(1 row)
|
|
"*_sk" columns can be used to do INNER JOINs to a similarly named dimension table
Any column that is named with an _sk suffix is a Synthetic Key that points to a corresponding dimension table. The dimension table is named with the part of the column name before the _sk, plus a _dim suffix. This makes the JOINs easy to write: the column name is a clue to the dimension table, which will have a column of the same name. Almost every dimension table consists of a *_sk PRIMARY KEY and a name column.
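The naming rule above is mechanical enough to sketch in a few lines of Python. This is only an illustration: the helper name `dimension_join` is hypothetical, and it assumes the fact table is job_fact unless told otherwise.

```python
def dimension_join(sk_column: str, fact_table: str = "job_fact") -> str:
    """Build the INNER JOIN clause implied by a *_sk column name.

    Hypothetical helper for illustration; it only applies the naming
    convention and does not check that the dimension table exists.
    """
    if not sk_column.endswith("_sk"):
        raise ValueError(f"not a synthetic key column: {sk_column!r}")
    # Strip the _sk suffix and append _dim to get the dimension table name.
    dim_table = sk_column[: -len("_sk")] + "_dim"
    # The dimension table has a column with the same name as the fact column.
    return (
        f"INNER JOIN {dim_table} "
        f"ON {fact_table}.{sk_column} = {dim_table}.{sk_column}"
    )


print(dimension_join("user_sk"))
# INNER JOIN user_dim ON job_fact.user_sk = user_dim.user_sk
```

The same call with "time_sk" or "jobstatus_sk" yields the joins to time_dim and jobstatus_dim used in the queries on this page.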
A typical dimension table, the "user_dim" table
For example, the user_sk column can be used to do a SQL INNER JOIN to the user_dim table.
Code Block |
---|
|
pfx=> \dS+ dwh.user_dim
Table "dwh.user_dim"
Column | Type | Modifiers | Storage | Stats target | Description
---------+------------------------+------------------------------------------------------------+----------+--------------+-------------
user_sk | integer | not null default nextval('user_dim_user_sk_seq'::regclass) | plain | |
name | character varying(255) | not null | extended | |
Indexes:
"user_dim_pkey" PRIMARY KEY, btree (user_sk)
"user_dim_name_key" UNIQUE CONSTRAINT, btree (name)
pfx=> SELECT name FROM user_dim;
     name
---------------
 shinya
 root
 administrator
 jburk
 joe
 kmac
(6 rows)
|
Get a count of all jobs for a particular user:
Code Block |
---|
|
pfx=> SELECT COUNT(*)
FROM
job_fact
INNER JOIN user_dim
ON job_fact.user_sk = user_dim.user_sk
WHERE user_dim.name = 'shinya'
;
 count
-------
   370
(1 row)
|
The time dimension table
The dwh.time_dim table is provided so that you don't have to perform date/time operations on every row in a fact table (since they can run into the 100's of millions of rows). Instead, you do a SQL INNER JOIN to it and use the values in the time_dim table in your WHERE clause. Every time_sk value in a fact table matches exactly one row in the time_dim table, whose primary key is time_sk. The time_sk value is actually the unix epoch time in seconds:
Code Block |
---|
|
pfx=> SELECT * FROM time_dim ORDER BY time_sk DESC LIMIT 1;
  time_sk   |       date_time        | hour |    date    | dow | month_name | month | year
------------+------------------------+------+------------+-----+------------+-------+------
 1607570700 | 2020-12-09 17:25:00-10 |   17 | 2020-12-09 |   3 | December   |    12 | 2020
(1 row)
pfx=> SELECT to_timestamp(time_sk), date_time FROM time_dim ORDER BY time_sk DESC LIMIT 1;
      to_timestamp      |       date_time
------------------------+------------------------
 2020-12-09 17:30:00-10 | 2020-12-09 17:30:00-10
(1 row)
|
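Because time_sk is a plain unix epoch value, it can also be converted outside the database. The following is a minimal Python sketch using only the standard library; the function name `time_sk_to_datetime` is hypothetical, and the -10 hour offset is simply the one shown in the sample output above.

```python
from datetime import datetime, timedelta, timezone


def time_sk_to_datetime(time_sk: int, utc_offset_hours: int = 0) -> datetime:
    """Convert a time_sk value (unix epoch seconds) to an aware datetime."""
    # Fixed-offset timezone; a real deployment may prefer a named zone.
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(time_sk, tz=tz)


# The time_sk from the sample row above, rendered at UTC-10.
print(time_sk_to_datetime(1607570700, utc_offset_hours=-10).isoformat())
# 2020-12-09T17:25:00-10:00
```

The result agrees with the date_time column of the same row, which is what to_timestamp(time_sk) returns inside PostgreSQL.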
The "job status" dimension table
The dwh.jobstatus_dim table is one of the few exceptions to the normal dimension table structure; it provides a mapping between the integer and human-readable status values.
Code Block |
---|
|
pfx=> SELECT * FROM jobstatus_dim;
 jobstatus_sk | status_int | status_char | effective_date | expiry_date
--------------+------------+-------------+----------------+-------------
            1 |         16 | complete    | 1999-12-31     | 9999-12-31
            2 |         32 | failed      | 1999-12-31     | 9999-12-31
            3 |         48 | killed      | 1999-12-31     | 9999-12-31
            4 |        261 | dying       | 1999-12-31     | 9999-12-31
            5 |        262 | exiting     | 1999-12-31     | 9999-12-31
            6 |        265 | registering | 1999-12-31     | 9999-12-31
            7 |        272 | blocked     | 1999-12-31     | 9999-12-31
            8 |        288 | waiting     | 1999-12-31     | 9999-12-31
            9 |        304 | suspended   | 1999-12-31     | 9999-12-31
           10 |        320 | pending     | 1999-12-31     | 9999-12-31
           11 |        323 | waiting     | 1999-12-31     | 9999-12-31
           12 |        325 | badlogin    | 1999-12-31     | 9999-12-31
           13 |        336 | running     | 1999-12-31     | 9999-12-31
(13 rows)
|
Get a count of all jobs for a particular user for August, 2020:
Code Block |
---|
|
pfx=> SELECT COUNT(*)
FROM
job_fact
INNER JOIN user_dim
ON job_fact.user_sk = user_dim.user_sk
INNER JOIN time_dim
ON job_fact.time_sk = time_dim.time_sk
WHERE user_dim.name = 'shinya'
AND time_dim.month = 8
AND time_dim.year = 2020
;
 count
-------
    96
(1 row)
|
Get a count of all jobs for each user for all of 2020, broken down by month:
Code Block |
---|
|
pfx=> SELECT user_dim.name
, time_dim.month_name
, COUNT(*) as "job count"
FROM
job_fact
INNER JOIN user_dim
ON job_fact.user_sk = user_dim.user_sk
INNER JOIN time_dim
ON job_fact.time_sk = time_dim.time_sk
WHERE time_dim.year = 2020
GROUP BY user_dim.name
, time_dim.month
, time_dim.month_name
ORDER BY user_dim.name
, time_dim.month
;
  name  | month_name | job count
--------+------------+-----------
 root   | July       |         2
 root   | August     |         3
 root   | September  |        13
 root   | October    |         8
 root   | November   |         4
 shinya | July       |         9
 shinya | August     |        31
 shinya | September  |       157
 shinya | October    |        50
 shinya | November   |        75
(10 rows)
|
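To experiment with this join-and-group pattern without touching a live data warehouse, it can be reproduced on a toy copy of the schema. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the tables are cut down to the columns the query needs, and the user names, time_sk values, and job rows are all made up for illustration.

```python
import sqlite3

# In-memory stand-in for the dwh schema (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_dim (user_sk INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE time_dim (time_sk INTEGER PRIMARY KEY, month INTEGER,
                       month_name TEXT, year INTEGER);
CREATE TABLE job_fact (job_id INTEGER UNIQUE, time_sk INTEGER NOT NULL,
                       user_sk INTEGER);
INSERT INTO user_dim VALUES (1, 'alice'), (2, 'bob');
INSERT INTO time_dim VALUES (100, 7, 'July', 2020), (200, 8, 'August', 2020);
INSERT INTO job_fact VALUES (1, 100, 1), (2, 100, 1), (3, 200, 1), (4, 200, 2);
""")

# Same shape as the query above: join the fact table to two dimensions,
# filter on the time dimension, and count jobs per user per month.
rows = conn.execute("""
SELECT user_dim.name, time_dim.month_name, COUNT(*) AS job_count
FROM job_fact
INNER JOIN user_dim ON job_fact.user_sk = user_dim.user_sk
INNER JOIN time_dim ON job_fact.time_sk = time_dim.time_sk
WHERE time_dim.year = 2020
GROUP BY user_dim.name, time_dim.month, time_dim.month_name
ORDER BY user_dim.name, time_dim.month
""").fetchall()

for row in rows:
    print(row)
# ('alice', 'July', 2)
# ('alice', 'August', 1)
# ('bob', 'August', 1)
```

Grouping by the numeric month as well as month_name is what lets the ORDER BY sort months chronologically rather than alphabetically.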
Get a count of all jobs for each user for all of 2020, broken down by month and the job's final status:
Code Block |
---|
|
pfx=> SELECT
user_dim.name
, time_dim.year
, time_dim.month_name
, jobstatus_dim.status_char
, COUNT(*) as "job count"
FROM
job_fact
INNER JOIN
user_dim
ON
job_fact.user_sk=user_dim.user_sk
INNER JOIN
time_dim
ON
job_fact.time_sk=time_dim.time_sk
INNER JOIN
jobstatus_dim
ON
job_fact.jobstatus_sk=jobstatus_dim.jobstatus_sk
WHERE
time_dim.year = 2020
GROUP BY
user_dim.name
, time_dim.year
, time_dim.month_name
, jobstatus_dim.status_char
, time_dim.month
, jobstatus_dim.status_int
ORDER BY
user_dim.name
, time_dim.month
, jobstatus_dim.status_char
;
  name  | year | month_name | status_char | job count
--------+------+------------+-------------+-----------
 root   | 2020 | July       | complete    |         2
 root   | 2020 | August     | complete    |         3
<< snipped >>
 shinya | 2020 | September  | complete    |       139
 shinya | 2020 | September  | failed      |        11
 shinya | 2020 | September  | killed      |         7
 shinya | 2020 | October    | complete    |        47
 shinya | 2020 | October    | failed      |         2
 shinya | 2020 | October    | killed      |         1
 shinya | 2020 | November   | complete    |        64
 shinya | 2020 | November   | failed      |         8
 shinya | 2020 | November   | killed      |         3
(20 rows)
|
Get the sum total of cpu_seconds used for each user for the last 7 days, broken down by user, date, and the job's final status:
Code Block |
---|
|
pfx=> SELECT
user_dim.name
, time_dim.date
, jobstatus_dim.status_char
, SUM(job_fact.cpu_seconds) as "cpu_time"
FROM
job_fact
INNER JOIN
user_dim
ON
job_fact.user_sk=user_dim.user_sk
INNER JOIN
time_dim
ON
job_fact.time_sk=time_dim.time_sk
INNER JOIN
jobstatus_dim
ON
job_fact.jobstatus_sk=jobstatus_dim.jobstatus_sk
WHERE
DATE_PART('day', CURRENT_DATE - time_dim.date_time) < 7
GROUP BY
user_dim.name
, jobstatus_dim.status_char
, time_dim.date
, jobstatus_dim.status_int
ORDER BY
time_dim.date
, cpu_time DESC
, jobstatus_dim.status_char
;
  name  |    date    | status_char | cpu_time
--------+------------+-------------+----------
 shinya | 2020-11-13 | complete    |      144
 shinya | 2020-11-16 | complete    |       97
 shinya | 2020-11-17 | failed      |      906
 shinya | 2020-11-17 | killed      |      677
 shinya | 2020-11-17 | complete    |      102
 shinya | 2020-11-18 | failed      |    18695
 root   | 2020-11-18 | complete    |     1199
 shinya | 2020-11-18 | complete    |      760
 shinya | 2020-11-19 | failed      |     8022
 shinya | 2020-11-19 | complete    |      606
 root   | 2020-11-19 | complete    |       46
 shinya | 2020-11-19 | killed      |        0
(12 rows)
|
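Since time_sk holds unix epoch seconds, a rolling time window can also be expressed as a plain integer comparison on time_sk. The Python sketch below computes such a cutoff. It is an alternative under stated assumptions, not the query above: the helper name `time_sk_cutoff` is hypothetical, and a rolling 7 x 24 hour window does not select exactly the same rows as the DATE_PART expression, which measures from the start of the current date.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def time_sk_cutoff(days: int, now: Optional[int] = None) -> int:
    """Return the smallest time_sk that falls inside the last `days` days."""
    if now is None:
        # Default to the current time; pass `now` explicitly for repeatable runs.
        now = int(time.time())
    return now - days * SECONDS_PER_DAY


# Example with a fixed "now" taken from the time_dim sample row on this page.
cutoff = time_sk_cutoff(7, now=1607570700)
print(cutoff)
# 1606965900

# The cutoff would then be bound as a query parameter, for example:
#   ... WHERE job_fact.time_sk >= %s
```

Whether this is faster than filtering on time_dim depends on the data and indexes involved, so treat it as something to measure rather than a recommendation.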