Prepared by Incredible
AI & Systems Discovery · Reporting & Monitoring

Know the storm before you walk in

Every morning, Red Stag's reporting server fires around 26 reports off the back of overnight SQL jobs - and right now someone checks, by hand, that they all ran. This is what we heard across three scoping calls, what the first week of live test data already shows, and where we would take it. A working document, ranked together - nothing locked.

Did the reports send · Did the jobs run · One morning all-clear

Three checks, one morning answer

Rodney framed the problem as three layers that have to hold for a report to be trustworthy. Today each is checked manually, after the fact. The goal is a single proactive status - "all groovy" or "look at this" - on Trevor's phone before he opens the door. We would build from the bottom of the stack outwards, starting where the data is already flowing.

Did the integration jobs run?

Overnight SQL scrapes machine data into the warehouse

Each night, integration jobs pull shift data off the SCADA / IFIX machines into the data warehouse. If one locks up, the day's reporting is built on incomplete data.

Foundation

Did the reports send?

SSRS generates and emails them via the on-site SMTP relay

SSRS 2018 builds and schedules the reports, then emails them to managers and the DBA admin mailbox. If the SMTP relay is blacklisted or a subscription fails, people quietly never get their report.

Enabler

Was the data complete?

An end-of-shift has to be recorded for a report to be whole

Occasionally a mill machine misses its end-of-shift mark, so a report looks fine on the surface but is half-populated. The morning all-clear is the outcome that ties all three together.

Data quality

We deliver by pain and impact, not strictly top-to-bottom - the quickest, highest-value win (the status email and the deliverability fix) comes first, then the deeper integration and end-of-shift checks. As of the 17 June call, the next horizon is in view too: scoped or sandboxed access to the data so we can build a proper interface over it, not just read the emails.

Priorities, in order

Ranked by impact against effort, not by the layer diagram. Matt framed it as an elevator - get to the first floor (a useful status email) fast, then climb as more data access opens up. Nothing locked - we rank these together.

1

The morning all-clear

One status email before Trevor walks in
Highest pain · Quick first floor

The pain

  • Trevor spends ~30 minutes every morning checking by hand whether all the reports came out and the SQL jobs ran.
  • Even after checking, a manager often emails hours later: "I didn't get my report."
  • When Trevor is away, Rodney inherits the entire manual routine.

What we'd build

  • A single daily status email - "all groovy" or "look at these" - waiting before the shift starts.
  • Driven by the mailhook (report receipts) plus Trevor's SSRS agent-log query: subscriptions sent, agent finished on time.
  • No on-prem access needed for this first floor - it runs off emailed signals.
"Basically what we're looking for is, before Trevor even walks in through the door, he knows the status... he can just be reactive and get straight onto it." - Rodney Mills, on the goal
2

Fix report deliverability (SPF)

Stop the SMTP relay being blacklisted
Real bug · Quick to fix

The pain

  • The data warehouse / Timbersmart reports relay through ITCO's on-site SMTP server, which has previously failed and been blacklisted.
  • Every one of this week's test reports fails an SPF check - they only land because DKIM passes. That is a fragile position.

What we'd build

  • Add the relay's sending IP (180.235.104.212) to the redstagtimber.co.nz SPF record - a small DNS change.
  • Longer term, consider routing report mail through an authenticated send so deliverability stops depending on DKIM alone.
  • Raised on the 17 June call - Matt to take it to ITCO. It affects every report off that mail server, so worth doing before more reports route through the mailhook.
"It'll probably be flagged at some point... we can have a talk to ITCO about that - that'll happen to all these other ones as well, coming from that mail server." - Matt & Mitch, 17 June, on the SPF finding
3

The logic tree: due vs not-due

Tell a broken report from one that legitimately didn't run
Medium pain · Medium

The pain

  • The EWP report only runs when wood is filleted through the kilns - so when that process is not done, the report legitimately never generates. A manager chased it as "missing" after a month.
  • It does not show as failed on the day-summary; Rodney had to run a query against the database to see it had not run.
  • Separately, four SQL agents were stacked into one sequential job - the first step ran so long it missed the 3am window, silently skipping the next three steps (including shift patterns). Rodney has since split them.

What we'd build

  • A documented map of which reports are conditional and on what trigger (the EWP / kiln case is the first), so the monitor can say "not due" instead of crying wolf.
  • Surface the database failure flags Trevor can already query, on a schedule, into the mailhook as a daily error feed.
  • Feeds both the status email and any later AI - it needs to know what to expect.
"That one we talked about - the EWP report - it only generates if they do one of those filleting processes through the kilns... there would have been a flag inside the database, I just didn't think of checking it for that long." - Rodney Mills & Matt, 17 June
4

End-of-shift gap detection

Catch the half-populated reports before managers do
High pain · Needs deeper access

The pain

  • If a busy SCADA box misses recording its end-of-shift, the night-shift data never rolls over - the report runs but is half empty.
  • It is the hardest class of fault: everything looks fine on the surface.

What we'd build

  • Detect the missing end-of-shift signal and flag it on the morning status, with the reason.
  • Later, support the remediation Rodney described - scrape the value off the machine and re-run.
"Everything was fine on the surface, but when you dig deeper... 97% of the time everything runs smoothly, but occasionally a job will lock up." - Trevor Pratt & Rodney Mills
+

On the radar: local-AI Q&A

Ask the mill data questions - hosted on-prem, nothing to the cloud
Proof of concept

The idea

  • Let staff ask plain-English questions of the data (e.g. "how many hi-vis jackets did we buy last month?") via an MCP server over SQL views.
  • Stores / Workmate data is cloud-comfortable; Timbersmart is financial and must stay on-prem.

Where it sits

  • A separate workstream - Rodney is already testing locally with Ollama and Qwen.
  • Incredible can bring local-model experience and help present a board-ready proof of concept.
"You could sell it by saying - well yeah, it's AI, but it's hosted locally. Nothing's going into the cloud for anyone to scrape." - Rodney Mills, on getting it past the board

Pain against ease of build

Top-right first: high pain, quick to build. The morning all-clear and the SPF fix are the early wins we can ship from data we already have. End-of-shift detection is high-value but waits on deeper access; the local-AI proof of concept is a separate, longer play.

[Chart: the five items plotted on two axes, pain / impact and ease of build. Points: 1 Morning all-clear, 2 SPF fix, 3 Logic tree, 4 End-of-shift, + Local AI. Legend: the morning all-clear · deliverability fix · data-quality checks · future / PoC.]

The team, and how the morning runs

Today the whole morning routine runs through one person. The monitor changes the shape of that - it becomes the first stop, fed by the SQL logs and report receipts, and Trevor becomes the fixer rather than the checker.

Legend: checks / waits on the morning status · feeds the monitor its signal

From end-of-shift to the managers' inboxes - and where the monitor plugs in

01

Shift runs

Mill / SCADA

IFIX machines record PLC data - throughput, moisture, grade - per area of the mill.

02

End of shift

SCADA box

The machine closes off the shift and rolls over. The occasional miss point.

03

Integration jobs

Data warehouse

Overnight jobs scrape each machine into the warehouse tables.

04

Reports generate

SSRS 2018

Scheduled reports build off the integrated data, early morning.

05

SMTP send

Relay + mailhook

Reports email to managers and the DBA mailbox - CC'd to our mailhook. Where blacklisting bites.

06

Managers' inboxes

Quality & site managers

The quality team open the day with their reports already waiting.

07

Morning all-clear

The monitor

One status email: everything ran, or here is exactly what to look at.
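
A minimal sketch of how that last step could decide between "all groovy" and "look at these", assuming the expected subscription list and the mailhook receipts are both available as plain sets of report names. The class, function and report names are hypothetical and for illustration only; the real feed format is still to be agreed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorningStatus:
    missing: list[str]      # expected reports with no receipt in the mailhook
    unexpected: list[str]   # receipts that match no active subscription

    @property
    def all_clear(self) -> bool:
        # Only missing reports block the all-clear; extras are informational.
        return not self.missing


def build_status(expected: set[str], received: set[str]) -> MorningStatus:
    """Compare the expected subscription list against mailhook receipts."""
    return MorningStatus(
        missing=sorted(expected - received),
        unexpected=sorted(received - expected),
    )


def render_subject(status: MorningStatus) -> str:
    """One-line subject: 'all groovy' or a count of items to look at."""
    if status.all_clear:
        return "Morning status: all groovy"
    return f"Morning status: look at these ({len(status.missing)} missing)"


# Hypothetical report names, for illustration only.
expected = {"Day Summary", "Integration Status", "Kiln Moisture"}
received = {"Day Summary", "Integration Status"}

status = build_status(expected, received)
print(render_subject(status))
print(status.missing)
```

Keeping the comparison as a set difference means the same function works unchanged once the daily "all active subscriptions" report supplies the authoritative expected list.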

The people

Trevor Pratt

Database administrator, Red Stag

Owns the morning check today - around 26 reports and the SQL jobs, by hand. The monitor's first job is to lift that routine off him so he is the one who fixes, not the one who hunts.

Rodney Mills

IT & systems, Red Stag

Our main contact and knowledge-holder - set up the test report, knows the systems and the triggers, and is running the local-AI experiments.

Brenda Fort

Production (ENT) server, Red Stag

Looks after the production software server. Relevant because it also runs report services - a second place report status lives.

Site & quality managers

Report consumers

Open each shift with their daily reports - downtime, moisture, throughput. They are who feels it when a report is late or wrong.

Paul, Tim & the board

Sign-off & sponsorship

Paul approves data leaving site; Tim is the manager above that. At board level, Marty Verry is the push to "move with the times" - useful air cover for the AI work.

Incredible

Matt & Mitch - build; Aiyana - delivery

Build the monitor and the proof of concept; Aiyana runs scheduling and capacity.

The systems, and how they fit

Everything is on-prem and Red Stag intends to keep it there. The plan does not move mill data to the cloud - it works from signals coming out: report emails, agent logs, exception flags.

The integration prize: a read-only signal out

The whole approach turns on getting a signal out without the data leaving site - report receipts to a mailhook (already proven), SSRS agent logs by email, and later exception flags. Enough to be proactive, nothing sensitive in motion.

Core system · On-prem

Timbersmart (SQL Server)

Tracks every packet of wood through the mill - the financial heart. Dollars, salaries, the sensitive one.

Stays on-prem; never to the cloud

Data & reporting · On-prem

Data warehouse / reporting server

Holds the machine integrations (kilns, sonic testers, planers, graders) and runs Workmate plus SSRS. Rodney owns this one.

Source of the integration-job status

Production · On-prem

ENT production server

The production software server, looked after by Brenda Fort - but it also runs report services, so Rodney still watches it for flags.

Second source of report status

Reporting engine

SSRS 2018 (Native)

Builds, schedules and emails the reports to managers and DBA admin. Version and mode confirmed by Rodney.

Agent logs tell us what sent

Mail · Watch point

On-site SMTP relay (ITCO)

A simple SMTP server relays the reports out. The component that has been blacklisted before.

Fix SPF; consider authenticated send

Capture · Our tool

Glide mailhook

Every report is CC'd to a Glide mail address, giving us a timestamped record of what was delivered.

The delivery oracle - live now

Edge · On-prem

SCADA / IFIX Historian

Windows 11 machines across the mill recording PLC tags into a local database; the source of end-of-shift.

Where end-of-shift gaps originate

Cloud · Context

Microsoft 365 / SharePoint

Email and documents already live in the cloud - useful context when weighing what is genuinely "on-prem only".

Already off-site, by the way

AI · PoC

Local model (Ollama + Qwen)

Rodney's on-prem experiment for plain-English Q&A over mill data, so nothing sensitive goes to a hosted model.

Workstream 2 - prove it, then scale

Access · Proposed

Scoped sandbox database

Rather than open the whole database, Red Stag could stand up a sandbox with just the tables or views we need - or anonymised sample data - so sign-off is easy and nothing sensitive leaves.

The likely path to a real interface

Week one of live data

The daily test report Rodney set up is already flowing into our mailhook. Eight days in, it is telling us two useful things - the job is rock-steady, and the deliverability problem is real and measurable.

8 / 8 - Days captured, no misses (10-17 Jun)
06:15 - Send time each morning, within seconds
100% - Fail SPF - the blacklisting risk
DKIM ✓ - Why they still land - for now

The root cause, found in the headers

Every report leaves RSTSQL01, relays through RSTRDSM2, and goes out via an IP (180.235.104.212) that is not in the redstagtimber.co.nz SPF record - so SPF fails. They survive only because DKIM passes (so DMARC passes). That is almost certainly the historic "blacklisted" behaviour. Adding the relay IP to SPF is a small DNS change that removes the risk at the root.
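
A minimal sketch of checking the fix before and after the DNS change. The relay IP is the one named above; both SPF record strings are hypothetical placeholders, since the real redstagtimber.co.nz record is not reproduced here. The check only reads `ip4:` mechanisms, so it does not follow `include:` terms or resolve `a` / `mx` terms.

```python
import ipaddress


def spf_ip4_networks(spf_record: str) -> list:
    """Extract the ip4: mechanisms from an SPF TXT record string."""
    networks = []
    for term in spf_record.split():
        # Strip an optional qualifier (+, -, ~, ?) before the mechanism name.
        mechanism = term.lstrip("+-~?")
        if mechanism.lower().startswith("ip4:"):
            # strict=False accepts entries like 192.0.2.10/24 with host bits set.
            networks.append(ipaddress.ip_network(mechanism[4:], strict=False))
    return networks


def ip_listed(spf_record: str, sender_ip: str) -> bool:
    """True if sender_ip falls inside any ip4: mechanism in the record."""
    ip = ipaddress.ip_address(sender_ip)
    return any(ip in net for net in spf_ip4_networks(spf_record))


RELAY_IP = "180.235.104.212"  # from the report headers

# Hypothetical records: placeholders, not the live DNS entries.
current_record = "v=spf1 include:spf.example.net -all"
proposed_record = "v=spf1 include:spf.example.net ip4:180.235.104.212 -all"

print(ip_listed(current_record, RELAY_IP))
print(ip_listed(proposed_record, RELAY_IP))
```

A check like this is only a local sanity test of the record text; the authoritative confirmation is the SPF result on the next report to reach the mailhook.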

Received (mailhook)  Subject           From                         SPF   DKIM / DMARC
10 Jun, 16:33        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
11 Jun, 11:15        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
12 Jun, 11:15        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
13 Jun, 11:15        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
14 Jun, 11:15        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
15 Jun, 11:15        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
16 Jun, 11:15        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
17 Jun, 11:15        SMTP Test Report  reports@redstagtimber.co.nz  Fail  pass / pass
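
The SPF and DKIM / DMARC columns above come from each message's headers. A minimal sketch of pulling them out automatically, assuming the mailhook exposes the raw message and that the receiving server stamps a standard `Authentication-Results` header. The sample message and the `mx.example.net` host are made up for illustration.

```python
import re
from email import message_from_string

# Matches method=result pairs such as "spf=fail" or "dkim=pass".
RESULT_PATTERN = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)


def auth_results(raw_message: str) -> dict[str, str]:
    """Return the first spf / dkim / dmarc result found in the headers."""
    msg = message_from_string(raw_message)
    results: dict[str, str] = {}
    for header in msg.get_all("Authentication-Results", []):
        for method, outcome in RESULT_PATTERN.findall(header):
            # Keep the first result per method if the header repeats.
            results.setdefault(method.lower(), outcome.lower())
    return results


# Hypothetical sample, shaped like the test reports in the table.
SAMPLE = (
    "Authentication-Results: mx.example.net;\n"
    " spf=fail smtp.mailfrom=redstagtimber.co.nz;\n"
    " dkim=pass header.d=redstagtimber.co.nz;\n"
    " dmarc=pass header.from=redstagtimber.co.nz\n"
    "From: reports@redstagtimber.co.nz\n"
    "Subject: SMTP Test Report\n"
    "\n"
    "Test body\n"
)

print(auth_results(SAMPLE))
```

Recording these three values per receipt would let the status email flag the day DKIM stops passing, which is the point at which reports would stop landing.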

The consistency is the point: a job this regular means "nothing arrived by 06:45" is a reliable alarm. The mailhook loop - capture, timestamp, verify - is proven. From 17 June it stops being just the test report: Rodney is forwarding the real integration-status, day-summary and error reports into the same hook, so the morning all-clear gets its actual content.
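
A minimal sketch of the "nothing arrived by 06:45" alarm, assuming receipt timestamps have already been converted to site-local time and that 06:45 is the agreed cut-off (it is still an open question). The function name, state labels and example date are hypothetical.

```python
from datetime import datetime, time
from typing import Optional


def arrival_state(received_at: Optional[datetime], now: datetime,
                  deadline: time = time(6, 45)) -> str:
    """Classify one expected report against today's cut-off time."""
    cutoff = datetime.combine(now.date(), deadline)
    if received_at is not None:
        return "on time" if received_at <= cutoff else "late"
    # Nothing received yet: only an alarm once the cut-off has passed.
    return "missing" if now > cutoff else "pending"


# Hypothetical date, for illustration only.
check_time = datetime(2024, 6, 17, 7, 0)
print(arrival_state(datetime(2024, 6, 17, 6, 15, 4), check_time))
print(arrival_state(None, check_time))
```

Separating "late" from "missing" keeps the status email honest when a report does arrive after the cut-off, rather than reporting it as lost.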

Open questions

A handful of answers sharpen the build. We work through these together - several are sign-off rather than technical.

Getting the data (sign-off)

?
Direct (read-only) or sandboxed access to the on-prem SQL server - is it possible, and at what scope?
Raised 17 Jun. It can be scoped to a single table or view, or a sandbox DB - which makes sign-off far easier.
?
Sign-off sits with Paul (and likely Tim) - but Paul is tied up with end-of-financial-year budgets to 30 June. Who can we reach in the meantime?
So the data-access conversation does not stall for two weeks.
?
NDA: Mitch will send Incredible's standard NDA plus a short data-handling note to bolster the contract clause - does Red Stag also have one to put on file?
Reinforces the confidentiality already in the contract and clears the path to deeper access.

The morning all-clear

?
What is the "all-clear by" time? Reports fire at 06:15 - is nothing-by-06:45 a fair alarm?
Sets the threshold for late vs missing.
?
Who should receive the status email - Trevor only, Rodney too, and who is the backup when Trevor is away?
The "when he's on holiday" case is the one that hurts most.
?
Expected report count - around 26 subscribed - is being nailed down by a daily "all active subscriptions" report (disabled ones excluded) to the mailhook.
In motion from 17 Jun: gives us the authoritative expected-vs-received baseline.

Data quality & logic

?
Beyond EWP / kiln filleting, which other reports are conditional, and on what trigger?
EWP is the first documented case - we need the rest so "not due" never reads as "broken".
?
The failure flags Trevor can query - can those run on a schedule into the mailhook as a daily error feed?
Confirmed 17 Jun that the flags exist in the database; this turns them into a proactive signal.
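
A minimal sketch of the due vs not-due decision these answers would feed, assuming each conditional report can be mapped to a single named trigger and that the day's fired triggers are known. The trigger name `kiln_filleting` and the non-EWP report names are hypothetical; EWP is the one documented case.

```python
from enum import Enum


class ReportState(Enum):
    RECEIVED = "received"
    NOT_DUE = "not due"
    MISSING = "missing"


def classify(report: str, received: set[str],
             conditional_on: dict[str, str],
             triggers_fired: set[str]) -> ReportState:
    """Tell a legitimately absent report from a broken one."""
    if report in received:
        return ReportState.RECEIVED
    trigger = conditional_on.get(report)
    # A conditional report whose trigger never fired was not due today.
    if trigger is not None and trigger not in triggers_fired:
        return ReportState.NOT_DUE
    return ReportState.MISSING


# Map of conditional reports to their trigger (EWP is the first entry).
conditional_on = {"EWP": "kiln_filleting"}
received = {"Day Summary"}

print(classify("EWP", received, conditional_on, triggers_fired=set()))
print(classify("EWP", received, conditional_on, {"kiln_filleting"}))
print(classify("Integration Status", received, conditional_on, set()))
```

Unconditional reports simply have no entry in the map, so the map only needs to grow as further conditional reports are identified.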

Local-AI proof of concept

?
Start the PoC on the cloud-comfortable stores / Workmate data before touching Timbersmart?
Fastest path to something demonstrable without the sensitivity debate.
?
What is the hardware-budget appetite for on-prem hosting if the board says go?
Local models need real hardware - worth sizing early.
?
Who needs convincing, and can Paul get us in front of Tim and the board?
Marty Verry is keen - a live demo could carry the room.

Working assumptions

These shape the estimates. The first is the big lever - everything scales with how much signal we can get off-site.

The key unknown: how much signal can leave site

We cannot reach the SQL servers directly today - they are on-prem. The whole approach assumes we can get a signal out: report receipts to a mailhook (proven), SSRS agent logs by email, and exception flags. The 17 June call opened a stronger option - a scoped or sandboxed DB connection (one view, or anonymised sample data) - which would let us build a real interface. If that never opens, the emailed signals alone are still enough for the morning all-clear.

SSRS is 2018, Native mode

Confirmed by Rodney across the calls - shapes how we read subscriptions and agent logs.

Confirmed

Reports can be CC'd to our mailhook

Live and working - eight days of clean captures prove the delivery-confirmation loop.

Confirmed

Status & error reports are non-sensitive and forwardable

Rodney confirmed 17 Jun he can forward the integration-status, day-summary, end-of-shift and DBA-admin reports - they show whether things ran, not commercial data.

Confirmed

A useful status email needs no deeper access

The morning all-clear can be built from the mailhook plus Trevor's agent-log query alone.

High confidence

Forwarding report contents off-site needs sign-off

Logs feel fine; report contents are company data. Paul, possibly Tim, to confirm.

To confirm

End-of-shift misses leave a detectable trace

Gap detection assumes there is a log or missing row we can reach without live SQL access.

To confirm
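
If that assumption holds, the check itself is small. A minimal sketch, assuming end-of-shift marks can be exported as (machine, shift date) rows and that the list of machines expected to close a shift is known. The machine names and date are hypothetical, and the real trace may look quite different.

```python
from datetime import date


def missing_end_of_shift(expected_machines: set[str],
                         end_marks: list[tuple[str, date]],
                         shift_date: date) -> list[str]:
    """Machines with no end-of-shift mark recorded for shift_date."""
    closed = {machine for machine, marked in end_marks if marked == shift_date}
    return sorted(expected_machines - closed)


# Hypothetical machines and marks, for illustration only.
machines = {"kiln-1", "planer-1", "grader-1"}
marks = [
    ("kiln-1", date(2024, 6, 16)),
    ("planer-1", date(2024, 6, 16)),
    ("grader-1", date(2024, 6, 15)),  # previous shift only: yesterday's mark is absent
]

print(missing_end_of_shift(machines, marks, date(2024, 6, 16)))
```

The returned machine names are what the morning status would list as the reason a report may be half-populated.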

SPF fails on report mail - raised with ITCO

Found in this week's data and flagged on 17 Jun; Matt to action with ITCO via a DNS change to the SPF record.

In progress

Local-AI at Timbersmart scale is viable on-prem

Proven in concept by Rodney with Ollama / Qwen; production scale and hardware need verifying.

Must verify

Where this goes next - agreed 17 June

  • Rodney: forward the integration-status, day-summary, end-of-shift and DBA-admin reports to the mailhook so the daily error and status feed can start.
  • Incredible (Matt): add those reports to the mailhook; build a daily "all active subscriptions" report (disabled excluded) for the expected-vs-received baseline; chase scoped SQL access with Paul; raise the SPF fix with ITCO.
  • Incredible (Mitch): draft the NDA plus a short data-handling note, and start processing the error feed to spot what recurs day to day.
  • Together: a sandbox or anonymised dataset so we can prototype a proper interface over the data without anything sensitive leaving site.
  • Next call: Thursday 8:00am - Mitch to present progress (Matt away).
  • A working document - we keep ranking it together as the picture sharpens.