CPU Temperature Monitoring using Google Docs
I have a Gateway laptop that I bought (refurbished) about five years ago. It’s running Linux Mint, and I use it for light web browsing and playing music on the sound system. It’s not great - the build quality was pretty poor to begin with, and it’s definitely showing its age - but it works, mostly.
This summer, however, it began to turn itself off. Hard shutdown, no warning, just instantly dead. The most common cause of this is overheating - the system shutting itself off to prevent damage. But the laptop wasn’t that hot to the touch. It was running its fans on high speed most of the time, but I’d felt laptops get a lot hotter before. The problem got worse, though. Eventually it was so bad that it would shut off within minutes of booting, even after hours of sitting idle.
I started to think it might be the thermal paste - the goop that transfers heat from the CPU and GPU to the heat sink, and from there out of the case. What if it had dried out? I could replace it, but I didn’t relish the prospect of cracking the case on this thing for no reason (luckily, it had become useless anyway, so I could risk accidentally destroying it).
I wanted to be sure that overheating was the cause of the shutdowns. I needed to log the temperature readings from the motherboard sensors, and I wanted to send them to a healthy system so I could see what was happening. I decided to see if I could send them to a Google Docs spreadsheet - I could collect the data there, and also analyze it or export it as necessary.
First of all, I installed the sensors package. Running it gave me output like:
acpitz-virtual-0
Adapter: Virtual device
temp1:        +94.0°C  (crit = +100.0°C)
temp2:        +87.0°C  (crit = +100.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +94.1°C  (high = +95.0°C)
Okay, so I have three sensors in this machine, and they were running pretty freaking hot. Not good. Now I needed to parse that output for just the current values, removing all of the other stuff. After some command-line fiddling, I settled on:
sensors | grep ^temp | tr -d "°C+" | tr -s " " | cut -d" " -f2 | paste -sd " "
which resulted in one line with all three values on it:
94.0 87.0 94.1.
In Google Docs, I created a spreadsheet with the columns I wanted - a timestamp and three sensor fields. To collect the data, I linked a survey form to the spreadsheet - I'd be able to write a script to submit the sensor values using a simple HTTP POST. The source code for the form contained the field names I'd need to send.
My final script would post the sensor values every 30 seconds. It looked like this:
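The script itself isn't shown above, but a minimal sketch of the approach looks like this - note that the form URL and the `entry.N` field names below are placeholders (the real ones came from the form's HTML source):

```shell
#!/bin/sh
# Sketch only: FORM_URL and the entry.N names are illustrative placeholders.
FORM_URL="https://docs.google.com/forms/d/e/EXAMPLE_FORM_ID/formResponse"

# Reduce `sensors` output on stdin to one line of values, e.g. "94.0 87.0 94.1"
parse_temps() {
    grep '^temp' | tr -d '°C+' | tr -s ' ' | cut -d' ' -f2 | paste -sd ' '
}

# Post one set of readings every 30 seconds, forever.
log_loop() {
    while true; do
        set -- $(sensors | parse_temps)
        curl -s -o /dev/null \
            --data-urlencode "entry.1=$1" \
            --data-urlencode "entry.2=$2" \
            --data-urlencode "entry.3=$3" \
            "$FORM_URL"
        sleep 30
    done
}
```

Calling `log_loop` from the end of the script starts the poster; each submission lands as a new row in the spreadsheet, with Google adding the timestamp.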
(The form is no longer accepting responses.) I set it up as a cron job to run on system startup:
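The crontab entry isn't shown above; it would have looked something like this (the script path is a placeholder):

```
# Start the temperature logger once at boot
@reboot /home/me/bin/log-temps.sh
```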
With the spreadsheet happily accepting my sensor values, I rebooted the machine a few times, letting it turn itself off each time and cool down for a while. While I waited, I put together a chart view of the sensor data. The result was this:
In the end, I didn’t need to do much analysis. It was clear that in each case, the system temperature climbed until sensors #1 and #3 were near 100°C, and then the system shut down, registering no data points until it started up again at a much lower temperature.
So heat was definitely the culprit. I ordered a tube of thermal paste.
I opened up the laptop, carefully collecting all the tiny screws (a strongish magnet in a stainless steel bowl works great!). The heat sink assembly was mounted near the very bottom of the case. I unscrewed and removed the heat sink, exposing the thermal paste on the CPU and GPU. It was fairly dry and crumbly. I scraped it off, removing the last remnants with isopropyl alcohol, and applied the new paste. With that done, I reassembled the laptop and (to put it mildly) was quite relieved when it booted up normally.
Here are the post-op temperature readings:
Pretty good. It still runs a bit hotter than I’d like, particularly under load, but it hasn’t turned itself off once since it got the new thermal paste. All in all, a very satisfying result.
Monotask

Monotask was an attention management application, a joint startup effort between Charlie Park and me. We worked on it from 2011 to 2013.
It consisted of a Rails application that let users specify a weekly schedule of sites to block, helping them stay focused on their work. There was also a client application, written in C++/Qt, that synchronized with the web application and used the schedule data to control an HTTP proxy server (a custom Apache build). The proxy server blocked access to the specified sites using rewrite rules.
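As an illustration only (not Monotask's actual configuration), an Apache forward proxy can forbid requests to listed hosts with mod_rewrite rules along these lines, assuming a hypothetical blocklist file:

```
# Hypothetical sketch: deny proxied requests to any host in the blocklist.
RewriteEngine On
RewriteMap blocked "txt:/etc/monotask/blocked.txt"
RewriteCond ${blocked:%{HTTP_HOST}|NOT_FOUND} !=NOT_FOUND
RewriteRule ^ - [F]
```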
The client software was my primary focus; it went through several iterations. We were initially targeting Windows and Mac, and the first version was a C#/.NET/Mono application. At the time there was poor support for certain OS X features, so I moved to a C++/Qt client that worked quite well. Parts of the application needed OS-specific C++ and C code, and the platform was flexible enough to allow this.
The biggest pain point was reliably controlling the Apache instance; it would sometimes not come back up after hibernation, for example, and was difficult to restart on Windows. At the time that we closed the project down, I was investigating various ways to integrate the proxy more fully into the control application; I was looking at a Node.js solution for this, and also experimenting with my own C++ HTTP forward proxy.
We decided to shut the project down because we weren't able to meet our subscriber goals, but we were both very satisfied with the application we'd built.
CATS

CATS (Certification and Accreditation Tracking System) is a web application for people who need to get various certifications through the Department of Environmental Quality (DEQ). This includes erosion & sediment control, stormwater management, and responsible land disturber (RLD) certificates. The app allows users to pay for and take online courses, track recertification hours, and print their certificates.
I am the architect and lead developer on the CATS project. It consists of an ASP.NET MVC 5 web application backed by SQL Server, plus some additional supporting utilities.
This project is currently in development but should be feature complete by the end of the year. The most interesting aspect of the project has been the various other systems it needs to integrate with. These include:
- Legacy data: Many of these certifications had been administered by another state agency (DCR) for several years, and so we have developed a process to load the legacy data into our production systems. This also includes some cleanup tasks, such as deduping, string normalization, etc.
- Pearson VUE: Some certification tests are administered at Pearson testing centers, and they make the test data available to us on an SFTP server. I wrote an ETL application that downloads the previous day's test data, parses and verifies it, and loads it into the CATS production database.
- Articulate/SCORM: For the RLD certification, users will pay for the certification and then take an online course and quiz. The course is developed by our training staff using Articulate, and is SCORM-compatible. Rather than standing up a full LMS, I wrote a basic implementation of the SCORM API, enough to launch the course and retrieve the results.
- Elavon: To accept payments, we need to integrate with the state-approved payment gateway vendor. We are using Elavon's hosted payment form to accept credit card info so it does not pass through our systems. I was able to build on some work done on another DEQ project that was set up to accept payments and process them in daily batches. For CATS, we will handle the payments in real time.
CATS is also the second DEQ project to use a new security framework that provides an integrated login and permission system for both internal and external users. Members of the public will not need to create separate accounts for separate applications as we continue to bring more publicly accessible apps online. Internal users can use their Active Directory credentials to access any of the apps. I helped to design the security framework and have continued to update it.