First of all, I would like to give credit to the original researchers who brought this issue to public attention.
The original blog post can be accessed through this link
I used their concepts to write a tool that automates the entire process and finds potential leaks.
The tool can be found here.
What is Travis?
Travis CI is a hosted continuous integration service used to build and test software projects hosted on GitHub.
According to Travis, open source projects can use the service for free, but the entire build log remains public. This opens the door for malicious hackers to harvest sensitive API keys, passwords, and other secrets from any organization whose Travis logs are public.
In 2015, Travis acknowledged that its API was being misused to find sensitive keys. It also started hiding potentially sensitive data in Travis logs by replacing it with [secure]. But the question was: was it enough?
In his research, Ed had already listed common keywords in Travis logs that could potentially leak sensitive data. But while looking for those keywords, I realized that most of the matching values had been replaced with [secure] by Travis. It seems that Travis hides sensitive data based on a blacklist of known keywords.
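To make the keyword approach concrete, here is a minimal sketch of such a scan. The keyword list, the regular expression, and the function name find_keyword_hits are my own assumptions for illustration; they are not Ed's list and not the code of the tool mentioned above. The sketch looks for NAME=value or NAME: value pairs whose name contains a keyword and skips values that Travis has already masked.

```python
import re

# Hypothetical keyword list for illustration only;
# this is NOT the list from the original research.
KEYWORDS = ["token", "secret", "password", "api_key"]

# Matches NAME=value or NAME: value where NAME contains one of the keywords.
_keyword_alternation = "|".join(re.escape(k) for k in KEYWORDS)
ASSIGNMENT = re.compile(
    r"(?P<name>[A-Za-z0-9_]*(?:" + _keyword_alternation + r")[A-Za-z0-9_]*)"
    r"\s*[=:]\s*(?P<value>\S+)",
    re.IGNORECASE,
)


def find_keyword_hits(log_text):
    """Return (name, value) pairs whose value is not masked as [secure]."""
    hits = []
    for match in ASSIGNMENT.finditer(log_text):
        value = match.group("value")
        # Travis replaces values it considers sensitive with [secure].
        if "[secure]" in value:
            continue
        hits.append((match.group("name"), value))
    return hits
```

On a real log, most hits for well-known names come back masked, which is exactly the limitation described above.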
So the plan was to look for keywords based on Ed's list and additionally use the concept of entropy to find possible API keys. This seemed to be the right solution because guessing every other keyword that might appear in a log was not practical.
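For example, GITHUB_TOKEN is perhaps blacklisted, as can be seen above. But what if the variable is named TSD_GITHUB_TOKEN? That is difficult to guess, but if we use the concept of entropy, we can still find the possible leak, because the value itself looks random regardless of the variable name.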
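The entropy idea can be sketched as follows. This is an illustration under my own assumptions, not the actual implementation of the tool: the candidate pattern (runs of at least 20 key-like characters), the default threshold of 3.5 bits per character, and the function names are all hypothetical choices that would need tuning against real logs.

```python
import math
import re
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Assumed candidate pattern: long runs of characters commonly seen in keys.
# "=" is deliberately excluded so NAME=value splits into name and value.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def find_high_entropy_strings(log_text, threshold=3.5):
    """Return candidate strings whose entropy meets the (assumed) threshold."""
    return [
        token
        for token in CANDIDATE.findall(log_text)
        if shannon_entropy(token) >= threshold
    ]
```

Because the check looks only at the value, a line such as export TSD_GITHUB_TOKEN=<long random string> is flagged even though the variable name is on no keyword list. The trade-off is false positives: commit hashes, long file paths, and similar strings can also exceed the threshold, so the results still need manual review.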