Breck's Blog - Data Posts

The Galton Board

September 7, 2024

Mark Hebner and his team have refined an 1889 invention by Francis Galton into a version you can hold in your hand to conduct real-world probability experiments 10,000 times faster than flipping a coin.
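
A Galton board can also be imitated in software. The sketch below is only an illustration, not a model of the physical product: it assumes each ball bounces left or right with equal probability at every row of pegs, and the row and ball counts (12 and 3,000) are made-up parameters.

```python
import random
from collections import Counter


def drop_ball(rows: int, rng: random.Random) -> int:
    """Return the bin a ball lands in: the number of rightward bounces."""
    return sum(rng.random() < 0.5 for _ in range(rows))


def simulate(balls: int, rows: int, seed: int = 0) -> Counter:
    """Drop `balls` balls through `rows` rows of pegs and count each bin."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(drop_ball(rows, rng) for _ in range(balls))


if __name__ == "__main__":
    rows = 12  # assumed number of peg rows
    counts = simulate(balls=3000, rows=rows)
    for bin_index in range(rows + 1):
        # One '#' per 20 balls gives a rough text histogram.
        print(f"{bin_index:2d} {'#' * (counts[bin_index] // 20)}")
```

The printed histogram should bulge in the middle bins, which is the bell shape the board is known for.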

Continue reading...

"I'll give you this library," the billionaire said, sweeping his arms up toward the majestic ceiling.

"Or...you can have this scroll," he said, pointing down at a stick of paper on a table, tied with a red ribbon.

Continue reading...

May 26, 2024 - You once could buy transistors, capacitors, and other components at your local neighborhood store. The decline in US computer and electronics manufacturing correlates with the decline in RadioShacks. To catch up to other nations, maybe it is time for a next-gen RadioShack.

Continue reading...

Analyzing the version numbers of 621 programming languages

May 25, 2024 - I just pushed version 93.0.0 of my language Scroll. Version 93!

Continue reading...

by Breck Yunits

May 21, 2024

All tabular knowledge can be stored in a single long plain text file.

The only syntax characters needed are spaces and newlines.

This has many advantages over existing binary storage formats.

Using the method below, a very long scroll could be made containing all tabular scientific knowledge in a computable form.
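
As a rough illustration of the idea (not the method from the full post, which this excerpt does not show), here is a minimal sketch of a table stored with nothing but spaces and newlines. The function names are hypothetical, and the sketch assumes no value contains a space.

```python
def parse_table(text: str) -> list[dict[str, str]]:
    """Parse a space-delimited table whose first line holds the column names."""
    lines = [line for line in text.split("\n") if line.strip()]
    header = lines[0].split(" ")
    return [dict(zip(header, line.split(" "))) for line in lines[1:]]


def serialize_table(rows: list[dict[str, str]]) -> str:
    """Write rows back out using only spaces and newlines."""
    header = list(rows[0].keys())
    lines = [" ".join(header)]
    lines += [" ".join(row[name] for name in header) for row in rows]
    return "\n".join(lines)


SAMPLE = "element symbol atomicNumber\nhydrogen H 1\nhelium He 2"

if __name__ == "__main__":
    rows = parse_table(SAMPLE)
    print(rows[1]["symbol"])
    print(serialize_table(rows) == SAMPLE)  # round trip keeps the text intact
```

Because the file is plain text, the same table can be read by a person, diffed, and versioned without any special tooling.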

Continue reading...

May 11, 2024 - That charts work at all is amazing.

Forty years.

One-billion heart beats.

Four-quadrillion cells.

Eight-hundred-eighty-octillion ATP molecules.

Compressed to two marks on a surface.

Continue reading...

by Breck Yunits

Newton, Darwin, and a modern Scientist go to heaven.

Continue reading...

Bad models of the world can be dangerous.

We stood at the edge of the lake.

Everyone was in a wetsuit.

Except for me.

Wetsuits: hundreds of people.

Boardshorts: one person.

Continue reading...

Datasets are automated tests for world models

by Breck Yunits

April 23, 2024 - I wrapped my fingers around the white ceramic mug in the cold air. I felt the warmth on my hands. The caramel colored surface released snakes of steam. I brought the cup to my lips and took a slow sip of the coffee bean flavored water inside.

Happiness is a hot cup of coffee in a ceramic mug on a cold day.

Continue reading...

Menu Instructions

Congrats on landing a job at Big O's Kitchen!

Our menu has 7 dishes.

Below are the instructions for making each dish.

Continue reading...

by Breck Yunits

The girl lost the race.

"I want to be fast", she said.

"You are fast", said the man.

"No. I want to be the fastest."

Continue reading...

April 5, 2024 - Have you ever examined the correlation between your writing behavior and sleep?

I've written some things in my life that make me cringe. I might cringe because I see some past writing was naive, mistaken, locked-in, overconfident, unkind, insensitive, aggressive, or grandiose.

I now have a pretty big dataset to identify my secret trick to write more cringe: less sleep.

For this post I combined 2,500 nights of sleep data with 58 blog posts. A 7-year experiment to see how sleep affects my writing.

Continue reading...

S = side length of box. P = pattern. t = time. V = voxel side length.

March 30, 2024 - Given a box with side S, over a certain timespan t, with minimum voxel resolution V, how many unique concepts C are needed to describe all the patterns (repeated phenomena) P that occur in the box?

Continue reading...

February 21, 2024 - Everyone wants Optimal Answers to their Questions. What is an Optimal Answer? An Optimal Answer is an Answer that uses all relevant Cells in a Knowledge Base. Once you have the relevant Cells there are reductions, transformations, and visualizations to do, but the difficulty in generating Optimal Answers is dominated by the challenge of assembling data into a Knowledge Base and making relevant Cells easily findable.

Activated Cells in a Knowledge Base.

Continue reading...

January 4, 2024 - You can easily imagine inventions that humans have never built before. How does one filter which of these inventions are practical?

Continue reading...

September 1, 2022 - There's a trend where people are publishing real data first, and then insights. Here is my data from angel investing:

Sigh. I am sharing my data as a PNG. We need a beautiful plain text spreadsheet language.

Continue reading...

A Small Open Source Success Story

Adding 3 missing characters made code run 20x faster.

Map chart slowdown

June 9, 2022 - "Your maps are slow".

In the fall of 2020 users started reporting that our map charts were now slow. A lot of people used our maps, so this was a problem we wanted to fix.

Suddenly these charts were taking a long time to render.

k-means was the culprit

To color our maps an engineer on our team utilized a very effective technique called k-means clustering, which would identify optimal clusters and assign a color to each. But recently our charts were using record amounts of data and k-means was getting slow.

Using Chrome DevTools I was able to quickly determine the k-means function was causing the slowdown.
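
For readers who have not met k-means, here is a minimal one-dimensional sketch of the technique. It is not the team's implementation and does not reproduce the fix described in the full post; the function names, the seeding strategy, and the palette labels are all assumptions made for illustration.

```python
def kmeans_1d(values: list[float], k: int, iterations: int = 20) -> list[float]:
    """Cluster numbers into k groups and return the cluster centers."""
    ordered = sorted(values)
    # Assumed seeding: pick k evenly spaced points from the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged, nothing moved
            break
        centers = new_centers
    return centers


def assign_colors(values: list[float], centers: list[float], palette: list[str]) -> list[str]:
    """Give each value the palette entry of its nearest center."""
    return [
        palette[min(range(len(centers)), key=lambda i: abs(value - centers[i]))]
        for value in values
    ]


if __name__ == "__main__":
    data = [1, 2, 3, 50, 51, 52, 100, 101, 102]
    centers = kmeans_1d(data, k=3)
    print(assign_colors(data, centers, ["light", "medium", "dark"]))
```

Each pass compares every value against every center, so the work grows with the amount of data, the number of clusters, and the number of passes. That is one reason a function like this can show up in a profiler once datasets get large.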

Continue reading...

by Breck Yunits

Writing this post with narrow columns in "Distraction Free Mode" on Sublime Text on my desktop in Honolulu.

October 15, 2021 - I constantly seek ways to improve my writing.

I want my writing to be meaningful, clear, memorable, and short.

And I want to write faster.

This takes practice and there aren't a lot of shortcuts.

But I did find one shortcut this year:

Set a thin column width in your editor

Mine is 36 characters (your ideal width may be different).

Beyond that my editor wraps lines.

This simple mechanic has perhaps doubled my writing speed and quality.

Continue reading...

May 6, 2021 - I split advice into two categories:

  1. 🥠 WeakAdvice
  2. 💎📊🧪 StrongAdvice

Examples

WeakAdvice:

🥠 Reading is to the mind what exercise is to the body.
🥠 Talking to users is the most important thing a startup can do.

StrongAdvice:

💎📊🧪 In my whole life, I have known no wise people (over a broad subject matter area) who didn't read all the time – none, zero. (Charlie Munger)
💎📊🧪 I don't know of a single case of a startup that felt they spent too much time talking to users. (Jessica Livingston)

Continue reading...

April 26, 2021 - I invented a new word: Logeracy[1]. I define it as the ability to think in logarithms. It mirrors the word literacy.

Someone literate is fluent with reading and writing. Someone logerate is fluent with orders of magnitude and the ubiquitous mathematical functions that dominate our universe.

Someone literate can take an idea and break it down into the correct symbols and words; someone logerate can take an idea and break it down into the correct classes and orders of magnitude.

Someone literate is fluent with terms like verb and noun and adjective. Someone logerate is fluent with terms like exponent and power law and base and factorial and black swan.
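
One small, concrete piece of thinking in logarithms is reading off a number's order of magnitude. The helper below is a hypothetical sketch of that one idea, not a definition taken from the post.

```python
import math


def order_of_magnitude(x: float) -> int:
    """Return the power of ten at or just below a positive number."""
    if x <= 0:
        raise ValueError("order of magnitude is only defined here for x > 0")
    return math.floor(math.log10(x))


def orders_apart(a: float, b: float) -> int:
    """How many powers of ten separate two positive quantities."""
    return order_of_magnitude(a) - order_of_magnitude(b)


if __name__ == "__main__":
    print(order_of_magnitude(5_000))
    print(order_of_magnitude(0.02))
    print(orders_apart(8_000_000_000, 5_000))
```

Comparing quantities by orders of magnitude rather than raw values makes it easier to see at a glance which differences matter.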

Continue reading...

March 2, 2020 - A paradigm change is coming to medical records. In this post I do some back-of-the-envelope math to explore the changes ahead, both qualitative and quantitative. I also attempt to answer the question no one is asking: in the future will someone's medical record stretch to the moon?

Continue reading...

How Old Are These Keys? Five Eras of Human Progress

My keyboard, if you removed the symbols from the typewriter and computer eras. Try it yourself.

February 25, 2020 - One of the questions I often come back to is this: how much of our collective wealth is inherited by our generation versus created by our generation?

I realized that the keys on the keyboard in front of me might make a good dataset to attack that problem. So I built a small interactive experiment to explore the history of the keys on my keyboard.

Continue reading...

January 29, 2020 - In this long post I'm going to do a stupid thing and see what happens. Specifically I'm going to create 6.5 million files in a single folder and try to use Git and Sublime and other tools with that folder. All to explore this new thing I'm working on.

TreeBase is a new system I am working on for long-term, strongly-typed collaborative knowledge bases. The design of TreeBase is dumb. It's just a folder with a bunch of files encoded with Tree Notation. A row in a normal SQL table is roughly equivalent to a file in TreeBase. The filenames serve as IDs. Instead of using an optimized binary storage format, each file is just plain text (UTF-8). Field names are stored alongside the values in every file. Instead of starting with a schema you can just start adding files and evolve your schema and types as you go.
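
To make the folder-of-files idea concrete, here is a minimal sketch. It is not TreeBase itself: the function names and the `.txt` extension are assumptions, and it handles only flat "field value" lines rather than the full Tree Notation grammar.

```python
import tempfile
from pathlib import Path


def write_record(folder: Path, record_id: str, fields: dict[str, str]) -> Path:
    """Store one row as one file; the filename (minus extension) is the ID."""
    path = folder / f"{record_id}.txt"  # extension is an assumption
    lines = [f"{name} {value}" for name, value in fields.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def read_record(path: Path) -> dict[str, str]:
    """Read a file back; each line is a field name, a space, then the value."""
    record = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            name, _, value = line.partition(" ")
            record[name] = value
    return record


def load_folder(folder: Path) -> dict[str, dict[str, str]]:
    """Load every record in the folder, keyed by ID."""
    return {path.stem: read_record(path) for path in sorted(folder.glob("*.txt"))}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        write_record(folder, "earth", {"type": "planet", "moons": "1"})
        # A new field can appear in a later file without a schema change.
        write_record(folder, "mars", {"type": "planet", "moons": "2", "color": "red"})
        print(load_folder(folder))
```

Since every file carries its own field names, adding a column is just a matter of writing a new line in the files that need it.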

Continue reading...

January 23, 2020 - People make biased claims all the time. A decent response used to be "citation needed". But we should demand more. Anytime someone makes a claim that seems biased, call them out with: Dataset needed.

Whether it's an academic paper, news article, blog post, tweet, comment or ad, linking to analyses is not enough. If someone stops at that, demand a link to a clean dataset supporting the author's position. If they can't deliver, they should retract.

Continue reading...

January 16, 2020 - I often rail against narratives. Stories always oversimplify things, have hindsight bias, and often mislead.

I spend a lot of time inventing tools for making data derived thinking as effortless as narrative thinking (so far, mostly in vain).

And yet, as much as I rail on stories, I have to admit: stories work.

I read an article that put it more succinctly:

Why storytelling? Simple: nothing else works.
Continue reading...

January 3, 2020 - Speling errors and errors grammar are nearly extinct in published content. Data errors, however, are prolific.

Continue reading...

The Attempt to Capture Truth

August 19, 2019 - Back in the 2000s Nassim Taleb's books set me on a new path in search of truth. One truth I became convinced of is that most stories are false due to oversimplification. I largely stopped writing over the years because I didn't want to contribute more false stories, and instead I've been searching for and building new forms of communication and ways of representing data that hopefully can get us closer to truth.

Continue reading...

July 18, 2019 - In 2013 I sent a brief email to 25 programmers whose programs I admired.

"Would you be willing to share the # of hours you have spent practicing programming? Back of the envelope numbers are fine!"

Some emails bounced back.

Some went unanswered.

But five coders wrote back.

This turned out to be a tiny study, but given the great code these folks have written, I think the results are interesting--and a testament to practice!

Name             GitHub   Hours  YearOfEstimate  BornIn
Donald Knuth              56000  2013            1938
Rob Pike         robpike  30000  2013            1956
Peter Norvig     norvig   30000  2013            1956
Stephen Wolfram           50000  2013            1959
Lars Bak         larsbak  30000  2013            1965
Continue reading...

June 23, 2017 - I just pushed a project I've been working on called Ohayo.

You can also view it on GitHub: https://github.com/treenotation/ohayo

I wanted to try and make a fast, visual app for doing data science. I can't quite recommend it yet, but I think it might get there. If you are interested you can try it now.

Continue reading...

A Suggestion for a Simple Notation

September 24, 2013 - What if instead of talking about Big Data, we talked about 12 Data, 13 Data, 14 Data, 15 Data, et cetera? The # refers to the number of zeroes we are dealing with.

You can then easily differentiate problems. Some companies are dealing with 12 Data, some companies are dealing with 15 Data. No company is yet dealing with 19 Data. Big Data starts at 12 Data, and maybe over time you could say Big Data starts at 13 Data, et cetera.
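
The notation is easy to compute. In this hypothetical sketch the label is the number of digits after the first in a positive integer count; what is being counted (bytes, rows, records) is left open, as it is in the excerpt. The default threshold of 12 follows the starting point suggested above.

```python
def data_label(count: int) -> str:
    """Return the 'N Data' label, where N is the number of zeroes in the scale."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    zeroes = len(str(count)) - 1  # digits after the first, e.g. 10**12 -> 12
    return f"{zeroes} Data"


def is_big_data(count: int, threshold: int = 12) -> bool:
    """Big Data starts at `threshold` Data; the threshold can rise over time."""
    return len(str(count)) - 1 >= threshold


if __name__ == "__main__":
    print(data_label(10**12))
    print(data_label(5 * 10**15))
    print(is_big_data(999_999_999_999))
```

Using the string length avoids floating-point rounding near exact powers of ten.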

Continue reading...

March 30, 2013 - Why does it take 10,000 hours to become a master of something, and not 1,000 hours or 100,000 hours?

The answer is simple. Once you've spent 10,000 hours practicing something, no one can crush you like a bug.

The figure on top has 10,000 hours of experience and crushes people with 100 hours or 1,000 hours like a bug. But they cannot crush another person with 10,000 hours.

Continue reading...

December 23, 2012 - If you are poor, your money could be safer under the mattress than in the bank:

The Great Bank Robbery dwarfs all normal burglaries by almost 10x. In the Great Bank Robbery, the banks are slowly, silently, automatically taking from the poor.

One simple law could change this:

What if it were illegal for banks to automatically deduct money from someone's account?

If a bank wants to charge someone a fee, that's fine, just require they send that someone a bill first.

What would happen to the statistic above, if instead of silently and automatically taking money from people's accounts, banks had to work for it?

Continue reading...

August 11, 2010 - I've had some free time the past two weeks to work on a few random ideas I've had.

They all largely involve probability/statistics and have no practical or monetary purpose. If I were a painter and not a programmer you might call them "art projects".

Continue reading...

June 15, 2010 - I think it's interesting to ponder the value of information over its lifetime.

Different types of data become outdated at different rates. A street map is probably mostly relevant 10 years later, while a 10 year old weather forecast is much less valuable.

Phone numbers probably last about 5 years nowadays. Email addresses could end up lasting decades. News is often largely irrelevant after a day. For a coupon site I worked on, the average life of a coupon seemed to be about 2 weeks.

If your data has a long half life, then you have time to build it up. Wikipedia articles are still valuable years later.

What information holds value the longest? What are the "twinkies" of the data world?

Books, it seems. We don't regularly read old weather forecasts, census rolls, or newspapers, but we definitely still read great books, from Aristotle to Shakespeare to Mill.

Facts and numbers have a high churn rate, but stories and knowledge last a lot longer.

Continue reading...

March 16, 2010 - I wrote a simple PHP program called phpcodestat that computes some simple statistics for any given directory.

Continue reading...

March 8, 2010 - If a post on HackerNews gets more points, it gets more visits.

But how much more? That's what Murkin wanted to know.

I've submitted over 10 articles from this site to HackerNews and I pulled the data from my top 5 posts (in terms of visits referred by HackerNews) from Google Analytics.

Here's how it looks if you plot visits by karma score:

Continue reading...

July 7, 2008 - After months of deliberation, I've decided to quit my day job and work on my blog full time.

I am joking.

But these bloggers were not:

Continue reading...

May 14, 2008 - The other day I wrote a post on How much Gas Americans use per day. The answer is 400 million gallons. A reader wanted to know how much gas the whole world consumes in a day. The answer is about 83 million barrels (bbl). One bbl = 42 gallons, so the world consumes about 3.5 billion gallons of gas per day. That means the United States consumes about 11% of the total gas consumed per day.
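
The arithmetic can be checked in a few lines. The figures are the ones quoted in the paragraph above; only the variable names are new.

```python
GALLONS_PER_BARREL = 42

world_barrels_per_day = 83_000_000  # about 83 million bbl, as quoted
us_gallons_per_day = 400_000_000    # 400 million gallons, as quoted

# Convert world consumption from barrels to gallons.
world_gallons_per_day = world_barrels_per_day * GALLONS_PER_BARREL

# The US share of the world total.
us_share = us_gallons_per_day / world_gallons_per_day

if __name__ == "__main__":
    print(f"World: {world_gallons_per_day:,} gallons per day")
    print(f"US share: {us_share:.1%}")
```

The exact product is 3,486,000,000 gallons, which rounds to the 3.5 billion used in the text, and the share comes out between 11% and 12%.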

Continue reading...

May 8, 2008 - xirium posted a tarball of all the individual profile pages for HackerNews readers (minus lurkers and those who joined after 05/07/2008). I was curious what insights, if any, could be gleaned from analyzing the data. My findings are below. I could have figured out more interesting things if I also included posts in my data, but I was looking for something simple to work on. BTW, to get the data into a table I wrote a simple Python script to parse the HTML files. The source code is at the bottom. Or you can download the resulting dataset as an Excel file.

Continue reading...


