Data Mining with Spreadsheets

by Ronald Campbell of the Journalist at the Orange County Register

Workforce data to consider for immigrant data analysis

Generate a report using the following criteria

  • Education Attainment (sidebar)
  • Nativity: Point of origin (sidebar)
  • California (topbar)

You should now have data worth mining!

Time to interview the data!

  • Paste the generated data into your spreadsheet of choice.
  • Select the highest row with numbers and enter something like into the first empty cell to the right: =D1/B1
  • Copy/paste the formula only to the empty cells corresponding to the other rows.
  • Format the results to show percentages

Adjusting the Data

  • Under ‘Data Options’ change the year to 2007 (sidebar)

  • Formula for comparing 2012 with 2007:

    • new - old / old

What does this show?

  • Incoming workforce with high school and post-high school education dropped

  • Surprise: Incoming workforce with bachelor’s degree or higher is increasing.

    • Might be because more immigrants coming in have bachelor’s degrees.
    • Might be because more immigrants are getting bachelor’s degrees.
    • No way to figure out what this means from the data available. We need a journalist to investigate!

Foreign Born Workers

  • Choose 2012 (sidebar)
  • Table definitions are state and nativity (sidebar)
  • Refresh

Visualizing the Results

I used Many Eyes (http://bit.ly/many-eyes) to quickly visualize the data. See