Dear Analyst

Dear Analyst


Dear Analyst #52: Extracting text from the middle of a cell using World Happiness data

December 07, 2020

Sometimes you have a long list of data in a column and you want to pull out a specific string of characters in the cell and put the output in another column. I've done this countless times and was reminded of this task after watching this meetup on Power Query vs. Excel formulas from Danielle Stein Fairhurst (I talked at length about Power Query vs. formulas in the previous episode about custom data types). This episode dives into the nested formula and other solutions to extract text from the middle of a cell. You can copy the Google Sheets from this episode or watch the tutorial below.

https://www.youtube.com/watch?v=Scrjsfxfui4

World Happiness Report data

Before I get into the solution, wanted to talk a bit about the data set used in this episode: The World Happiness Report.

Source: Action For Happiness

This report aims to measure the state of happiness around the world. The first report was published in 2012. Five years later, the report ranked 155 countries by their happiness levels and was released at an event celebrating the International Day of Happiness (March 20th) at the United Nations.

What I find interesting about the actual poll is that the "main life evaluation" question asked to respondents uses a mechanism called the Cantril Ladder:

Please imagine a ladder with steps numbered from zero at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time? (ladder-present) On which step do you think you will stand about five years from now? (ladder-future)

I think it's a report with an audacious goal of quantifying and standardizing happiness levels around the world. The Sustainable Development Solutions Network has published the raw CSV data from 2015-2019 on Kaggle.

The Problem

I edited the data set from 2019 a little to demonstrate the problem we need to solve. In the screenshot below, you'll notice in the Data column the Overall Rank, Country, and Score values are all combined into one value:

Our goal is to extract "Finland" from cell B2, "Denmark" from cell B3, etc. We want to put the country into an empty column, and then we can hide the Data column or completely delete it assuming we copy and pasted the values in our final output column.

Detecting the pattern

You'll notice the pattern with how the data is formatted. The country is contained within opening and closing parentheses. The rank of the country precedes the country, and the overall happiness score is at the end of the value preceded by a dash. We can combine the MID() and FIND() functions to get just the country in between the two parentheses.

Using MID() and FIND() to extract text from a cell

I created a new column next to column B called Extract Country and put this formula in this column to extract the country:

=mid(B2,find("(",B2)+1,find(")",B2)-find("(",