Dear Analyst

Dear Analyst


Dear Analyst #68: Generate unique IDs for your dataset for building summary reports in Google Sheets

May 04, 2021

If your dataset doesn't have a unique identifier (e.g. customer ID, location ID, etc.), sometimes you have to make one up. The reason you need this unique ID is to summarize your dataset into a nice report to be shared with a client or internal stakeholders. Usually your dataset will have some kind of unique identifier like customer ID or transaction ID because that row of data might be used with some other dataset. It's rare these days not to have one. Here are a few methods for creating your own unique identifiers using this list of customer transaction data (Google Sheets for this episode here).

https://youtu.be/fjkO0kHbbKw

Method 1: Create a sequential list of numbers as unique IDs

Each of these transactions is from a unique customer on a unique date for a unique product. We could do something as simple as creating a sequential list of numbers to "mark" each transaction. Maybe we can prefix this new transaction ID column with "tx-" so each unique ID will look something like this:

This method involves creating a dummy column (column I) of sequential numbers. Then in column A, you write "tx-" followed by the number you created in column I, and you have a unique ID. This unique ID is only relevant for this dataset, however. If there are other tables of data related to customers and transactions, those tables won't know about this new transaction ID you just created on the fly.

Method 2: Create random numbers as unique ID

This method will make your unique IDs feel a little more "unique" since the numbers are randomized:

Notice how we happen to take the result of the RAND() function and multiply it by 100,000 to get a random number with 5 digits. Our dataset is only 1,000 rows long so the chances of duplicate values is low, but there still exists that possibility.

This is probably the least preferred solution because of the fact that there could be duplicate values (there are formula hacks to get around it). Another reason this isn't a great solution is that you have to remember to copy and paste values from the random numbers into another column. The RAND() function is a volatile function (basically changes every time you reload the Sheet) so you would lose your unique ID every time the Sheet loads. This means you have to remember to paste just the values perhaps in the next column over before referencing that value as your unique ID.

Finally, if your dataset has timestamps like this, chances are the unique IDs are meant to be sequential (using Method 1). Assigning random unique IDs to each transaction might make reconciling the data in the future more difficult.

Method 3: Concatenate (add) columns together to create unique ID

This method involves concatenating (adding) together different columns to create a unique ID. The reason I like this method is because it makes creating reports a bit easier since you can write in the values in a cell for a lookup to reference. For instance, the unique IDs in our dataset is created by combining the Customer ID, SKU_Category, and SKU columns:

We put a dash "-" in between each of the cell references so it's a bit easier to see all the different characters in this "unique ID." The issue is this: what if there are multiple transactions with the same Customer ID, SKU_Category, and SKU? We insert a COUNTIF column in between columns B and C to count ...