Basics of Open Refine for SEO

Basics of Open Refine for SEO

While doing keyword analysis, almost every time you face these two challenges:

  • duplicate keywords – this can be a problem especially if you combine multiple sources of keywords, but some tools give you duplicate keywords by themselves.
  • multiple forms of keywords – keywords that are almost duplicate – typically keywords with or without diacritics, singular/plural, different order of words in search phrase, etc.

Our goal today will be to remove duplicate keywords and merge multiple different forms into one (while retaining values under the main key phrase).

We are going to use Open Refine – this is a strong tool, which you absolutely should get a handle on, so we are going to prefer it over Excel and other tools.

At the end of the article you can find step by step instructions, mainly for those who use this guide over and over again.

Install Open Refine (free software)

All you need to do is to download an installation file based on your platform. The installation is usually very easy. Open Refine runs on your PC and you can interact with it over your browser – after you install and start the program point your browser to and you should see its interface.

Creating your first project and data import

To work with your data you need to create your first project and import your keywords. For the purpose of this article we are going to use some sample data. We are going to use a text file that contains one keyword and information of its search volume on each line. In real life you would process much larger data, but it suits our purpose.

As an example we are going to use a topic called “Champs-Élysées” – the name of a beautiful avenue in Paris. You can take a look at our sample data (or download it right away).

Step #1: Click on “Create project”. You can import your own data from a local file, URL address, clipboard, etc. We are going to choose the local file (“This computer”).

Step #2: You’ll see a preview of your imported data. Make sure that everything looks fine. Play with the additional setting if necessary. Then you can name your project and click on “Create Project”.

Step #3: Before we can tackle the duplicate rows, we have to sort our data alphabetically by a keyword. Just click on the column menu (the blue triangle next to the column caption), then click “Sort…” and then select sorting as TEXT from A to Z.

You also need to make this sorting permanent – it wouldn’t work without it. After sorting you can see a new option “Sort” above the table with your keywords. Select “Reorder rows permanently”.

Step #4: Now we can start with removing the duplicate rows. In the Keywords column menu click on “Edit cells -> Blank down”. That deletes all the duplicate keywords from our dataset. But now there are a few rows with blank keywords We need to remove the rows as well.

Step #5: We are going to find the rows by selecting “Facet -> Text facet” in the Keywords’ column menu.

On the left side you’ll see a sidebar with a box labelled “Facet / Filter”. Select “(blank)”. This way you filtered only the blank keywords. Now select ’All -> Edit rows -> Remove all matching rows’ in the first column menu.

Great! All the duplicate rows are gone. You are right, in Google Spreadsheets or LibreOffice it would be a bit easier, but it is super-fast when you get used to it.

Merge different variant of your keywords

Step #6: At first you need to close the Keywords facet box, so you can see all the keywords in your list. After that, in the Keyword column menu select “Edit cells -> Cluster and Edit”.

A new window opens up – please see screenshot below. Default settings “Method: key collision” and “Keying Function: fingerprint” are usually fine but you should play with different settings and run this process several times to get the best results.

Step #7: You can see the keywords arranged into groups (clusters) based on similarity. Now let’s start with merging. All you need to do is to select the primary form of keyword (in column “Values in Cluster”) – choose the one with right diacritics, preferred word order, etc. You can also set your own value (“New Cell Value”) or preview the selected cluster (“Browse this cluster”) to check other data connected with these keywords.

After going through all the keywords, select “Merge Selected & Re-Cluster”. After that you can tweak the settings a little bit and see if it shows some new clusters that need to be merged.

OK, now we have only one keyword in each group (cluster) – but that also means that we have a lot of duplicate rows again. We can’t use the method described in the first part of this article, because we don’t want to just delete them. We need to count total search volume of the whole group and add it to the primary keyword.

Step #8: You need to sort the keywords alphabetically again (click on “Sort” in Keywords column menu) and then make it permanent (“Sort -> Reorder rows permanently”). In the Keywords column menu choose “Edit cells -> Blank down”. You’ll see something similar to the image below.

Before we can remove the rows with blank keywords, we need to add up the search volume to the right keyword. To accomplish that we will now work with Search volume column menu – choose “Edit cells -> Join multi-valued cells”. You’ll see a dialog box with prefilled coma in a text box. Leave it there and hit OK.

At this time you don’t have any duplicates and you can see search volumes joined in some rows (i.e. ’59, 68, 13′). To sum those up, click in Search volume menu on ‘Edit cells -> Transform’.

Enter this expression:


In the preview you can see if everything is as expected.

Done! In the top right corner you’ll find an option to export your final data to Excel / CVS / Google Spreadsheets etc.

Quick summary:

Duplicate rows:

  1. Keywords column menu -> Sort -> (text, A-Z)
  2. Sort -> Reorder rows permanently
  3. Keywords column menu -> Edit cells -> Blank down
  4. Keywords column menu -> Facet -> Text facet
  5. Choose (blank) in the facets box on the left hand side
  6. First column menu -> Edit rows -> Remove all matching rows

Merge different variants of keywords:

  1. Keywords column menu -> Edit cells -> Cluster and Edit
  2. (choose all the groups you want to merge) -> Merge Selected & Re-Cluster
  3. Keywords column menu -> Sort -> (text, A-Z)
  4. Sort -> Reorder rows permanently
  5. Keywords column menu -> Edit cells -> Blank down
  6. Search volume column menu -> Edit cells -> Join multivalued cells -> leave default (“,”)
  7. Search volume column menu -> Edit cells -> Transform
  8. Enter expression: forEach(value.split(','),v,v.toNumber()).sum()
  9. OK