How can customer reviews inform your marketing copy?

Using Python, Beautiful Soup and topic modeling to understand the main themes underlying customer reviews

Christina Stejskalova
DataDrivenInvestor


Photo by Christina @ wocintechchat.com on Unsplash

A key skill required in data science jobs is the ability to connect data insights to business outcomes. For DTC brands, one of the biggest opportunities to do this is through customer reviews.

As an interviewee, you don’t have access to Google Analytics or internal data storage systems, but you can find out what customers think about brands from their online reviews. And let’s face it, what better way to highlight your skills to a company than to come to an interview prepared with insights on their own customer reviews?

Using Beautiful Soup to scrape customer reviews available on websites

For this specific analysis, we are going to be using the DTC brand Ergatta. Ergatta is an innovator in the at-home fitness space, taking a new spin by focusing on the competitive element of at-home sport.

As always, we start by importing the relevant packages, and then make the request:

# Import the necessary packages
# HTTP requests
import requests
# Parse the HTML and get rid of HTML tags
from bs4 import BeautifulSoup
import html5lib  # parser backend used by BeautifulSoup below
import pandas as pd

# Make the request, and also make sure you get the response as text, otherwise it's empty
html = requests.get("https://ergatta.com/product/the-ergatta-rower/").text

# Format HTML in an easier to use way using Beautiful Soup
soup = BeautifulSoup(html, 'html5lib')

Armed with the website data, we then scan the HTML to find the relevant review data. In this specific example, all ratings are available in the div with the class “jdgm-rev__header”. The best way to find this tag is to right-click anywhere on the page and choose Inspect. Then, with the element selection tool, you can hover over the page until you find the element that houses all your data.
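Once inspection has given you the class names, it can help to try the extraction on a small inline snippet before pointing it at the live page. The sketch below is only an illustration: the markup is made up, the wrapper class "jdgm-rev" around each review is an assumption, and the real page may be structured differently. It builds one record per review, so scores, titles and text stay aligned even when a review body has several paragraphs.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the class names mentioned above.
# The "jdgm-rev" wrapper per review is an assumption, not taken from the real page.
sample_html = """
<div class="jdgm-rev">
  <div class="jdgm-rev__header"><span data-score="5"></span></div>
  <b class="jdgm-rev__title">Love it</b>
  <div class="jdgm-rev__body"><p>Great workouts.</p><p>Beautiful machine.</p></div>
</div>
<div class="jdgm-rev">
  <div class="jdgm-rev__header"><span data-score="4"></span></div>
  <b class="jdgm-rev__title">Solid rower</b>
  <div class="jdgm-rev__body"><p>Quiet and fun.</p></div>
</div>
"""

# html.parser ships with Python, so no extra parser is needed for this sketch
soup = BeautifulSoup(sample_html, "html.parser")

records = []
for review in soup.find_all("div", class_="jdgm-rev"):
    header = review.find("div", class_="jdgm-rev__header")
    # Pick the span that actually carries the score instead of relying on position
    score_span = header.find("span", attrs={"data-score": True})
    title = review.find("b", class_="jdgm-rev__title")
    body = review.find("div", class_="jdgm-rev__body")
    records.append({
        "score": int(score_span["data-score"]),
        "title": title.get_text(strip=True),
        # Join all paragraphs so each review yields exactly one text entry
        "text": " ".join(p.get_text(strip=True) for p in body.find_all("p")),
    })

for record in records:
    print(record)
```

A list of dictionaries like this can be passed straight to pd.DataFrame(records) if you want a table.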

# Rating values and names
ratings = soup('div', 'jdgm-rev__header')
scores = []
names = []
for i in ratings:
    # get score
    score = i('span')[0].get('data-score')
    # get name
    name = i('span')[3].get('data-fullname')
    scores.append(score)
    names.append(name)

# Review text: one entry per <p> inside each review body
review_text = [p.text for div in soup('div', 'jdgm-rev__body')
               for p in div('p')]
review_title = [b.text for b in soup('b', 'jdgm-rev__title')]

# Create an empty DataFrame object
df = pd.DataFrame()
# Update values of reviews
df['scores'] = scores
df['review_text'] = review_text
df['review_title'] = review_title

Running this code gives us the following output:

As we can see from the above output, all 5 reviews have 5 stars. Cool, customers obviously love Ergatta, and we don’t need advanced modeling to gauge that. That being said, what exactly do they like most? How about we understand the topics and themes customers are talking about in these reviews? We can use the review text to understand the topics customers find most exciting:

# Analyze reviews
import sklearn.feature_extraction.text as text
import numpy as np
from sklearn import decomposition

# This step performs the vectorization,
# tf-idf, stop word removal, and normalization.
# It assumes review_text is a Python list
# with reviews as its elements.
cv = text.TfidfVectorizer(stop_words='english')
doc_term_matrix = cv.fit_transform(review_text)
# The tokens can be extracted as
# (recent scikit-learn versions use get_feature_names_out):
vocab = cv.get_feature_names_out()
# Next we perform the NMF with 1 topic
num_topics = 1
# doctopic is the W matrix
decomp = decomposition.NMF(n_components=num_topics, init='nndsvd')
doctopic = decomp.fit_transform(doc_term_matrix)
# Now, we loop through each row of decomp.components_
# (the topic-term matrix), i.e. each topic,
# and collect the top 3 words from each topic.
n_top_words = 3
topic_words = []
for topic in decomp.components_:
    idx = np.argsort(topic)[::-1][0:n_top_words]
    topic_words.append([vocab[i] for i in idx])
# Themes of the reviews
topic_words

We only have 5 reviews on the front page, which is not a ton of data, so we fit a single topic and reduce the number of top words to 3. Doing that, our output gives the following 3 words:

Output of Topic Modeling

How do we apply these results to actionable business outcomes?

OK, so we have these 3 top words. What can we do with these results? Well, they tell you the themes customers are talking about in their reviews. There’s a good chance that if customers are talking about these things, that’s what you should also talk about on your website and in your ads.

For the website, that would mean, for example, using the word “machine” instead of “device”.
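One simple way to back up a wording choice like this is to count how often each candidate word shows up in the review text. The sketch below uses made-up review strings and a hypothetical helper called count_terms, so treat the numbers as an illustration only.

```python
import re
from collections import Counter


def count_terms(texts, terms):
    """Count whole-word, case-insensitive occurrences of each term across texts."""
    tokens = Counter()
    for t in texts:
        # Lowercase and split into simple word tokens
        tokens.update(re.findall(r"[a-z']+", t.lower()))
    return {term: tokens[term.lower()] for term in terms}


# Made-up reviews for illustration, not real customer data
sample_reviews = [
    "This machine is beautiful and the workouts are fun.",
    "Best rowing machine I have owned.",
    "The device looks great in my living room.",
]

counts = count_terms(sample_reviews, ["machine", "device"])
print(counts)
```

Running the same count over your own website copy shows where your wording differs from the words customers use.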

Another clear message is to emphasize the importance of the workouts. Perhaps you could design an ad that highlights your workouts. I made a mock-up here:

Image by author (No wonder I am not a designer!)

Wait a minute, we only have 5 customer reviews. Can we really say so much with so little?

As a data scientist, I am fully aware that 5 reviews is nothing to base a full analysis on. For this analysis specifically, I was limited by the number of reviews available on the main page (5), and because the review section is a React component rather than a paginated one, I wasn’t able to set up the scraper to run through all reviews. As a result, I would encourage readers to treat this as a sample approach. Instead of using a web scraper to get website reviews, you could go directly to the Google app store and get more reviews.

Until next time.


My articles vary in topic but focus on how you can build products that have impact with the power of psychology and data