library(bigrquery) # R Interface to Google BigQuery API
library(dplyr) # Grammar for data manipulation
library(DBI)
<- 'prime-depot-415622'
projectid <- 'bellabeat'
datasetid <- dbConnect((bigquery()),
con project = projectid,
dataset = datasetid,
use_legacy_sql = FALSE)
WellaBeat in SQL
Note: A more in-depth EDA was done using R and can be found at WellaBeat in the R section of this site. Explanation of the data and case study can be found at the link as well.
BigQuery Connection
List Tables
- Let’s see if city_data is the dataset we are looking for
- Let’s list the tables in it
SELECT table_catalog, table_schema, table_name, table_type, is_insertable_into
FROM bellabeat.INFORMATION_SCHEMA.TABLES
table_catalog | table_schema | table_name | table_type | is_insertable_into |
---|---|---|---|---|
prime-depot-415622 | bellabeat | daily_activity | BASE TABLE | YES |
prime-depot-415622 | bellabeat | bellat_merged | BASE TABLE | YES |
List Columns
SELECT column_name, ordinal_position, is_nullable, data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'bellat_merged'
column_name | ordinal_position | is_nullable | data_type |
---|---|---|---|
Id | 1 | YES | INT64 |
mDate | 2 | YES | DATE |
TotalSteps | 3 | YES | INT64 |
TotalDistance | 4 | YES | FLOAT64 |
Calories | 5 | YES | INT64 |
WeightPounds | 6 | YES | STRING |
BMI | 7 | YES | STRING |
EDA
Verify Data
- The information provided with the data stated that it contained 30 users, let’s verify that
- Great so we have 35 not 30
SELECT COUNT(DISTINCT(Id))
FROM bellat_merged
f0_ |
---|
35 |
Weighin Entries
- How many times did a user enter a value for either BMI or their actual weight.
- In other words how many rows were NOT NA
- We already know from looking at the dataset that there are 1397 rows in bellat_merged
- You can’t see it below but we have 100/rows actual usage of the app from 35 users in 60 days
SELECT Id, mDate
FROM bellat_merged
WHERE BMI <> "NA" AND WeightPounds <> "NA"
Id | mDate |
---|---|
NA | 2016-04-15 |
NA | 2016-04-23 |
NA | 2016-04-24 |
NA | 2016-05-02 |
NA | 2016-05-06 |
NA | 2016-05-09 |
NA | 2016-04-17 |
NA | 2016-04-18 |
NA | 2016-04-06 |
NA | 2016-04-21 |
Unique Users
- Great so we have 100 actual usage of the app from 35 users in 60 days
- How many of those users are repeat users, in other words how many unique users do we have
- So we have 13 unique users - only showed 10 because the rest were all a one time login
SELECT Id,
COUNT(*) AS number_of_weighins
FROM bellat_merged
WHERE BMI <> "NA" AND WeightPounds <> "NA"
GROUP BY Id
ORDER BY number_of_weighins
Id | number_of_weighins |
---|---|
NA | 1 |
NA | 1 |
NA | 1 |
NA | 1 |
NA | 1 |
NA | 1 |
NA | 2 |
1927972279 | 2 |
1503960366 | 3 |
NA | 4 |
Multiple Weighins
Let’s find out how many of the 13 unique users that logged in 100 times actually had multiple logins
- Start from the subquery within: Select the Id column, and then COUNT how many times each ID which is GROUPED BY has loggedin their weight.
- We can find that out by filtering the dataset with the WHERE clause that only looks at when values for BMI AND WeightPounds were entered and NOT NULL or =”NA”
- Then we group each entry by ID and count them and
- Name the subquery AS subquery
- Now let’s step back out to the outer query and SELECT what we just did in the subquery and pull the ID as:
subquery.Id
and - Assign the count of the
number_of_weighins
tomultiple_logins
because we will use a - WHERE clause to just display the times a user logged in multiple times to enter their weight
- That value comes up to 7
- Breakdown of the multiple logins are as folllows:
SELECT
Id,
subquery.AS multiple_logins
subquery.number_of_weighins FROM
(SELECT
Id,
COUNT(*) AS number_of_weighins
FROM
bellat_mergedWHERE
<> "NA" AND WeightPounds <> "NA"
BMI GROUP BY
Id
AS subquery
) WHERE
> 1
subquery.number_of_weighins ORDER BY
DESC number_of_weighins
Id | multiple_logins |
---|---|
NA | 44 |
NA | 33 |
NA | 6 |
NA | 4 |
1503960366 | 3 |
NA | 2 |
1927972279 | 2 |
Observations
- Out of the 13 unique users that logged in their weight
- 6 had one entry
- 2 had 2 entries
- 1 user had 3 entries
- 1 user had 4 entries
- 1 user had 6 entries
- 1 had 33
- 1 had 44 entries.
Remember the data had a total of 62 days from 3/12/2016 to 5/12/2016.
- It appears that it is not necessary for the user to enter their weight for the app to make its calculations
- Out of the 30 users provided we find 35 unique IDs
- One user entered their weight 44 times in 62 days.
- 13 out of 35 unique users used the manual weigh-in feature that’s 37% which is significant over a 62 day perio
- Data time frame and sample size is small so the validity of my findings are highly unreliable
Recommendations
UX should be the number one priority for WellaBeat. The app needs to be redesigned to include the ability to track users use of the app.
WellaBeat’s mission is to help women achieve a healthy lifestyle, is that actually happening? Have we asked the users? Have we collected data to backup our claims?
Aside from looking within and improving our existing offerings, my main recommendation is to introduce a scale that syncs with the existing app if product design, supply, and maintenance is not a deterrent.
The other option would be to introduce an additional service via the existing app that can link with smart scales via bluetooth and/or wifi.
I’d imagine tracking the weight for pre/post and during pregnancy is extremely important. It’s obvious from the limited data provided that weigh-ins were extremely popular and effective in the marketplace.
Why did WellaBeat stop using Spring the fluid intake tracking product? Hydration is one of the most important aspects of health and fitness – even more so during or post pregnancy? Was that a product/supply issue? That should be a focus.
App should track which page is viewed, how often each page is viewed, time of day viewed. After that the data will give us more insightful information.
Age, gender, pregnancy status needs to be tracked either via a questionnaire/survey, account/profile creation.
My suggestions are easy to implement and should provide more insight as to the direction marketing should proceed in the near future.