WellaBeat in SQL

Note: A more in-depth EDA was done using R and can be found at WellaBeat in the R section of this site. Explanation of the data and case study can be found at the link as well.

BigQuery Connection

library(bigrquery)       # R Interface to Google BigQuery API  
library(dplyr)           # Grammar for data manipulation  
library(DBI) 
projectid <- 'prime-depot-415622'
datasetid <- 'bellabeat'
con <- dbConnect((bigquery()),
                 project = projectid,
                 dataset = datasetid,
                 use_legacy_sql = FALSE)

List Tables

  • Let’s see if city_data is the dataset we are looking for
  • Let’s list the tables in it
SELECT table_catalog, table_schema, table_name, table_type, is_insertable_into
FROM    bellabeat.INFORMATION_SCHEMA.TABLES
2 records
table_catalog table_schema table_name table_type is_insertable_into
prime-depot-415622 bellabeat daily_activity BASE TABLE YES
prime-depot-415622 bellabeat bellat_merged BASE TABLE YES

List Columns

SELECT  column_name, ordinal_position, is_nullable, data_type 
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  table_name = 'bellat_merged'
7 records
column_name ordinal_position is_nullable data_type
Id 1 YES INT64
mDate 2 YES DATE
TotalSteps 3 YES INT64
TotalDistance 4 YES FLOAT64
Calories 5 YES INT64
WeightPounds 6 YES STRING
BMI 7 YES STRING

EDA


Verify Data

  • The information provided with the data stated that it contained 30 users, let’s verify that
  • Great so we have 35 not 30
SELECT  COUNT(DISTINCT(Id))
FROM    bellat_merged
1 records
f0_
35

Weighin Entries

  • How many times did a user enter a value for either BMI or their actual weight.
  • In other words how many rows were NOT NA
  • We already know from looking at the dataset that there are 1397 rows in bellat_merged
  • You can’t see it below but we have 100/rows actual usage of the app from 35 users in 60 days
SELECT   Id, mDate
FROM     bellat_merged
WHERE    BMI <> "NA" AND WeightPounds <> "NA"
Displaying records 1 - 10
Id mDate
NA 2016-04-15
NA 2016-04-23
NA 2016-04-24
NA 2016-05-02
NA 2016-05-06
NA 2016-05-09
NA 2016-04-17
NA 2016-04-18
NA 2016-04-06
NA 2016-04-21

Unique Users

  • Great so we have 100 actual usage of the app from 35 users in 60 days
  • How many of those users are repeat users, in other words how many unique users do we have
  • So we have 13 unique users - only showed 10 because the rest were all a one time login
SELECT     Id, 
           COUNT(*) AS number_of_weighins
FROM       bellat_merged
WHERE      BMI <> "NA" AND  WeightPounds <> "NA"
GROUP BY   Id
ORDER BY   number_of_weighins
Displaying records 1 - 10
Id number_of_weighins
NA 1
NA 1
NA 1
NA 1
NA 1
NA 1
NA 2
1927972279 2
1503960366 3
NA 4

Multiple Weighins

Let’s find out how many of the 13 unique users that logged in 100 times actually had multiple logins

  • Start from the subquery within: Select the Id column, and then COUNT how many times each ID which is GROUPED BY has loggedin their weight.
  • We can find that out by filtering the dataset with the WHERE clause that only looks at when values for BMI AND WeightPounds were entered and NOT NULL or =”NA”
  • Then we  group each entry by ID and count them and
  • Name the subquery AS subquery
  • Now let’s step back out to the outer query and SELECT what we just did in the  subquery and pull the ID as: subquery.Id and
  • Assign the count of the number_of_weighins to multiple_logins because we will use a
  • WHERE clause to just display the times a user logged in multiple times to enter their weight
  • That value comes up to 7
  • Breakdown of the multiple logins are as folllows:
SELECT
        subquery.Id,
        subquery.number_of_weighins AS multiple_logins
FROM
        (
        SELECT
                Id,
                COUNT(*) AS number_of_weighins
        FROM
                bellat_merged
        WHERE
                BMI <> "NA" AND WeightPounds <> "NA"
        GROUP BY
                Id
        ) AS subquery
WHERE
        subquery.number_of_weighins > 1
ORDER BY
        number_of_weighins DESC
7 records
Id multiple_logins
NA 44
NA 33
NA 6
NA 4
1503960366 3
NA 2
1927972279 2

Observations

  • Out of the 13 unique users that logged in their weight
    • 6 had one entry
    • 2 had 2 entries
    • 1 user had 3 entries
    • 1 user had 4 entries
    • 1 user had 6 entries
    • 1 had 33
    • 1 had 44 entries.

Remember the data had a total of 62 days from 3/12/2016 to 5/12/2016.

  • It appears that it is not necessary for the user to enter their weight for the app to make its calculations
  • Out of the 30 users provided we find 35 unique IDs
  • One user entered their weight 44 times in 62 days.
  • 13 out of 35 unique users used the manual weigh-in feature that’s 37% which is significant over a 62 day perio
  • Data time frame and sample size is small so the validity of my findings are highly unreliable

Recommendations


  • UX should be the number one priority for WellaBeat. The app needs to be redesigned to include the ability to track users use of the app.

  • WellaBeat’s mission is to help women achieve a healthy lifestyle, is that actually happening? Have we asked the users? Have we collected data to backup our claims?

  • Aside from looking within and improving our existing offerings, my main recommendation is to introduce a scale that syncs with the existing app if product design, supply, and maintenance is not a deterrent.

  • The other option would be to introduce an additional service via the existing app that can link with smart scales via bluetooth and/or wifi.

  • I’d imagine tracking the weight for pre/post and during pregnancy is extremely important. It’s obvious from the limited data provided that weigh-ins were extremely popular and effective in the marketplace.

  • Why did WellaBeat stop using Spring the fluid intake tracking product? Hydration is one of the most important aspects of health and fitness – even more so during or post pregnancy? Was that a product/supply issue? That should be a focus.

  • App should track which page is viewed, how often each page is viewed, time of day viewed. After that the data will give us more insightful information.

  • Age, gender, pregnancy status needs to be tracked either via a questionnaire/survey, account/profile creation.

  • My suggestions are easy to implement and should provide more insight as to the direction marketing should proceed in the near future.