I’ve always been a sucker for self-care. Put it in a pretty package and I’ll most likely buy it. Add the words balm or dewy and the chances of me buying it go up by 100%. A problem? Meh. A hobby? Maybe, ha! As a teenager, I was guilty of using the “favorites” of the early 2000s: mainly St. Ives Facial Scrub, aka Satan Itself, and any acne product by Neutrogena. Iconic, for the wrong reasons.

Source: Twitter.

Fast-forward to 2016, when I had my first real job after grad school and enough money to walk into a Sephora. Everything was so new and shiny and I just had to start trying stuff. I tried SO MUCH stuff that year and didn’t even know my skin type. A whole mess. Over time, I learned my lesson and got better about picking products that suited my skin, but the itch to try new things never really went away.

Source: r/SkincareAddiction

The Data: So, as the Type A person that I am, I thought it would be fun to start tracking the beauty products I used. I tracked the brand, ingredients, date of purchase, my own reviews, irritation and whether the product was emptied, decluttered, returned or given to my kids or husband.

The Data.

The Project: Using pandas, nltk and sklearn, I calculated standard descriptive statistics of the categorical and numerical variables and used NLP techniques to analyze my product reviews and irritation commentary. Please refer to this post for NLP recipes and to Comprehensive Guide to Text Mining Part I and II for deeper explanations of topic modeling. This post is not meant to be a tutorial; my goal was simply to satisfy my curiosity about my own skincare patterns, so refer to those previous posts for tutorials.

My Hypothesis: I… I think I’ve tried around 160 products from 25 different brands, my average review score will be 6 and my most popular ingredients are squalane and shea butter.

brands and products overview

 
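# number of distinct brands, then number of products tried per brand
len(df.Brand.value_counts())
df.Brand.value_counts()

# total number of products, then number of products per category
len(df['Product Name'])
df['Category'].value_counts()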
 

In three years, it seems I’ve tried a total of SIXTY-FOUR brands, yikes! I’ve tried over 6 products from the following brands: Dermalogica, The Ordinary, Allies of Skin and Drunk Elephant. Meanwhile, I’ve only tried one product each from Dr. Loretta Skincare, Sobel Skincare and Alpha-H Skincare.

Dermalogica          8
Allies of Skin       8
Kate Somerville      5
Drunk Elephant       4
Glossier             4
Fresh                4
Wishful              3
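If you want to poke at brand counts like these yourself, here is a minimal sketch of how it could be done. It is not my actual notebook: the toy rows and the names n_brands, one_offs and favorites are made up for illustration, and only the Brand and Product Name columns come from my spreadsheet.

```python
import pandas as pd

# Toy stand-in for the tracking spreadsheet; the rows are invented for illustration.
df = pd.DataFrame({
    "Brand": ["Dermalogica", "Dermalogica", "Fresh", "Glossier", "Fresh", "Dermalogica"],
    "Product Name": ["A", "B", "C", "D", "E", "F"],
})

# Number of distinct brands. nunique() skips missing values by default,
# just like value_counts() does, so it matches len(df.Brand.value_counts()).
n_brands = df["Brand"].nunique()

# Products per brand, most-tried first.
brand_counts = df["Brand"].value_counts()

# Brands tried exactly once vs. brands tried at least three times.
one_offs = brand_counts[brand_counts == 1].index.tolist()
favorites = brand_counts[brand_counts >= 3].index.tolist()

print(n_brands, one_offs, favorites)
```

The thresholds (exactly 1, at least 3) are arbitrary; swap in whatever cutoffs make sense for your own stash.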

And, ehm, it seems, well, wow, that I have tried over TWO-HUNDRED products.

As for the types of products I have used over the last three years, this was the breakdown.

So, yeah, seems I fucking love moisturizers.

ingredients

Now, let’s take a look at which ingredients I have used the most.

 
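from collections import Counter

# flatten the comma-separated "Key Ingredient" cells into one list, then count
ingredients = list(df["Key Ingredient"])
ingredients = ','.join(ingredients)
ingredients = ingredients.split(",")
ing_count = Counter(ingredients)
ing_count.most_common()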
 
[(' Shea Butter', 34),
 (' Hyaluronic Acid', 30),
 (' Squalane', 14),
 (' Niacinamide', 10),
 (' Peptides', 9),
 (' jojoba oil', 8),
 (' Panthenol', 10),
 (' Retinol', 5),
 ('Glycerin', 10),
 ('AHA', 4)
...]
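One thing the output above gives away: some names carry a leading space (' Shea Butter' vs 'Glycerin') and the casing is mixed (' jojoba oil'), so the same ingredient can end up counted under more than one key. A possible cleanup, sketched here with made-up cells and a hypothetical helper called count_ingredients (not something from my notebook), is to strip and lowercase each name before counting and to skip blank cells:

```python
from collections import Counter

# Made-up cells standing in for df["Key Ingredient"]; None marks a blank cell.
raw_cells = [
    "Shea Butter, Squalane",
    "hyaluronic acid,Shea Butter ",
    None,
    "Glycerin, shea butter",
]

def count_ingredients(cells):
    """Count ingredients across comma-separated cells, ignoring case and stray spaces."""
    counts = Counter()
    for cell in cells:
        if not isinstance(cell, str):  # skips None and NaN (NaN is a float)
            continue
        for item in cell.split(","):
            name = item.strip().lower()
            if name:  # ignore empty pieces left by trailing commas
                counts[name] += 1
    return counts

ing_count = count_ingredients(raw_cells)
print(ing_count.most_common(3))
```

On the real data this would be called as count_ingredients(df["Key Ingredient"]). I have not re-run my numbers this way, so the counts listed above are the unnormalized ones.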

Over 20 of my products contain shea butter and/or squalane.

I WAS CORRECT. The OG, the HG, the QUEEN: Shea Butter. Followed by hyaluronic acid and squalane. Meanwhile, my least common were lactic acid, blue tansy oil and colloidal oatmeal.

the HG products

Note: I gave each product a score between 0 and 10.

I absolutely love how easy pandas makes it to filter your data. It lets us combine multiple conditions with logical operators like &, as in the example below. This was the section I was most curious about… I try so many products, I sometimes forget what I have truly loved.

 
df[(df['Category'] == 'Moisturizer') & (df['Score (out of 10)'] >= 8)]['Product Name']

 
0                                Dermalogica Rich Cream
10    Allies of Skin Peptides & Antioxidants Firming...
13                  Avene Hydrance Rich Hydrating Cream
29                           Tatcha The Dewy Skin Cream
77                            SKINRX LAB MadeCera Cream
94         Kiehl's Since 1851\nUltra Facial Moisturizer

No surprises here. Allies of Skin is dope. Skinceuticals is incredible and worth every penny.

I then repeated a similar filter for other product categories. See below.
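Rather than retyping the filter for every category, the same idea can be wrapped in a small function. This is only a sketch: favorites_by_category is a hypothetical helper, the rows are invented, and the threshold of 8 simply mirrors the moisturizer filter above.

```python
import pandas as pd

# Invented rows; only the column names match my spreadsheet.
df = pd.DataFrame({
    "Product Name": ["Cream A", "Cream B", "Serum C", "Serum D", "Cleanser E"],
    "Category": ["Moisturizer", "Moisturizer", "Serum", "Serum", "Cleanser"],
    "Score (out of 10)": [9, 4, 8, 10, 6],
})

def favorites_by_category(frame, min_score=8):
    """Return {category: [product names scoring at least min_score]}."""
    keep = frame[frame["Score (out of 10)"] >= min_score]
    # Categories with no product above the threshold simply do not appear.
    return keep.groupby("Category")["Product Name"].apply(list).to_dict()

hg = favorites_by_category(df)
print(hg)
```

Here the cleanser drops out because its score is below 8, which is exactly the behavior I want for a "holy grail" list.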

the scores

I wanted to come up with some type of rating/score metric. So, I decided to give each product a score between 0 and 10, where 0 meant the product was not good for me and 10 meant it was amazing. I also wanted to get an idea of how many products I returned, emptied, etc.

Using filters similar to the ones in the last section, I was able to answer some questions about my shopping patterns.

 
df['Score (out of 10)'].mean()
df['Score (out of 10)'].value_counts()
df['Sequel?'].value_counts()
 

My average score is 5 and I’ve given most products I’ve tried a 1 lol, but it makes sense… if everything was 10s then there’d be no variety.

As far as my use patterns, this was the breakdown.

It seems I’ve decluttered most of the products I try. So, it would seem I could practice a bit more mindful buying when it comes to skincare. Now, I will note that I don’t simply throw them out; I try to give them to someone else… so, maybe not so bad?
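If raw counts feel hard to compare, shares and per-outcome averages can help. The sketch below uses invented rows, and the outcome labels in the Sequel? column are my assumption about how such a column might be filled in, not a dump of the real data.

```python
import pandas as pd

# Invented rows; the outcome labels are assumed for illustration.
df = pd.DataFrame({
    "Score (out of 10)": [9, 1, 1, 6, 8],
    "Sequel?": ["Emptied", "Decluttered", "Decluttered", "Returned", "Emptied"],
})

# Average score and the single most frequent score.
mean_score = df["Score (out of 10)"].mean()
modal_score = df["Score (out of 10)"].mode().iloc[0]

# Share of products per outcome (fractions that sum to 1).
outcome_share = df["Sequel?"].value_counts(normalize=True)

# Average score within each outcome, e.g. do emptied products score higher?
mean_by_outcome = df.groupby("Sequel?")["Score (out of 10)"].mean()

print(mean_score, modal_score)
print(outcome_share)
print(mean_by_outcome)
```

value_counts(normalize=True) returns fractions instead of counts, which makes it easier to say things like "40% of what I buy gets decluttered".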

the reviews are in

Lastly, let’s take a look at some of the unstructured data I have collected. After trying a product for 30 days, I came back to my database and added a comment with my thoughts on it. I was curious to see what would come up from some straightforward weighted frequencies and NMF for the products I love and the products I hate.

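from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# key terms: TF-IDF over bigrams and trigrams of the cleaned reviews
vectorizer = TfidfVectorizer(analyzer='word', ngram_range=(2,3))
tfidf = vectorizer.fit_transform(df['cleanedText'])
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
terms = list(vectorizer.get_feature_names_out())
# rank_words is a helper that is not defined in this post
ranked = rank_words(terms=terms, feature_matrix=tfidf)
ranked[0:20]

# topics: factorize the TF-IDF matrix with NMF
n_topics = 5
nmf = NMF(n_components=n_topics,random_state=0)

topics = nmf.fit_transform(tfidf)
top_n_words = 5
t_words, word_strengths = {}, {}
for t_id, t in enumerate(nmf.components_):
    top_idx = t.argsort()[:-top_n_words - 1:-1]  # indices of the strongest terms
    t_words[t_id] = [terms[i] for i in top_idx]
    word_strengths[t_id] = t[top_idx]
t_words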

After running the code above, I was able to come up with a few topics per group: products I love and products I hate.

Yes, this seems correct. If my face doesn’t feel like it’s covered in a pound of Crisco fat, then I don’t want it 😂. What can I say… a girl loves her balmy shit.

Well, I hope this was as fun to read as it was for me to create! Will I stop buying more products? Probably not. Don’t come at me.

Posted by: Aisha Pectyo

Astrophysicist turned data rockstar who speaks code and has enough yarn and mod podge to survive a zombie apocalypse.
