Data Collection, Text Analysis, and the Benefits of Peer Review Pt. 1

Perhaps the most difficult step in my project was figuring out just how to collect a large bundle of comments from Instagram. Through my research, I learned that Social Media researchers preferred method of data collection involved using AI Assisted “Scrapers; “Data Scraping” is essentially a computer program that extracts data from a website and outputs it as human-readable data that ideally will be compatible with other data analysis program… I then learned that, in recent years, Instagram / Facebook (now Meta) has done absolutely everything in its power to essentially resist these tools’ usability. That’s not to say that techniques do not currently exist, but what I ultimately found was that techniques would either:

  • Require a rudimentary understanding of Python that I both lacked and was unsure I could become proficient in given the course’s limited timeframe.
  • Are fairly costly or had extremely limited free trials that would limit what I could do and which could have become problematic if I didn’t have a rock solid research plan.

I attempted to use a free option, namely… but despite being relatively straightforward, I had great difficulties actually getting it to work on Instagram… I do think that this tool could be used to scrape Instagram comments… but I am just 100% unsure how to get it to do so.

During the course of researching both why Scraping Instagram had become such a difficult process and how one could work around those difficulties and still do it, I stumbled onto this interesting blogpost called “Instagram Scraping in 2021” by Vlad Mishkin (2021) that… to be quite honest, I have some difficulty fully understanding. But, with that said, I think that with time I actually might be able to follow his methods and perhaps get this particular code to work… I may still try the method he outlined in the future, but for now I am constrained by time. 

This left me with one extremely unfortunate option: I had to go in and copy / paste as many comments as I possibly could into an Excel file that could then be converted into a TXT file ready for data analysis. This process unfortunately took weeks… many late nights (that were at least soundtracked by some decent music). Given Instagram’s (seeming?) instability on web browsers, I had difficulties extracting comments using this technique as well. The first problem is that a posts’ comments must be “loaded” multiple times by pressing a button in order to gain access to the entire comment thread. This is not exactly as straightforward as it seems on the surface… sometimes, you press the button, and it simply re-loads the comments you have already loaded; other times you will press the button, and it will collapse the entire thread to its standard state. I often found that I would have to load / unload comments and close out / reopen posts multiple times in order to get the full thread in view.

from Slogans (2011-Present), 2020
Douglas Coupland

To get around the finicky nature involved with manually Scraping an entire post’s comment thread, I decided to collect the top comments from each post. This resulted in about 1,400 individual comments from around 140 posts. During my project presentation, some of my fellow classmates brought up a couple of issues worth considering:

  • How are the comments organized? 
    • This is tricky… as it seems kind of unclear. Based on my experimenting, it appears that they are organized chronologically with the more recent comments appearing at the bottom of the thread. But other research I’ve done states that comments from verified accounts or high-follower volume accounts will be prioritized over smaller accounts. This would mean that I would have collected comments from first responders, verified responders, and responders who have a lot of followers.
  • What did you do about emojis?
    • I did not collect emojis. This could be a problem because in some ways; emojis set tone. But with that said, most comments that included emojis solely included emojis… most comments with text solely included text. I do not think this decision will have an outsized impact on my study.
  • Why not look at multiple posts with a high-volume of comments?
    • In a phrase: Because that would have been annoying… But… yeah, I should do this. So I did. I collected all the comments from 10 posts with a high-volume of comments resulting in about 500 additional comments.

So now, I have a corpus of about 1,900 comments with which to perform a multitude of experiments on. I have detailed that process here.

One thought on “Data Collection, Text Analysis, and the Benefits of Peer Review Pt. 1

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s