How often are my Pastebin pastes read by someone else?

During last year’s Black Friday sales, I saw that Pastebin once again had a deal on their Pro account. Although I don’t need the other features, access to their scraping API is tempting. However, I managed to resist the urge to buy the account for the usual “just in case”.
With some free time over the Christmas holidays and New Year, my thoughts wandered back to Pastebin, and especially to their “unique visits” counter. I wondered whether it’s possible to find out how many people see my (by default) public pastes.

As usual, one starts with RTFM, and I found the following information in their FAQ:

How does your hits counter work?
The hits shown above each paste displays the number of unique visitors. We only count a visitor once per certain amount of time, and we try to filter out bot & scraper traffic from the hits counter. We also do not include hits which came from the RAW version of pastes, so only the hits that happened on our actual website.

(https://pastebin.com/faq#81)

The hits counter shown on their pages doesn’t (or at least shouldn’t) include traffic from people using the API, and they also try to filter out people scraping the website. So it should only count actual people manually visiting the website, plus relatively stealthy scrapers that go unnoticed. But how many of them are there?

Scraping the unique visits counter

To answer this question, I created a test paste and refreshed the page from time to time. I observed that the unique visits counter skyrockets in the first few seconds and minutes after a paste is published and settles at around 50-60 views after some time.
Of course, one or two test pastes are not enough for even slightly meaningful results, so a small script was born:

import requests
from bs4 import BeautifulSoup
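from multiprocessing.dummy import Pool as ThreadPool

# NOTE: "headers" is used below but is not defined in the listing as published.
# The User-Agent value here is an assumption; substitute your own.
headers = {"User-Agent": "Mozilla/5.0"}

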
def get_x_latest_urls(x):
    print("Requesting " + str(x) + " URLs ...")
    response = requests.get("https://pastebin.com/archive", headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    urls = []
    tbl = soup.find("table", class_="maintable")
    for link in tbl.find_all('a'):
        href = link.get('href')
        # paste links always have this format (length 9): /Z8Xem7CL
        # (anchors without an href are skipped instead of raising a TypeError)
        if href and len(href) == 9:
            urls.append("https://pastebin.com" + href)
    return urls[:x]

    
def get_unique_visits(url):
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    img = soup.find("img", class_="img_line t_vi")
    return {url : int(img.next_sibling)}


def get_threaded_results(urls):
    with ThreadPool(len(urls)) as p:
        return p.map(get_unique_visits, urls)



if __name__ == "__main__":
    x = 3
    urls = get_x_latest_urls(x)

    # instant; you can use the time module for delay
    print("\nInstant result ...")
    results = get_threaded_results(urls)
    print(results)

This script collects the three latest pastes from the Pastebin archive and then checks how many views they have. The check was repeated at several intervals for the same paste: instantly, and then after 1 minute, 5 minutes and 15 minutes. I collected this data over two weeks in December and January, at different times throughout the day.
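The script above only shows the instant check. The repeated checks could be sketched roughly as follows. This is not the code that was actually used for the measurements: the name `sample_at_offsets` and its parameters are made up for illustration, and the offsets simply mirror the intervals mentioned above.

```python
import time


def sample_at_offsets(fetch, offsets_s=(0, 60, 300, 900),
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch() at each offset (in seconds after the start) and
    return a dict mapping offset -> whatever fetch() returned.

    Hypothetical helper: "fetch" is any zero-argument callable, e.g.
    lambda: get_threaded_results(urls) from the script above.
    """
    start = clock()
    samples = {}
    for offset in sorted(offsets_s):
        # Sleep only for the time still remaining until this offset, so
        # the duration of earlier fetches does not add up as drift.
        remaining = offset - (clock() - start)
        if remaining > 0:
            sleep(remaining)
        samples[offset] = fetch()
    return samples
```

Used with the script above, this would be something like `sample_at_offsets(lambda: get_threaded_results(urls))`, which blocks for about 15 minutes and returns one result list per interval.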

Results

The following table shows the average (and median) views shown on the counter after the given amount of time:

Views after x amount of time   Instantly   1 min   5 min   15 min
Average views                  5.62        36.2    55.32   58.77
Median views                   5           37      54      57.5
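For reference, averages and medians like these can be computed with Python’s standard `statistics` module. The following is only a sketch: the `summarize` helper is hypothetical, and the numbers in `example` are placeholders for illustration, not the measured data.

```python
from statistics import mean, median


def summarize(views_by_delay):
    """Return {delay: (average, median)} for each list of view counts."""
    return {
        delay: (round(mean(views), 2), median(views))
        for delay, views in views_by_delay.items()
        if views  # skip delays that have no samples yet
    }


# Placeholder values for illustration only, NOT the collected data.
example = {
    "instantly": [1, 2, 6],
    "1 min": [10, 20, 60],
}
print(summarize(example))
```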

Please keep in mind that the way the data was collected introduces some small errors in the view counts, for example through request delays, which matter most for the “instantly” and “1 minute” results. However, a general trend can still be seen. For more details, have a look at the box plots (axis: views):

We can see that the general “interest” in our pastes is strongest in the first 5 minutes and falls off after that. Our public pastes on Pastebin are indeed very public and are actively watched. And that’s excluding access through the scraping API, through which most of the meaningful analysis and traffic should go.

In general, everything you put in a public paste on Pastebin will be seen by around 60 visitors (people or scrapers) through the website, and by an unknown number of scrapers through the scraping API, within the first 15 minutes. And that’s only the displayed, pre-filtered view counter. So be careful about what you paste there!