How to do SEO for large scale user platforms | RankWatch Blog


How to do SEO for large scale user platforms


A couple of days ago, I listened to a marketing podcast about content and its importance for SEO nowadays. On the podcast, the conversation was mainly about how to write good content as part of successful SEO. But is content always textual?

I am responsible for SEO at Dailymotion, a user video platform. 99% of our content is video. This led me to write about how to optimize for search engines when your content doesn’t consist of text.

But during the thought process, I came to the realization that this is not the point. The point is not content optimization with videos, but how to do SEO for a large scale user platform. It comes with unique challenges and I’ll tell you how to tackle them.

So what qualifies (us as) a large scale user platform?

Currently, dailymotion.com has roughly 15,000,000 URLs in the Google index.


We have hundreds of millions of users and are present in roughly 40 countries. This puts us in the same basket as YouTube, Facebook, Twitter, Instagram, Pinterest, etc. We’re all dealing with the same challenges, because most of our content is not created by ourselves and is therefore only loosely in our hands.

In this article, I’ll show you what challenges we’re dealing with, what weapons we use to master them, what data we look at and what tools need to provide in order to master SEO for a large scale user platform.

SEO challenges large scale user platforms face

First, I’ll show you what problems user platforms encounter and approaches to solve them. Let’s dive right in:

1. Spam

Challenge: Spam, oh my, spam… I really underestimated (or ignored?) how much spam there is in the world, before I joined Dailymotion. We’re a special target, because users can make money on our platform. Of course they try spamming the hell out of it. I distinguish between two basic forms of spam: copyright infringement and explicit content.

[Screenshot: Spamming]
Yes, what you see in the screenshot happens on a daily basis, not only on our platform.

Solution: We use blacklists to noindex spam right from the get-go. It’s an efficient self-learning system that’s constantly refined by monitoring rankings for spammy terms and then adding these terms to our blacklists.

At the same time, we have an anti-spam team that closes spam accounts and looks at grey-area cases. For example: you’d want to ban videos containing the term “sex” in the title, but what if a partner of ours, like Vanity Fair, publishes a video about celebrities having sex? For such cases, you still need human judgment.
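To make the interplay of blacklist and human review concrete, here is a minimal sketch in Python. The term list, the partner name and the function name are made up for illustration; this is not our actual system, just one way the decision could look.

```python
import re

# Hypothetical example data; a real blacklist would be much larger
# and refined continuously from rank monitoring.
BLACKLIST = {"sex", "full movie"}
TRUSTED_PARTNERS = {"vanityfair"}


def indexing_decision(title: str, uploader: str) -> str:
    """Return 'index', 'noindex' or 'review' for a newly uploaded video."""
    normalized = re.sub(r"\s+", " ", title.lower()).strip()
    words = set(re.findall(r"[a-z0-9]+", normalized))

    # Single-word terms must match a whole word (so "Sussex" is not a hit);
    # multi-word terms are matched as phrases.
    hit = any(
        (term in normalized) if " " in term else (term in words)
        for term in BLACKLIST
    )
    if not hit:
        return "index"
    if uploader in TRUSTED_PARTNERS:
        return "review"  # grey area: hand over to the anti-spam team
    return "noindex"


print(indexing_decision("Celebrities talk about sex", "vanityfair"))
print(indexing_decision("Watch FULL   movie free", "spammer42"))
```

The important design choice is the third state: a blacklist hit from a trusted account is not decided automatically but queued for a human.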

2. Duplicate / thin content

Problem: Online shops are not the only ones fighting this; user platforms do, too. There are multiple causes, e.g. users creating profiles but never uploading any content, or creating a board / playlist and never adding anything to it. This creates tons of “empty” pages that Google does not favor, since they provide a poor user experience.


This happens naturally and is not a sign of a bad platform, but you need to take care of it. The problem grows as soon as you give users the option to repost existing content (if that’s not already part of your core product).

Solution: As a user platform, you need to take care of at least two things: setting those pages to noindex (via the robots meta tag) and trying to prevent them in the first place by motivating users to provide content.

For duplicate content, a canonical tag is the better approach, so you could automatically canonicalize reposts. Unfortunately, you will always have to deal with a little overhead, though, because you cannot prevent different accounts / profiles from uploading the same content.
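As a rough illustration of both rules, the following Python sketch decides which tag a page template would emit. The function name and its parameters are hypothetical; the two HTML tags themselves are the standard robots meta tag and canonical link element.

```python
from html import escape
from typing import List, Optional


def seo_head_tags(item_count: int, original_url: Optional[str] = None) -> List[str]:
    """Return the extra <head> tags for a profile, playlist or video page.

    item_count:   number of videos / pins / posts shown on the page
    original_url: URL of the original item if this page is a repost
    """
    if original_url is not None:
        # Repost: point search engines to the original instead of noindexing.
        href = escape(original_url, quote=True)
        return [f'<link rel="canonical" href="{href}">']
    if item_count == 0:
        # Empty page: keep it out of the index until it has content.
        return ['<meta name="robots" content="noindex">']
    return []


print(seo_head_tags(0))
print(seo_head_tags(1, "https://example.com/video/x1"))
```

The sketch deliberately never emits both tags on the same page, so a page sends only one signal.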

A more sophisticated approach is the concept of a quality score. The idea is to assign each page a cumulative quality score, consisting of factors you define yourself, and then to index or noindex pages (or promote them more) based on that score. Examples of such factors could be “number of videos / pictures / posts”, “number of lifetime visits” or “number of backlinks”, as well as user signals (bounce rate, time on site, pages per session, CTR).
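A quality score could be sketched like this in Python. All weights, caps and the threshold are assumptions I picked for the example; you would have to calibrate them against your own data.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    items: int             # number of videos / pictures / posts on the page
    lifetime_visits: int
    backlinks: int
    bounce_rate: float     # between 0.0 and 1.0


def quality_score(page: PageStats) -> float:
    """Weighted sum of factors, each capped to the range 0..1 (assumed weights)."""
    content = min(page.items / 10, 1.0)
    traffic = min(page.lifetime_visits / 1000, 1.0)
    links = min(page.backlinks / 5, 1.0)
    engagement = 1.0 - page.bounce_rate
    return 0.4 * content + 0.3 * traffic + 0.1 * links + 0.2 * engagement


def should_index(page: PageStats, threshold: float = 0.3) -> bool:
    """Index the page only if its score reaches the (assumed) threshold."""
    return quality_score(page) >= threshold


empty_profile = PageStats(items=0, lifetime_visits=0, backlinks=0, bounce_rate=1.0)
active_channel = PageStats(items=10, lifetime_visits=1000, backlinks=5, bounce_rate=0.0)
print(should_index(empty_profile), should_index(active_channel))
```

Capping every factor keeps a single outlier, such as one page with a huge number of backlinks, from dominating the score.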

3. Indexation

Problem: When you have millions of pages, indexation becomes a problem. Thousands of videos are being uploaded on our site every day and of course we want them to be indexed as fast as possible.

We know that Google dedicates crawl resources according to the PageRank of a site. I also believe that Google assigns some crawl resources manually, and I don’t seem to be alone with this thought, but this is just my personal opinion. Either way, as an SEO you want the crawl resources dedicated to your site to be used as efficiently as possible.

Solution: There are many ways to optimize how Google crawls your site. I won’t go into too much detail here, but I want to at least mention each of them:

  • Site hierarchy: By making sure that each URL can be reached from the home page in as few clicks as possible, you allow search engines to crawl very efficiently. Of course, that’s an ideal state, but we strive for our main content to be no deeper than 3 levels.
  • Sitemaps (XML + HTML): For large websites it is crucial to provide XML sitemaps that are a) as fresh as possible, b) cover your most important pages, c) are categorized and d) are as rich as possible. Depending on your content, you’d also want to upload image and video sitemaps. At the same time, an HTML sitemap helps crawlers find valuable high-level content right from the start.
  • Robots.txt: Controlling what is being crawled is crucial not only in order for search engines to find all content, but also for them to not overload your server capacity. You should use the robots.txt to exclude irrelevant parts of your website from being crawled. At the same time, you should analyze where search engine crawlers get stuck (redirect chains & loops) and make sure to exclude those as well.
  • Internal linking: Internal linking is a powerful way to steer the indexation and link juice flow throughout your site. In an ideal scenario, your most important pages – I call them money pages – receive more link juice (PageRank) than they give away (CheiRank). This is a very simplified model that I’m happy to elaborate on, but for now we’ll stick to that.
    Don’t forget about the implications of internal linking modules that are implemented site-wide, such as pagination and breadcrumb navigation. Each comes with its own potential for optimization. In the end, it’s important to test and model scenarios as well as you can.
  • Backlinks: Of course, backlinks also matter in the grand scheme of things. The better your backlink profile, the more authority you have (rankings are the result of relevancy and authority). Whatever effort you put into actively shaping your backlink profile*, make sure the link juice is efficiently distributed throughout your site.
    *I don’t want to encourage anyone to buy backlinks.
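To check the “no deeper than 3 levels” goal from the site hierarchy point, you can compute click depth with a breadth-first search over your internal link graph. The sketch below is in Python; the tiny link graph and the function names are invented for the example, and in practice the graph would come from a crawl of your site.

```python
from collections import deque
from typing import Dict, List


def click_depths(links: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of clicks from the home page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links: Dict[str, List[str]], home: str, max_depth: int = 3) -> List[str]:
    """Pages deeper than max_depth, including pages not reachable at all."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in all_pages if depths.get(p, max_depth + 1) > max_depth)


# Hypothetical link graph: page -> pages it links to.
site = {
    "/": ["/music", "/news"],
    "/music": ["/music/hiphop"],
    "/music/hiphop": ["/video/a"],
    "/video/a": ["/video/b"],
    "/orphan": ["/"],
}
print(too_deep(site, "/"))
```

Pages that cannot be reached from the home page at all are reported as too deep as well, which is a cheap way to surface orphan pages.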

4. Generic keyword targeting

Problem: On a user-driven platform, the user decides the topics and ultimately the keywords that are being targeted. Maybe a user creates a pin board or (video) playlist that targets the generic topic “hip hop”. But you cannot optimize that page and it probably won’t satisfy the user intent behind a search for “hip hop”. From an SEO standpoint that’s rather sub-optimal: how would you target generic keywords, i.e. the ones with SEO firepower?

Solution: The solution to that is structuring your site with category pages. On such a page you cluster content for a certain topic, such as Quora does very well with “Crowdfunding”: https://www.quora.com/topic/Crowdfunding.

Quora does a great job: not only does it aggregate user content on these pages to make and keep them relevant, it also enriches those topic pages by linking to related topics and providing a definition, an FAQ and the most viewed contributors. Of course such a page has high relevance in Google’s eyes for the targeted topic!

Use category pages to structure your content and target generic keywords / topics, and enrich them with as much useful information as possible. When it comes to identifying which topics to build pages for, you’d of course start with keyword research and see what fits the user intent behind the topic pages.

There is another perspective that I want to point out: as a large scale user platform, you are a search engine yourself! See what people look for on your site by analyzing your search function and start building the corresponding pages.
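A minimal Python sketch of that analysis: count the queries from your internal search and list the frequent ones that don’t have a topic page yet. The function name, the threshold and the sample queries are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def topic_candidates(
    queries: Iterable[str],
    existing_topics: Set[str],
    min_searches: int = 2,
) -> List[Tuple[str, int]]:
    """Frequent internal search queries that have no category page yet."""
    # Normalize case and whitespace so "Hip Hop" and "hip  hop" count together.
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    counts.pop("", None)  # ignore empty searches
    return [
        (query, n)
        for query, n in counts.most_common()
        if n >= min_searches and query not in existing_topics
    ]


searches = ["Hip Hop", "hip  hop", "crowdfunding", "hip hop", "yoga", "Yoga"]
print(topic_candidates(searches, existing_topics={"crowdfunding"}))
```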

Are there other levers? Of course! Page speed, meta data patterns and mobile optimization are just three examples of a whole plethora of weapons you have to optimize on a large scale, but I feel like the big four I mentioned are especially important to user platforms.

Which data should you look at?

Log files. What you really need to understand is a) how much of your site is being crawled by Google and b) how often. You get an idea about crawl rate fluctuations from the Google Webmaster Tools (crawl stats), but if you really want to figure out what’s going on you need to analyze your server log files.


Figure out which pages are not being crawled and why. Reasons could be poor internal linking, certain page types missing from the XML sitemaps, poor site hierarchy, pages being set to meta nofollow or disallowed in the robots.txt, or redirect chains and loops.
At the same time, you want to keep an eye on where on your site Google invests a lot of crawl resources. If those are not your “money maker pages”, there is a problem.
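If you want to start analyzing log files yourself, a first step could look like the following Python sketch. It assumes the common “combined” access log format and treats the first path segment as the page type; both are assumptions about your setup. It also trusts the user agent string, which can be faked, so a real analysis should verify Googlebot, for example via reverse DNS lookup.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes the "combined" log format: request, status, size, referrer, user agent.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_hits_by_section(lines: Iterable[str]) -> Counter:
    """Count requests with a Googlebot user agent per top-level site section."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match or "Googlebot" not in match.group("agent"):
            continue
        path = match.group("path").split("?", 1)[0]         # drop query string
        section = "/" + path.lstrip("/").split("/", 1)[0]   # "/video/x1" -> "/video"
        hits[section] += 1
    return hits


sample = [
    '66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET /video/x1abc HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2016:13:55:40 +0000] "GET /video/x1abc HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 10.0) Firefox/49.0"',
]
print(googlebot_hits_by_section(sample))
```

Comparing these counts per section with the sections where your “money maker pages” live shows quickly whether Google’s crawl resources go where you want them.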

Pages without visits, either in their lifetime or in the last 30/90 days. You should constantly monitor pages that receive no traffic in their lifetime (or within the last 30/90 days), otherwise your platform will soon be loaded with thin content. Causes could be “empty” pages, such as search pages without results or profile / group / board pages without content, spam, or orphan pages (not internally linked on the site).

I cannot tell you what KPI to focus on. That depends on business goals and the individual business itself. But organic traffic should always be in the mix of metrics you regularly monitor. It’s the essence of SEO. On top of that, you should have some sort of rank tracking in place, even if you cannot track every ranking of millions of pages. At least your most valuable pages, generic keywords and brand keywords should be tracked weekly.

On top of that, look at internal user signals (bounce rate, session time, pages per session) and external user signals (click-through rate). If you already assign a quality score to your pages, as discussed before, even better. What I learned is that when a quality factor goes up or down for, let’s say, a page type, traffic soon follows.

Segment all these metrics by country, device and page type and you will have a full view of what’s going on on your platform. You can then act accordingly.

What do tools need to provide?

Any SEO for a larger website will have to deal with a stack of tools. Even though there are very good platforms out there that cover a lot of necessary features, it still makes sense to pair them with smaller, more specialized applications. Instead of just giving you a list of tools, I’d rather list what a tool needs to provide, so you can find what works best for yourself and your company / website.

(To not overcomplicate things, I left side branches like App Store Optimization or Amazon optimization out and focused on pure search engine optimization)

First, you need a web analytics tool of course. It has to cover at least traffic types (referral, organic, paid, social, direct), traffic per URL, user signals, devices, languages and conversion tracking.

Second, you’d want to pair that with a rank tracking application that monitors your most important keywords weekly on a URL basis, and of course those of your competitors. If you want to take it a step further, you’d get a tool that provides an aggregated visibility metric based on rankings, search volume and keywords.

Third, you need an application that analyzes your server log files for you, unless you want to do it yourself. If you have skilled developers with enough time, you can create a dashboard yourself, otherwise there are some solutions out there. Connect log file crawl data with your web analytics and maybe rank tracking and you have very powerful intel!

Fourth, you need access to a backlink database. This is something that’s hard, if not impossible, to insource. Even from a passive standpoint it makes sense, because you can monitor where most link juice arrives on your site, who links to you and whether you are at risk from spam.

Finally, you need an application that allows you to regularly crawl (parts of) your site. As funny as it sounds, working for a large scale user platform means you constantly want to understand your own site. To find bugs and all other sorts of problems, crawling is essential.

I hope this article provides more clarity about the specific needs SEO entails for large scale user platforms. Of course, there is much more to this topic. Looking at the overlap with User Experience, Social Media and Product Development departments, I see tons of synergies. Depending on the response to this article, I’m happy to elaborate on these topics.

Kevin Indig
Kevin Indig is the director of SEO @ Dailymotion. Having relocated from Germany to Silicon Valley, he has worked for big-name SEO consultancies such as The Reach Group and Uniquedigital. Prior to his current position, he joined Searchmetrics to build out their US SEO consulting branch. When he is not following his passion, SEO, he’s probably working out.