Over the past decade, SEO has moved from spreadsheet-driven, anecdotal best practices to a more data-driven approach. This is evidenced by the growing number of SEO professionals learning Python.
As the number of Google updates increases (11 in 2023), SEO professionals are recognizing the need to take a more data-driven approach to SEO, and the internal link structure of your site architecture is no exception.
In my last article, I outlined how to make internal linking more data-driven, providing Python code on how to statistically evaluate site architecture.
Beyond Python, data science can help SEO professionals uncover hidden patterns and key insights more effectively, which in turn helps signal to search engines how content within a website should be prioritized.
Data science is the intersection of coding, mathematics, and domain knowledge, which in our case is SEO.
So while math and coding (invariably in Python) are important, asking the right questions of your data and having an intuitive sense of whether the numbers “look right” is just as crucial, so SEO expertise is in no way diminished in importance.
Adjust your site’s architecture to support underlinked content
Many sites are structured like a Christmas tree, with the home page (the most important page) at the top and other pages arranged on subsequent levels in descending order of importance.
As an SEO scientist, you’ll want to know what the link distribution looks like from various perspectives. This can be visualized in several ways using the Python code from the previous article:
- Site depth.
- Content type.
- Internal PageRank.
- Conversion value/revenue.
A boxplot effectively shows what a “healthy” number of links looks like for a particular website at various site levels. The blue box represents the interquartile range (i.e., the span between the 25th and 75th percentiles), which by definition contains the middle 50% of pages in terms of inbound internal links.
Think of a bell curve. Instead of looking at the mountain from the side, a boxplot shows it the way a bird flying overhead would see it.
For example, this chart shows that for pages two levels below the home page, the blue box indicates that the middle 50% of URLs have between 5 and 9 incoming internal links. This is significantly (and perhaps unsurprisingly) lower than for pages that are one hop away from the home page.
The thick line dividing each blue box is the median (50th percentile), which represents the middle value. Using the example above, site level 2 pages receive a median of 7 internal links, which is approximately 5,000 times fewer than site level 1 pages.
As a side note, you may notice that not every blue box shows a distinct median line. The reason is that the data is skewed (i.e., not normally distributed like a bell curve), so the median can sit on the edge of the box.
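The numbers a boxplot draws can also be computed directly. The following is a minimal sketch using only the Python standard library; the `crawl` rows and the `boxplot_stats` helper are hypothetical illustrations, not the code from the previous article, and the "inclusive" quantile method is an assumption (other tools may interpolate slightly differently).

```python
from statistics import quantiles

# Hypothetical crawl data: (site level, inbound internal links) per URL.
crawl = [
    (1, 12000), (1, 35000), (1, 38000), (1, 41000),
    (2, 5), (2, 6), (2, 7), (2, 9), (2, 40),
    (3, 1), (3, 2), (3, 2), (3, 3), (3, 15),
]

def boxplot_stats(rows):
    """Return {level: (q1, median, q3)} of inbound links per site level."""
    by_level = {}
    for level, inlinks in rows:
        by_level.setdefault(level, []).append(inlinks)
    stats = {}
    for level, values in sorted(by_level.items()):
        # n=4 yields the three cut points that bound the quartiles.
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[level] = (q1, median, q3)
    return stats

for level, (q1, median, q3) in boxplot_stats(crawl).items():
    print(f"level {level}: Q1={q1}, median={median}, Q3={q3}")
```

In this made-up data set, the level 3 median equals its first quartile, which is the kind of skew that makes a median line coincide with the edge of a box.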
Is this good? Is this bad? Should SEO professionals be concerned?
A data scientist without SEO knowledge might decide that the best course is to calculate the distribution of internal links to pages by site level and rebalance it.
From there, if pages fall below, say, the median or the 20th percentile (quantiles, in data science terms) for their site level, the data scientist may conclude that these pages need more internal links.
In effect, this assumes that pages sharing the same number of hops from the home page (and thus the same site depth level) are of equal importance.
However, from a search value perspective, this is likely not the case, especially considering that some pages at the same level simply have higher search demand than others.
Therefore, your site architecture should prioritize pages with high search demand over pages with low search demand, regardless of their default level in the hierarchy.
True Internal PageRank (TIPR) revisited
True Internal PageRank (TIPR), popularized by Kevin Indig, takes a smarter approach by incorporating external PageRank, i.e., PageRank earned from backlinks. In simple mathematical terms, it goes like this:
TIPR = internal PageRank × page-level backlink authority
Although the above is a non-scientific version of the metric, it is still a far more useful and empirical way to model the relative value of pages within a website’s architecture. If you want the code to calculate this, see here.
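To make the formula concrete, here is a rough sketch under stated assumptions. Internal PageRank is approximated with a plain power iteration (damping factor 0.85 is a conventional choice, not something the metric prescribes), and `backlink_authority` stands in for whatever page-level authority score your backlink tool exports. The link graph, the scores, and the function name are all hypothetical.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank over an internal link graph {source: [targets]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical internal link graph and page-level backlink authority scores.
links = {
    "/": ["/category", "/blog"],
    "/category": ["/product-a", "/product-b", "/"],
    "/blog": ["/product-a", "/"],
    "/product-a": ["/"],
    "/product-b": ["/"],
}
backlink_authority = {
    "/": 60, "/category": 20, "/blog": 35, "/product-a": 10, "/product-b": 5,
}

ipr = internal_pagerank(links)
# TIPR = internal PageRank x page-level backlink authority
tipr = {url: ipr[url] * backlink_authority[url] for url in ipr}

for url, score in sorted(tipr.items(), key=lambda item: item[1], reverse=True):
    print(f"{url}: {score:.3f}")
```

The absolute TIPR values depend entirely on how the two inputs are scaled, so they are best read as relative scores within one site.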
Additionally, it is much more useful to apply this metric by content type than by site level. For one e-commerce client, the distribution of TIPR by content type is shown below.
For this online store, the plot shows that the median TIPR for category content, or product listing pages (PLPs), is approximately 2 TIPR points.
Admittedly, TIPR is a bit abstract. How does it translate into the number of internal links you need? It doesn’t, at least not directly.
Despite the abstraction, it is still a more effective construct for shaping your site architecture.
If you want to see which categories are underperforming relative to their potential ranking positions, simply check which PLP URLs fall below the 25th percentile and look to add internal links to them from pages with higher TIPR values.
How many links, and how much TIPR? That takes some modeling and will be answered in another post.
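The check described above could be sketched as follows. The `pages` rows and the `link_candidates` helper are hypothetical, and treating pages above the site-wide 75th percentile as potential link sources is an assumption made for illustration; the article only says the sources should have higher TIPR values.

```python
from statistics import quantiles

# Hypothetical (url, content type, TIPR) rows; none of these come from the article.
pages = [
    ("/", "home", 9.0),
    ("/c/shoes", "plp", 3.1),
    ("/c/bags", "plp", 2.4),
    ("/c/hats", "plp", 2.0),
    ("/c/belts", "plp", 1.5),
    ("/c/garden", "plp", 0.4),
    ("/blog/care-guide", "blog", 1.2),
    ("/blog/trends", "blog", 0.8),
    ("/p/boot", "product", 0.6),
    ("/p/tote", "product", 0.5),
    ("/p/cap", "product", 0.3),
]

def link_candidates(rows, content_type="plp"):
    """Return (pages needing links, possible link sources) for one content type."""
    type_scores = [tipr for _, ctype, tipr in rows if ctype == content_type]
    # 25th percentile of TIPR within the chosen content type.
    q1 = quantiles(type_scores, n=4, method="inclusive")[0]
    # Assumption: potential sources sit above the site-wide 75th percentile.
    site_q3 = quantiles([tipr for _, _, tipr in rows], n=4, method="inclusive")[2]
    targets = [url for url, ctype, tipr in rows
               if ctype == content_type and tipr < q1]
    donors = [url for url, _, tipr in rows if tipr > site_q3]
    return targets, donors

targets, donors = link_candidates(pages)
print("PLPs below the 25th percentile:", targets)
print("Possible link sources:", donors)
```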
Introducing Revenue Internal PageRank (RIPR)
Another important question worth answering is what content deserves a higher ranking position.
Kevin has also advocated a smarter approach of aligning internal link structure with conversion value, and I wholeheartedly agree. I hope many of you have already applied it for your clients.
A simple unscientific solution is to calculate the ratio of e-commerce revenue to TIPR.
RIPR = Revenue / TIPR
The metric above helps you see the typical revenue per unit of page authority, as shown below.
As you can see, the picture changes somewhat. The box (i.e., distribution) for blog content no longer appears, because no revenue is recorded for blog content.
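The ratio itself is simple to compute. The sketch below uses hypothetical revenue and TIPR figures, and the `ripr` helper name is made up for illustration. Returning `None` when TIPR is zero or negative is a design choice of this sketch to avoid dividing by zero, not part of the metric's definition.

```python
def ripr(revenue, tipr):
    """Revenue per unit of TIPR, or None when TIPR is not positive."""
    if tipr <= 0:
        return None
    return revenue / tipr

# Hypothetical per-page figures for one reporting period.
pages = {
    "/c/shoes": {"revenue": 12000.0, "tipr": 3.0},
    "/c/garden": {"revenue": 8000.0, "tipr": 0.4},
    "/blog/care-guide": {"revenue": 0.0, "tipr": 1.2},
}

scores = {url: ripr(page["revenue"], page["tipr"]) for url, page in pages.items()}

for url, score in scores.items():
    print(url, score)
```

A page with no recorded revenue, such as the blog URL here, ends up with a RIPR of zero regardless of its TIPR.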
Practical application? If we use this as a model by content type, pages above the 75th percentile (i.e., north of the blue box) for their respective content type should have more internal links added.
Why? Because although their revenue is high, their page authority is very low. In other words, the RIPR is so high that the page needs more internal links to bring it closer to the median.
In contrast, content that earns little revenue yet has too many internal links has a low RIPR and should have links removed, so that higher-revenue content is assigned more importance by search engines.
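As a rough illustration of this rule, the sketch below labels pages by where their RIPR falls within their own content type. The rows and the `link_actions` helper are hypothetical, and using the 25th percentile as the cut-off for removing links is an assumption that mirrors the 75th percentile rule for adding them.

```python
from statistics import quantiles

# Hypothetical (url, content type, RIPR) rows.
pages = [
    ("/c/a", "plp", 500), ("/c/b", "plp", 3000), ("/c/c", "plp", 4000),
    ("/c/d", "plp", 5000), ("/c/e", "plp", 20000),
    ("/p/a", "product", 100), ("/p/b", "product", 900), ("/p/c", "product", 1000),
    ("/p/d", "product", 1100), ("/p/e", "product", 6000),
]

def link_actions(rows):
    """Map url -> suggested action based on RIPR quartiles per content type."""
    by_type = {}
    for _, ctype, score in rows:
        by_type.setdefault(ctype, []).append(score)
    actions = {}
    for url, ctype, score in rows:
        values = by_type[ctype]
        if len(values) < 2:
            continue  # too few pages of this type to estimate quartiles
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        if score > q3:
            actions[url] = "add links"      # high revenue, low authority
        elif score < q1:
            actions[url] = "remove links"   # low revenue, high authority
    return actions

for url, action in link_actions(pages).items():
    print(url, "->", action)
```

Pages inside the interquartile range are left out of the result, since the rule only addresses the two tails.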
Important caveats
RIPR incorporates several assumptions, including that revenue tracking in your analytics is set up properly, so that the model forms a sound basis for effective internal link recommendations.
Of course, as with TIPR, you still need to model the value of internal links, i.e., how much RIPR an internal link from a given page is worth.
And that’s before we even get to where on the page the internal link is placed.
Featured Image: NicoElNino/Shutterstock