Fabrice Canel from Microsoft said that every day Bing discovers tens of billions of normalized URLs it has never seen before. That's a lot of new URLs for BingBot to find in a single day, don't you think?

But the web is big and content is constantly being produced, not just quality content but a lot of junk, gibberish, machine-generated content, and so on.

Fabrice explained on Twitter that much of that content is "mostly useless content," listing examples such as duplicate content, scraped content, automatically generated content, spam content, junk content and more.

So while Bing may discover billions and billions of new URLs per day, I doubt it indexes much of it.

Here are those tweets:
Size of the web = ♾. We discover at #bing every day 10s of billions of normalized URLs never seen before. Mostly useless content (duplicate/scraped/automatically generated content, spam, junk, etc.). See our guidelines https://t.co/IKdDkLNs6W including the "Things to avoid"
— Fabrice Canel (@facan) August 17, 2022
Forum discussion at Twitter.