Protect the Open Web: Use Semantic HTML

As I begin to work on using Koype as my sole "gateway" to the social Web, I'm seeing how much of the corporate Web is blocked off from community-grown standards. I wrote a bit about my difficulty crediting content correctly in an open fashion, and now I'm going to go out on a limb and ask people to do a few things here.

  • Optimize for people over machines.
  • Keep things reachable via the Web.
  • Use fallback solutions more often than never.

What is "Web Friendly"?

Before I continue, I'll preface this by saying that I'm not some expert or author at the W3C or WHATWG. I haven't released a project that led to 100,000 stars on GitHub or Bitbucket. My most notable interaction that people seem to know me for on the Web was when Google marked my friend as a gorilla. Not even for work I've done but for me calling out another company during the BET Awards show. I will say that I've been tinkering with the Web for over a decade and I've legitimately seen the Web go from something anyone could use and experience to a machine driven by fewer companies that aim to monetize an already-tricky transport layer.

I see a page being Web friendly by embracing the following things:

  • The content on the page is visible wholly in the source code of the page.
  • There's no drastic difference of the content when you fetch it using a different tool.
  • Content isn't missing or withheld solely because JavaScript can't be executed.

It's confusing to see entire sites just "white out" because JavaScript hasn't been made available. Browsers exist to view content first and to execute code second. I don't aim to dismiss how JavaScript provided new ways to present content on the Web - it's used on this site as well! But it's done so in a progressive fashion. I'm not describing it in the fashion that you'd hear from developer advocates - I'm talking about progressively enhancing the content that already exists on the page.

Due to the "white out" effect of sites, you can't attempt to view them in a low-resource mode - you're forced to download megabytes of JavaScript when you visit sites like this. All to show a carousel and text. Text that can be displayed on the page without JavaScript! Granted, there's a service that helps you render these pages in advance but it also suffers from the "white out"; possibly not eating what they cook?
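Here's a rough sketch of what I mean by showing "a carousel and text" progressively. Everything in it is made up for illustration (the page, the image paths, the class names like `slides` and `is-enhanced`); it's one way to do it under those assumptions, not how any particular site works. The text and images live in the markup, so they show up in the page source and in any tool that fetches it. The script only rearranges what's already there.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example product page (hypothetical)</title>
  <style>
    /* Without JavaScript the slides simply stack as a readable list. */
    .slides { list-style: none; padding: 0; }
    /* These rules only apply once the script has opted in. */
    .slides.is-enhanced > li { display: none; }
    .slides.is-enhanced > li.is-current { display: block; }
  </style>
</head>
<body>
  <h1>Example product</h1>
  <p>This text is in the page source, so any client can read it.</p>

  <ul class="slides">
    <li><img src="/img/one.png" alt="First screenshot"><p>First feature, in plain text.</p></li>
    <li><img src="/img/two.png" alt="Second screenshot"><p>Second feature, in plain text.</p></li>
    <li><img src="/img/three.png" alt="Third screenshot"><p>Third feature, in plain text.</p></li>
  </ul>

  <script>
    // Enhancement only: turn the stacked list into a one-at-a-time carousel.
    (function () {
      var list = document.querySelector('.slides');
      if (!list || list.children.length < 2) return;

      var slides = list.children;
      var current = 0;
      list.classList.add('is-enhanced');
      slides[current].classList.add('is-current');

      // The "Next" button is created by the script, so it never appears
      // as a dead control when JavaScript is unavailable.
      var next = document.createElement('button');
      next.type = 'button';
      next.textContent = 'Next';
      next.addEventListener('click', function () {
        slides[current].classList.remove('is-current');
        current = (current + 1) % slides.length;
        slides[current].classList.add('is-current');
      });
      list.parentNode.insertBefore(next, list.nextSibling);
    })();
  </script>
</body>
</html>
```

If the script never runs, the visitor still gets three images and three paragraphs. Nothing "whites out".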

There are sites that can show their content to users without requiring JavaScript. Wikipedia, GitHub and Microsoft's Xbox (barely) present pages with no JavaScript. But then we run into the case where content that's relevant, or that could just be displayed on these pages, doesn't appear unless JavaScript is run. Said JavaScript is fetching content from a server that might be the same place where the page is rendered from. That's an extra trip pushed onto the client for unknown reasons. The usual justification here is experimentation (on users), ease of software development and "separation of concerns".

If a page suffers from the things above, what does one do to remedy the situation?

Optimize For People Over Machines

I'd point my finger at big publications and their use of corporate-led outlining of Web content but they're just trying to make sure people can find them in silos. Which is important to their bottom line. I'm not asking them to stop that (not completely). I'm asking for (more) markup that aligns with tooling that's Web friendly. We have Microformats, which give you freedom over how content is designed while still making the page hyper relevant to the user viewing it. This post is rendered using Microformats. You don't have to worry about SEO if that's a concern; reasonable search engines are capable of parsing Microformats. There's also HTML Microdata, though I'd avoid going with a corporate-backed standard before using one pushed by the community - it's company practice to embrace, extend and extinguish. Microdata is also overly verbose in its definitions. Schema.org is another contender backed by large search engine companies. The site provides examples of what markup looks like for RDFa, Microdata and JSON-LD.

I'd push for Microformats since it encourages you to think about the content you'll be presenting on the page versus dumping as much as you can with no real objective (or to conform to some arbitrary remote test). This means the designer of the markup will also consider the needs and use cases of the visitor and provide alternative rendering when necessary. JSON-LD and the like encourage big dumps of information, aiming to (under)optimize against the user and for the companies scraping their site for views.
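To make that concrete, here's a small sketch of a post marked up with microformats2. The post title, author, date and URLs are placeholders I invented for the example; the class names (`h-entry`, `p-name`, `p-author`, `h-card`, `dt-published`, `u-url`, `p-summary`, `e-content`) are the standard ones from the h-entry vocabulary.

```html
<article class="h-entry">
  <!-- p-name: the title a person reads is the same title a parser gets. -->
  <h1 class="p-name">An example post</h1>

  <p>
    By <a class="p-author h-card" href="https://example.com/">Example Author</a>,
    published
    <time class="dt-published" datetime="2020-01-01T12:00:00+00:00">January 1, 2020</time>
    at <a class="u-url" href="https://example.com/posts/an-example-post">this permalink</a>.
  </p>

  <!-- p-summary: a short description, visible on the page. -->
  <p class="p-summary">One sentence that says what the post is about.</p>

  <!-- e-content: the body, with its markup preserved for parsers. -->
  <div class="e-content">
    <p>The body of the post, written for the person reading it.</p>
  </div>
</article>
```

The metadata here is the visible content. There's no second copy of the post tucked away in a script tag for machines only, which is the part I care about.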

Keep Things Reachable via the Web

Let's go back to that prerendering service. When I request the page from the console, I get the following:

  <!DOCTYPE html><html ng-app=""><head><meta charset="utf8"><title>Prerender - AngularJS SEO, ReactJS SEO, or VueJS SEO</title><meta name="description" content="Allow your AngularJS, ReactJS, or VueJS apps to be crawled perfectly by search engines. View on Github."><link rel="stylesheet" href="//"><link href="//" rel="stylesheet"><script>window.newApp = true;</script><link rel="stylesheet" href="/css/app.css"><link rel="stylesheet" href="/css/newapp.css"><link rel="stylesheet" href="/css/slider.css"><link rel="shortcut icon" href="/favicon.png" type="image/png"><link rel="shortcut icon" type="image/png" href=""><meta name="fragment" content="!"><script>window.prerenderReady = false;</script><script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)})(window,document,'script','//','ga');ga('create', 'UA-44599347-1', '');ga('send', 'pageview');</script><script>(function(e,b){if(!b.__SV){var a,f,i,g;window.mixpanel=b;a=e.createElement("script");a.type="text/javascript";a.async=!0;a.src=("https:"===e.location.protocol?"https:":"http:")+'//';f=e.getElementsByTagName("script")[0];f.parentNode.insertBefore(a,f);b._i=[];b.init=function(a,e,d){function f(b,h){var a=h.split(".");2==a.length&&(b=b[a[0]],h=a[1]);b[h]=function(){b.push([h].concat(,0)))}}var c=b;"undefined"!==typeof d?c=b[d]=[]:d="mixpanel";c.people=c.people||[];c.toString=function(b){var a="mixpanel";"mixpanel"!==d&&(a+="."+d);b||(a+=" (stub)");return a};c.people.toString=function(){return c.toString(1)+".people (stub)"};i="disable track track_pageview track_links track_forms register register_once alias unregister identify name_tag set_config people.set people.set_once people.increment people.append people.track_charge people.clear_charges people.delete_user".split(" ");for(g=0;g<i.length;g++)f(c,i[g]);b._i.push([a,e,d])};b.__SV=1.2}})(document,window.mixpanel||[]);mixpanel.init("b7b137f3b90aa34a57be816225579d6f");</script><body class="newapp"><div ng-controller="AppCtrl"><div ng-view></div></div><script src="//"></script><script src="//"></script><script src="//"></script><script src="//"></script><script src="//"></script><script src="/js/bootstrap-slider.js"></script><script src="//"></script><script src="/js/app.min.js"></script><script src="//"></script><script src=""></script><script>Stripe.setPublishableKey('pk_live_kvKGc987B7imUfijVH4DXBfJ');</script><link href="//,300|Playfair+Display:400italic|Montserrat" rel="stylesheet" type="text/css"></body></head></html>

All I can gather about this page is that it's for SEO (since that word's used three times). Granted, when you visit this site on your mobile device, you'd get a whole page after a brief moment of JavaScript being executed to render content to your browser. The issue with this? The browser's fully capable of rendering text - it has to be, as that's part of the specification most browsers are built against. But because of the choice to keep everything in JavaScript/execution land, I'd be unable to even load assets for this site outside of the icon of the site (the one thing they left in HTML).

Use Fallbacks More Often

I'm not calling for every Web developer (or JavaScript developer?) to drop everything and learn about other tags outside of <div>, <span> and <body>. I'm encouraging people to build things in HTML + CSS first (if not completely) and then enhance them with JavaScript. There comes a point where JavaScript's used excessively. Form validation's a case of this. Below, I aim to sign into Trello to view the dashboard. See how far I get.

The "Log In" button is disabled by default - not even by an HTML5 form validation but by the default markup provided by the server. I had to enable JavaScript to find the button enabled even when the form was empty. When I did re-enable JavaScript on the page, I found this:

The button for logging in was not only active; it didn't seem to require any user input to become so. Granted, the enabled state is a design affordance to tell the user that signing in is feasible. But for those browsing without JavaScript, this wouldn't be provided to them. It's an overuse of JavaScript to communicate something to the user.
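For contrast, here's a minimal sketch of a sign-in form that works as delivered by the server. This is not Trello's markup; the `/login` endpoint and field names are hypothetical. It leans on validation the browser already provides (`required` and `type="email"`), and the button is never shipped in a disabled state.

```html
<form action="/login" method="post">
  <p>
    <label for="email">Email</label>
    <!-- type="email" and required let the browser check the field itself. -->
    <input id="email" name="email" type="email" autocomplete="username" required>
  </p>
  <p>
    <label for="password">Password</label>
    <input id="password" name="password" type="password" autocomplete="current-password" required>
  </p>
  <!-- No disabled attribute: the button works without any script running. -->
  <p><button type="submit">Log In</button></p>
</form>
```

A browser that supports HTML5 validation will refuse to submit the empty form and tell the visitor why. One that doesn't will just post it, so the server still has to validate. JavaScript can then be layered on for nicer inline messages if you want them.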

As Web developers, we have a responsibility to users to make actions explicit, avoid dark UX and deliver content to visitors of our services. This post isn't meant to berate developers; in fact, it's a hope to spur more discussion on how we can provide value to users without selling them out to companies. We're already losing the venue points to get people into the game and now we're actively threatening the ability for the Web to be visible by people first. There are methods to counteract this. I'm down to talk about options and routes.