Using, Sharing, and Securing Rich Data on the Internet Using Online Identity
It makes me a happy boy to see dialogue occurring on the best way to share and syndicate rich data publicly on the Internet. I truly believe that when this bridge is crossed it will enable the next wave of Internet technology evolution/revolution, and I'm glad people are thinking in this direction so this happens sooner rather than later. I also think Live Clipboard will be a nice catalyst for the whole idea because it empowers microformats in such a dramatic way. All of this technology is still in its infancy, of course, but these are the types of conversations that need to happen between early adopters, developers, and entrepreneurs before it can go mainstream.
One thing that seems to keep coming up, and understandably so, is the idea of securing syndicated data. For example, if I wish to publish certain parts of my contact information such as my email address, but keep other parts private and secure, such as my mobile number, I can't very well publish a vcard out to the Internet. Even hiding certain chunks of it with stylesheets won't hide the content from aggregators, search engines, and people who know how to "View Source". It's simply not an effective security mechanism.
Related to this is the sticky question of whether the data should be embedded directly in the content (or page) itself, or if the content should simply contain a pointer to the data (in the form of a URI). The first approach is demonstrated in my little Microcontent Viewer example from a few weeks back; the second approach is demonstrated by i-Tags.
The concept of embedding data in content and how to secure it is a tough one, and I struggled with it for a long time. I wondered if the URI-only approach was the right one, since an application at the other end of the URI could ask the reader who he was and provide the appropriate subset of data. However, that raises all kinds of problems with user authentication and firewalls. For example, if I publish a blog post that contains a vcard which points to a URI inside the firewall, that vcard becomes useless to anyone on the outside, so what's the point?
Personally, I finally decided a few months back that a hybrid approach was needed: I would embed only the public data that anyone should be able to see into the content itself, but also provide a URI that can be used to retrieve the full set of data (or the subset that the reader has been allowed to see). It could also be used by the reader to refresh the embedded data when the URI endpoint is available and online. This is exactly the approach taken by my Microcontent Viewer example, although the refresh piece isn't hooked up.
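To make the hybrid idea a bit more concrete, here is a rough Python sketch of it. This is not the Microcontent Viewer code; the field names, the PUBLIC_FIELDS set, the example URI, and the fetch callback are all made up for illustration. It just shows the two halves: embed only the public subset plus a pointer URI, and refresh from the endpoint when it happens to be reachable.

```python
from dataclasses import dataclass

# Hypothetical rule: only these fields are safe to embed in public content.
PUBLIC_FIELDS = {"name", "email"}


@dataclass
class Contact:
    fields: dict       # the full record, private parts included
    endpoint_uri: str  # hypothetical URI where the full record can be requested


def public_view(contact):
    """Build what gets embedded in the page: public fields plus the pointer URI."""
    embedded = {k: v for k, v in contact.fields.items() if k in PUBLIC_FIELDS}
    embedded["source"] = contact.endpoint_uri
    return embedded


def refresh(embedded, fetch):
    """Refresh embedded data from its endpoint, if the endpoint is reachable.

    `fetch` is any caller-supplied function that takes a URI and returns a
    dict of fields (an assumption of this sketch). If it raises OSError,
    for example because the endpoint sits behind a firewall, the embedded
    copy is returned unchanged.
    """
    try:
        fresh = fetch(embedded["source"])
    except OSError:
        return embedded
    return {**embedded, **fresh}
```

The point of the fallback in refresh is the firewall problem from earlier: a reader who can't reach the endpoint still has the embedded public data to work with.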
I'm still pretty convinced that embedding public data in content is a good way to go. After I published my test post, Technorati picked up the embedded microcontent and I was able to find it using their microformat search, pictures and all. See the results here. It actually added value to the content, which was very cool. Progress!
The URI endpoint part of this is much trickier, but is also much more interesting. It's the secret sauce that's going to really kickstart some wild revolutions in online technology. I believe that an open source application is needed to share, provision, and publish content at URI endpoints, and that application is currently and secretly in the works but won't be released until it's ready. For securing data, it uses an ingenious solution thought up by M. David Peterson to securely publish URI endpoints; see his post here for the technical details.
The beauty of this hybrid solution is that you can have your cake and eat it too. People you don't even know exist can use the subset of your data that you make public, and public applications have data to work with, but only the data you want the entire world to be able to see. The embedded microcontent portion enables applications like the Technorati microformat search to pick up and use it. However, if somebody wants to get to know you better, they can send you a request in the form of an LLUP message and you're then able to personally decide if you want to allow more information out to that individual or not, and at what level.
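Here is a minimal Python sketch of that request-and-decide flow at the endpoint. To be clear about the assumptions: it does not model the LLUP message format at all, it takes for granted that the requester's identity has already been verified somehow, and the Level names, FIELD_LEVELS table, and Endpoint class are hypothetical names invented for the example.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical trust levels; higher values see more fields."""
    PUBLIC = 0
    ACQUAINTANCE = 1
    TRUSTED = 2


# Hypothetical mapping of each field to the minimum level allowed to see it.
FIELD_LEVELS = {
    "name": Level.PUBLIC,
    "email": Level.PUBLIC,
    "work_phone": Level.ACQUAINTANCE,
    "mobile": Level.TRUSTED,
}


class Endpoint:
    def __init__(self, record):
        self.record = record
        self.grants = {}   # identity -> Level granted by the owner
        self.pending = []  # identities waiting on the owner's decision

    def request_access(self, identity):
        """Queue a request; nothing extra is released until the owner decides."""
        if identity not in self.grants and identity not in self.pending:
            self.pending.append(identity)

    def decide(self, identity, level):
        """Owner's decision: pass a Level to approve, or None to decline."""
        if identity in self.pending:
            self.pending.remove(identity)
        if level is not None:
            self.grants[identity] = level

    def view_for(self, identity):
        """Return only the fields this identity has been allowed to see."""
        level = self.grants.get(identity, Level.PUBLIC)
        # Fields missing from FIELD_LEVELS default to the most restrictive level.
        return {k: v for k, v in self.record.items()
                if FIELD_LEVELS.get(k, Level.TRUSTED) <= level}
```

Anyone without a grant, including someone whose request is still pending or was declined, gets exactly the public subset, which matches what is already embedded in the content.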
There are so many more goodies that this will enable in addition to fresh data. You get to maintain an actual list of "trusted subscribers", and actually TELL them when you update or create content instead of waiting for them or their feed readers to find it. You'll be able to tell EXACTLY what people are doing with your data. You'll even have the ability to let THEM edit it, and tell you about it, at which point you'll be able to determine if you want to accept their changes or not. (I can't wait for the day when somebody I trust can update their contact info from one of my blog posts and it will update all of my other blog posts as well as Outlook, Gmail, and my Blackberry.) All the promises of push will be fulfilled when this domino falls.
Dion Hinchcliffe, a great writer and one of the few bloggers I make it a point to read on a regular basis, has done a wonderful job of outlining one of the primary barriers in front of this technology, which is the lack of a decentralized identity system. A decentralized identity system is vital before we can even think about securely sharing data. And while I certainly respect the efforts that are out there such as OpenID and SXIP, I think that the misguided efforts by companies like Microsoft and Google are going to derail any attempts to unify identity in the short term. I think the catalyst for change is going to be the point in time when people can see actual tangible benefits from a decentralized identity system, in the form of new capabilities that such a system baked into core software can bring.
I've got some great big surprises in store in this area in the very near future, but I won't release anything until it's polished and ready. First impressions are pretty important, ya know ;)