How do we fix Google? Correcting a private information source

July 8, 2011 at 2:32 pm 21 comments

Google my name, “Mark Guzdial.”  You’ll get back a page that looks something like this.

I’m not an Associate Professor.  I was promoted to “full” Professor six years ago.  Check out the page that Google references — it says I’m a professor.  Check the cached version at Google — it says I’m a professor.  Why does Google tell people I’m an Associate Professor?

It’s slightly more than a minor annoyance.  I have given several talks in the last six years where I have been introduced as “Associate Professor.”  I have a collection of posters advertising my talks on which I’m listed as an “Associate Professor.”  Most people don’t go further than the Google search page when trying to figure out my title.

It’s actually an interesting example of a larger problem.  We have all become reliant on an information source, Google.com, which is out of our control and is used pervasively.  If Google says something is true, many people will simply believe it, and there’s virtually no way to call it into question, as I’m trying to do here.  Google is a private company, a private information source, but not a news medium that one can cite and critique.  How do we “fix” Google’s errors?  What about “Don’t be evil”?  It seems fairly “evil” to me to tell lies, spread them widely, and not allow for correction.

By the way, Bing gets it right.

 


21 Comments

  • 1. BKM  |  July 8, 2011 at 2:42 pm

    I’ve submitted reports of incorrect addresses to Google Maps, and they have fixed the errors and notified me when fixed. That is the only time I have corrected them.

  • 2. Nick  |  July 8, 2011 at 2:54 pm

    Finally someone says it! No one company can be 100% correct all of the time. We trust google far more than we should.

  • 3. Max Hailperin  |  July 8, 2011 at 3:31 pm

    Already this blog post is on its way to being a “heisenbug.” Although the search result still looks “something like” shown, the second-ranked result is now this blog entry itself.

  • 4. Manas Tungare  |  July 8, 2011 at 3:51 pm

    I think it’s getting it from http://www.google.com/Top/Computers/Programming/Languages/Smalltalk/Personal_Pages/ which clearly hasn’t been updated in 6 years. But yes, Google should be using the title from your site, not the directory. Looks like a bug.

  • 5. Kurt L.  |  July 8, 2011 at 3:57 pm

    The solution is for people using Google to be more careful. If I wanted to figure out your title I’d be sure to get that info from your own website, not some out-of-date search result. Sure it’s a technical flaw on Google’s end as well, but in my opinion, most of the fault lies with the searcher who didn’t bother taking the extra two seconds to verify. There is an analogy to using Wikipedia for research. Wikipedia, like Google, is a great starting point, but you always need to check the sources.

    • 6. Mark Guzdial  |  July 9, 2011 at 8:53 am

      The question for me, Kurt, is “How would people know?” I would’ve guessed that Google’s snippets came from the web pages, so it’s merely a mirror of the content. But clearly it’s not. I think the message is getting out (especially to school children) that Wikipedia is crowdsourced and should be verified. But Google? How often (other than this blog post 🙂) have you seen Google critiqued as a flawed information source? How would we know that Google is inserting information, besides what it gets from the web pages themselves?

      • 7. Jeff Rick  |  July 9, 2011 at 4:09 pm

        It is completely unrelated to your original post, but I’m increasingly convinced that the “Wikipedia content does not count as a real source” argument needs to be revisited. The truth is that there is no absolute source. Peer-reviewed scientific publications (i.e., the gold standard for knowledge) are too frequently out-of-date or inaccurate (it is quite easy to cite an article that has been shown to be flawed without being aware of the follow-up article). I’m convinced many flaws never even get caught. Furthermore, newspaper articles summarizing scientific articles, in my experience, often get it wrong. There is no black / white demarcation of acceptable source / unacceptable source; it is all shades of gray. I’d consider a Wikipedia cite as more trustworthy than most newspaper cites; there’s a stronger chance that the Wikipedia article is reviewed by an expert and it is more likely to be up-to-date.

  • 8. Sarita Yardi  |  July 8, 2011 at 4:10 pm

    I posted this on G+/Twitter and Manas said he “escalated appropriately.” A few other folks mentioned that putting a tag in your tag should solve the problem. Technical part aside, the part about people skimming taglines–zero click approach–is super interesting!

    • 9. Sarita Yardi  |  July 11, 2011 at 8:25 am

      Whoops – I wrote that putting a [open tag]meta[close tag] tag in your [open tag]head[close tag] tag should solve the problem but the html messed up wordpress. 🙂

  • 10. gasstationwithoutpumps  |  July 8, 2011 at 4:18 pm

    I have sent many corrections of Google bike routing information and most of my corrections have been accepted within a few weeks.

    I’ve no idea how they managed to lock in the wrong title for you, but a correction should be easy to get, if you inform Google of the problem.

    • 11. Mark Guzdial  |  July 8, 2011 at 11:39 pm

      Good to know! How did you alert Google to the error? Where did you find the email address?

      • 12. gasstationwithoutpumps  |  July 9, 2011 at 1:11 am

        The bike directions are still in beta release, and the Google maps interface provides an explicit link for corrections.

        I looked around for how to correct Google snippets (it took me a couple of searches to figure out that is what they called their short summaries). According to the Google help pages, they have no mechanism for making individual corrections to the snippets; the technique they recommend is to use a “Description” meta tag for your home page.

  • 13. Max Hailperin  |  July 8, 2011 at 4:58 pm

    It looks like the underlying problem is on the Open Directory Project (see dmoz.org). You could try getting that corrected, or you could use a META tag to tell Google not to use ODP. (You could also use a META tag to provide the description of your choice.)

  • 14. fredm  |  July 9, 2011 at 3:30 pm

    That’s pretty weird — is Google’s summarization engine broken?

    How could the page pointer and cache be correct, and the summary be wrong?

    Mark, any chance there’s info at http://www.cc.gatech.edu/fac/mark.guzdial/ before the redirect happens?

    BTW I agree that Google has the responsibility to be correct, rather than asking us to assume it’s unreliable. Kind of like when the compiler fails to compile code properly — you can’t go around all the time thinking the compiler’s broken as your first line of defense.

  • 15. kebernet  |  July 10, 2011 at 7:54 pm

    I don’t think it is google’s fault. I think it is because the CoC put a sitemap on the server that doesn’t include your page so google considers it “unchanging.”

    http://www.cc.gatech.edu/sitemap.xml

  • 17. Paul Haahr  |  July 11, 2011 at 10:52 am

    The reason for that out-of-date snippet is that we’re getting the snippet from the Open Directory (ODP, dmoz.org), as Max Hailperin pointed out above. You can see it on this page:

    http://www.dmoz.org/Computers/Programming/Languages/Smalltalk/Personal_Pages/

    Two easy fixes. First is to correct the open directory entry, but that depends on an editor still being there to review and approve the change. Second, entirely within your control, but silly for you to have to do, is to put

    <meta name="robots" content="noodp">

    in the source of the page and we’ll then ignore the ODP entry.

    Sorry for the trouble. ODP is a great resource but it needs to be maintained for it to be useful, otherwise the descriptions get stale.

    (I’m an engineer who works on search at Google and this was pointed out by a common friend.)

  • 18. Alfred Thompson  |  July 14, 2011 at 11:52 am

    I run into this sort of problem all the time. It’s going to happen with purely automated systems. In any case I have learned to go to primary sources. For example when looking up a professor’s title or contact information I will not rely on the search engine page but use the search engine to take me to the right page on the university web site. I’ve been burned too often.

  • 19. Eric Baumgartner  |  July 15, 2011 at 1:03 pm

    Hi Mark,

    Have you looked at schema.org? It defines a number of HTML microformats that google, among others, understands. So you can be specific about telling search engines what to use for your job title.

    http://googleblog.blogspot.com/2011/06/introducing-schemaorg-search-engines.html

    http://www.schema.org/Person

    Perhaps along with the fix for DMOZ, that will help push the correct information up google’s pages.

  • 20. Mark Guzdial  |  July 28, 2011 at 2:39 pm

    I summarized this story in a Blog@CACM post.

  • 21. Hattie Manning  |  March 22, 2012 at 1:12 pm

    How DO you get Google to correct its errors? I googled myself and to my great surprise, I am a married African-American. This is totally untrue! Most of the other info is correct, but how they did this is a mystery to me.
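
For readers who want to check whether a page already carries the meta-tag fixes suggested earlier in this thread (the "noodp" robots directive and a "Description" meta tag), here is a minimal sketch in Python using only the standard library. The sample page, the description text, and the names `MetaCollector` and `snippet_hints` are hypothetical and exist only for illustration; nothing here reflects how Google itself reads these tags.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        content = attr.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def snippet_hints(html):
    """Report whether the page opts out of ODP text and what description it offers."""
    parser = MetaCollector()
    parser.feed(html)
    # The robots value may hold several comma-separated directives.
    robots = [d.strip().lower() for d in parser.meta.get("robots", "").split(",")]
    return {
        "noodp": "noodp" in robots,
        "description": parser.meta.get("description"),
    }


# Hypothetical page head; the description wording is an assumption.
SAMPLE_PAGE = """<html><head>
<title>Mark Guzdial</title>
<meta name="robots" content="noodp">
<meta name="description" content="Mark Guzdial, Professor">
</head><body></body></html>"""

hints = snippet_hints(SAMPLE_PAGE)
print(hints)
```

Running the same function over the HTML of your own home page shows at a glance which of the two tags is missing.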
