YouTubeUploader, Google.GData.Client.dll, and the UTF-8 Byte Order Mark

EDIT: as it turns out, this was just caused by a bug that snuck into the server side – the normal GData.Client.dll that sends the BOM works fine now, as does the modified one that doesn’t.

Background: As per this post, I’m making a simple upload-my-videos-to-YouTube app.  Step 1 was figuring out how we were going to get the info out of FlipShare, which we did in this post.  Step 2 is figuring out how we’re going to actually upload to YouTube, which is this post 🙂

Investigating available options

Just a couple of days before it was announced on their blog, I had started searching for available APIs (ideally already in .NET) and ran across their YouTube SDK and the larger GData SDK on their project download page.  Awesome!  Their wiki also described a sample app that would be perfect for me to learn from – YouTubeUploader!

One thing I noticed, though, was that looking around the filesystem, I didn’t see the source for YouTubeUploader.  I could certainly run Reflector on it (and did), and that’s certainly a lot better than nothing, but it was odd that all the other samples had source but not this one (the later announcement would explicitly say you had to get the source via Subversion).

So, I go to the Source tab to try and figure out where it might be in the tree, but at the time there was only 1 hit, in an unrelated unit test. I noticed the UI had a typo in it (uplads instead of uploads) so I searched for that string, and got no hits. I filed a bug that I couldn’t find the source, which Frank from the GData team closed after noting the location in the source tree.  Checking again as I write this, whatever process indexes the source tree has now picked up the source and both searches return parts of the project 🙂

Trying to use YouTubeUploader

At that point in time, I actually had 5 videos left that hadn’t successfully uploaded with FlipShare, so I tried to use the YouTubeUploader app to get them uploaded.  Its interface to the user is a CSV file you need to create to define title/tags/category/path/etc.  No biggie – I open Excel and make it, then save as csv (I’m too lazy to make the CSV by hand, worrying about quoting strings with commas and the like).
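For anyone who would rather script the file than open Excel, a CSV library takes care of the quoting. Here is a minimal sketch using Python's csv module. The columns, titles, and paths are placeholders made up for illustration and are not YouTubeUploader's actual layout, so check the sample's documentation for the real column order.

```python
import csv
import io

# Hypothetical columns (title, tags, category, path) for illustration only;
# the real YouTubeUploader CSV layout may differ.
rows = [
    ["Birthday party, part 1", "family, birthday", "People", r"C:\videos\party1.mp4"],
    ['Kids say "hi"', "family", "People", r"C:\videos\hi.mp4"],
]

def to_csv(rows):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()

csv_text = to_csv(rows)
print(csv_text)
```

Fields containing commas or double quotes come out wrapped in quotes (with embedded quotes doubled), which is the same thing Excel does on "save as csv".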

However, when I actually run it, the uploads all fail (doh!) with “400 Bad Request” (that’s what shows up in the UI).  I figured I’ve just got something misconfigured, so I try a bunch of different things in the UI, in a YouTubeUploader.exe.config, etc., but no such luck.

At this point I think about just ditching it here rather than go down the investigatory rabbit hole, but since it’s from the GData team and it’s the sample I’d really like to start with, I press on.

Investigation

So, time to dig in to figure out what’s going on.  I run my current go-to tool for HTTP debugging, Fiddler, and then restart the app and have it try again.  As I should have expected, it does the calls over https, and Fiddler in the middle is breaking all the calls since it’s not a trusted CA.  I hadn’t actually added Fiddler as a trusted root CA before, but there are nice, simple instructions to do so.

With that working, I look at the request and response of a failed call.  You can see them in the bug I filed, but the problem didn’t jump out at me at first.  I googled the error message (“Content is not allowed in prolog”) and much like you’d expect, it comes from having stuff show up before the xml prolog (“<?xml … ?>”), breaking the xml parsing (from the hits, it appears to be in Java XML parsing, not that it matters).  So I look back at the request again and I didn’t see anything before the prolog.

However, Fiddler has a ton of different ways of showing the request and response, including a handy-dandy hex view.  That shows that there are indeed 3 bytes between our HTTP-spec 2 newlines (separating headers from body) and the XML prolog itself.  The bytes are 0xEF, 0xBB, and 0xBF.  Those bytes seem oddly familiar, and Google reminds me why – it’s the UTF-8 Byte Order Mark.  Ah, yes, it all starts to make sense.
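The same check the hex view made possible can be done in a few lines of code. This is only a sketch in Python: the request below is made up for illustration (it is not the captured one from the bug), and body_has_utf8_bom is a hypothetical helper name.

```python
import codecs

def body_has_utf8_bom(raw_request: bytes) -> bool:
    """Return True if the HTTP body starts with the UTF-8 BOM (EF BB BF)."""
    # Headers and body are separated by a blank line (CRLF CRLF).
    _headers, sep, body = raw_request.partition(b"\r\n\r\n")
    return bool(sep) and body.startswith(codecs.BOM_UTF8)

# Made-up request for illustration, not a real capture.
sample = (
    b"POST /upload HTTP/1.1\r\n"
    b"Content-Type: application/atom+xml; charset=UTF-8\r\n"
    b"\r\n"
    b"\xef\xbb\xbf<?xml version=\"1.0\" encoding=\"utf-8\"?><entry/>"
)
print(body_has_utf8_bom(sample))
```

Because the BOM is invisible in most text views, checking the raw bytes like this is usually the quickest way to confirm it is there.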

Who’s to blame?

So, more googling and I run across the post that confirms what’s going on with XmlTextWriter along with a fix (Thanks Rick!).

Looking at YouTubeUploader’s source, it’s just using the ResumableUploader class in Google.GData.Client.dll, so it seems clear the bug isn’t YouTubeUploader’s fault.  Since it presumably was working for others, I was wondering if maybe it was something in the BCL (having 4.0 RC on the same box, although it should run side-by-side fine and I had checked that YouTubeUploader was still running under 2.0/3.5).

Stepping through in the debugger, I see the offender – AtomBase.SaveToXml(Stream) does exactly what Rick’s post said – creating the XmlTextWriter with Encoding.UTF8, which writes the BOM.
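The distinction between an encoder that emits the BOM and one that doesn’t exists outside .NET too. As a rough analogy only (this is Python, not the GData code), the "utf-8-sig" codec prepends the BOM when encoding, much like Encoding.UTF8 does when handed to XmlTextWriter, while plain "utf-8" writes just the document, which is what a .NET encoding constructed as new UTF8Encoding(false) does.

```python
import codecs

xml_text = '<?xml version="1.0" encoding="utf-8"?><entry/>'

# "utf-8-sig" writes the BOM first, then the document.
with_bom = xml_text.encode("utf-8-sig")
# Plain "utf-8" writes only the document.
without_bom = xml_text.encode("utf-8")

print(with_bom[:8])
print(without_bom[:8])
```

Apart from the three leading bytes, the two outputs are identical, which is why the problem is so easy to miss.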

Confirming the problem

At the time I was having problems rebuilding things (long story with no value – PEBKAC 🙂), so I wanted to verify that was the problem by modifying the requests.  I knew Fiddler had this capability but I had never done it before.  Looking through the cookbook samples, though, it seemed pretty straightforward.

The biggest hurdle ended up being how to get those 3 bytes into a string, although for no good reason.  I knew about GetString off of Encoding, but for some bizarre reason I thought that given the UTF-8 BOM, it would just discard it (since I thought of it as ‘reading’ the UTF-8 bytes) and leave me with an empty string.  Of course, once I finally tried it, it worked just fine.  Figures. 🙂
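Decoding the three BOM bytes as UTF-8 yields a one-character string holding U+FEFF, not an empty string. A small Python illustration of the two behaviors (an analogy, not the .NET call itself): the plain "utf-8" codec keeps the character, and only the separate "utf-8-sig" codec drops a leading BOM, which is the behavior I had wrongly expected.

```python
import codecs

# Plain UTF-8 decoding keeps the BOM as the character U+FEFF.
bom_text = codecs.BOM_UTF8.decode("utf-8")
# The "utf-8-sig" codec strips a leading BOM while decoding.
stripped = codecs.BOM_UTF8.decode("utf-8-sig")

print(len(bom_text), hex(ord(bom_text)))
print(len(stripped))
```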

The BOM isn’t going to change, but rather than put the bytes in directly to the rule, I referenced GetPreamble, which resulted in this simple addition to the OnBeforeRequest handler:

	var utf8: System.Text.Encoding = System.Text.Encoding.UTF8;
	var bom: String = utf8.GetString(utf8.GetPreamble());
	oSession.utilReplaceInRequest(bom + "<?xml", "<?xml");

I ran YouTubeUploader again with that rule in place, and sure enough, the uploads start working fine!  Yay!

Trying the fix

Once I wake up and finally realize that I don’t need to rebuild the full YouTubeUploader app, just Google.GData.Client.dll (hey, it was late :), I make the one-line change (svn patch attached to bug), rebuild the dll, then drop it in to Google YouTube SDK for .NET\Samples and try YouTubeUploader again.  Sure enough, it worked fine (and faster, since it didn’t have to go through Fiddler and its request rewriting 🙂).

Where are we?

So, while we (well, I) have spent an unfortunate amount of time yak shaving, we have a working sample for how to upload to YouTube.  The sample mixes logic and UI pretty freely (which is fine, it’s a sample :), but it’s working code, uses the resumable uploader API, already supports multiple simultaneous uploads, and already supports configurable numbers of retries.  It’s already doing the hard parts, so it makes my life much easier writing my little uploader object model and apps. 🙂

2 thoughts on “YouTubeUploader, Google.GData.Client.dll, and the UTF-8 Byte Order Mark”

  1. hello,
    after a research on the web to resolve an error 400 or 403 with YoutubeUploader, we read that you have find a solution to solve this bug.
    I try to load the good google.dataclient.dll without succes.
    Can you send me the link where i can load the good dll.
    Thank you very much.
    Have a nice day.
    Comment by Gilles — May 18, 2010 @ 7:41 pm

    • I edited the post – the bug I was hitting was a short-lived server side bug – the ‘normal’ gdata client dll is working fine for me now.

      If you’re still having problems with the stock dll, you might want to check with an http sniffer (fiddler or wireshark or whatever) to get more details on the particular error.
