Aggregate Feeds using Linq to Xml

Submitted on Mar 21, 2009, 7:04 p.m.

I decided to create an aggregate sitemap.xml for the root of my domain. There are sitemap handlers in the blog, otherblog and photo subdirectories already – but I wanted a single sitemap.xml in the root that I could submit to Google.

I started with some pretty hacky attempts... fumbling my way around Linq to Xml. I also came unstuck with namespace normalization, since the sitemap handler in dasBlog uses the older 0.84 sitemap namespace rather than the current 0.9 namespace from sitemaps.org. So what I thought would take 20 minutes ended up taking a bit longer, although the results were worth it, since the same approach could be used for aggregating Atom or RSS feeds.
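To illustrate why the namespace mismatch bites, here is a minimal sketch. The 0.84 namespace URI and the example.com entry are assumptions for illustration, and NamespaceMismatchSketch and CountUrls are hypothetical names, not part of the original code. A query using the 0.9 namespace finds nothing in a 0.84 document, while asking the document for its own default namespace works.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class NamespaceMismatchSketch
{
    public static readonly XNamespace Sm09 = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Counts the <url> children of a urlset using an explicit namespace.
    public static int CountUrls(XElement urlset, XNamespace ns)
    {
        return urlset.Elements(ns + "url").Count();
    }

    static void Main()
    {
        // Assumed 0.84 namespace URI, used here for illustration only.
        XElement older = XElement.Parse(
            "<urlset xmlns='http://www.google.com/schemas/sitemap/0.84'>" +
            "<url><loc>http://example.com/</loc></url></urlset>");

        // Querying with the 0.9 namespace misses the 0.84 elements.
        Console.WriteLine(CountUrls(older, Sm09));                        // prints 0

        // Querying with the document's own default namespace finds them.
        Console.WriteLine(CountUrls(older, older.GetDefaultNamespace())); // prints 1
    }
}
```

This is why the elements have to be read with the source namespace first and only then renamed into the target namespace.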

(Note – big thanks to Martin Honnen over at http://social.msdn.microsoft.com/Forums/en-US/categories/ for helping out with some of the Linq to Xml).

The ChangeNamespace helper method only renames elements; it does not convert attributes, although that could be added as well.

Loading each URL inside a try/catch, combined with an iterator block (yield return), keeps the GetElements helper method fairly efficient I think: a source that fails to load is simply skipped, and elements are handed to the caller one at a time rather than being collected into a list first. What’s more, since this returns an IEnumerable<XElement>, additional Linq query expressions could be applied to the result, such as sorting or grouping; this is particularly useful if you were aggregating an Atom feed and wanted to sort by date.
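As a rough sketch of the sorting idea, assuming Atom entries that carry an updated element: the entries below are made up in memory, and AtomSortSketch, NewestFirst and Entry are hypothetical names rather than part of the code in this post. Entries without a parseable date sort last.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

static class AtomSortSketch
{
    public static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // Orders entries newest first; entries without a parseable <updated> go last.
    public static IEnumerable<XElement> NewestFirst(IEnumerable<XElement> entries)
    {
        return entries.OrderByDescending(e => UpdatedOrMin(e));
    }

    static DateTimeOffset UpdatedOrMin(XElement entry)
    {
        DateTimeOffset updated;
        string text = (string)entry.Element(Atom + "updated"); // null if missing
        return DateTimeOffset.TryParse(text, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out updated)
            ? updated
            : DateTimeOffset.MinValue;
    }

    // Builds a tiny Atom entry for the demo; pass null to omit <updated>.
    public static XElement Entry(string title, string updated)
    {
        return new XElement(Atom + "entry",
            new XElement(Atom + "title", title),
            updated == null ? null : new XElement(Atom + "updated", updated));
    }

    static void Main()
    {
        var entries = new[]
        {
            Entry("older", "2009-01-05T08:00:00Z"),
            Entry("newest", "2009-03-21T19:04:00Z"),
            Entry("undated", null)
        };

        foreach (XElement e in NewestFirst(entries))
            Console.WriteLine((string)e.Element(Atom + "title"));
        // prints: newest, older, undated (one per line)
    }
}
```

One trade-off worth noting: OrderByDescending has to read every element before it can return the first one, so the streaming benefit of the iterator block ends at the point where the sort is applied.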

using System;
using System.Collections.Generic;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XDocument feed = MergeSiteMaps(new List<string>()
        {
            "http://www.58bits.com/blog/googleSitemap.ashx",
            "http://www.58bits.com/otherblog/googleSiteMap.ashx",
            "http://www.58bits.com/photos/sitemap.xml"
        });

        // Print every <loc> in the merged sitemap
        XNamespace sm = "http://www.sitemaps.org/schemas/sitemap/0.9";
        foreach (XElement location in feed.Root.Elements(sm + "url").Elements(sm + "loc"))
        {
            Console.WriteLine((string)location);
        }
    }

    public static XDocument MergeSiteMaps(IEnumerable<string> urls)
    {
        XNamespace sm = "http://www.sitemaps.org/schemas/sitemap/0.9";
        XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
        XNamespace xsd = "http://www.w3.org/2001/XMLSchema";
        string schemaLocation = "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd";

        // Our container sitemap document: two fixed entries for the site root,
        // followed by every <url> element pulled from the source sitemaps
        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement(sm + "urlset",
                new XAttribute(XNamespace.Xmlns + "xsi", xsi),
                new XAttribute(XNamespace.Xmlns + "xsd", xsd),
                new XAttribute(xsi + "schemaLocation", schemaLocation),
                new XElement(sm + "url",
                    new XElement(sm + "loc", "http://www.58bits.com/"),
                    new XElement(sm + "lastmod", DateTime.Now.ToString("yyyy-MM-dd")),
                    new XElement(sm + "changefreq", "monthly"),
                    new XElement(sm + "priority", "1.0")),
                new XElement(sm + "url",
                    new XElement(sm + "loc", "http://www.58bits.com/default.aspx"),
                    new XElement(sm + "lastmod", DateTime.Now.ToString("yyyy-MM-dd")),
                    new XElement(sm + "changefreq", "monthly"),
                    new XElement(sm + "priority", "1.0")),
                GetElements(sm, urls, "url"))
        );
    }

    private static IEnumerable<XElement> GetElements(XNamespace ns, IEnumerable<string> urls, string elementLocalName)
    {
        XElement source;
        foreach (string url in urls)
        {
            // The load is wrapped on its own because C# does not allow
            // yield return inside a try block that has a catch clause
            try
            {
                source = XElement.Load(url);
            }
            catch (Exception ex)
            {
                // Report the Url that failed and move on to the next source
                Console.Error.WriteLine("Skipping {0}: {1}", url, ex.Message);
                continue;
            }

            // Read elements using the source's own namespace, then rename
            // them into the target namespace if the two differ
            XNamespace defaultNamespace = source.GetDefaultNamespace();
            bool differentNamespace = (ns != defaultNamespace);
            foreach (XElement element in source.Elements(defaultNamespace + elementLocalName))
            {
                if (differentNamespace)
                    ChangeNamespace(ns, element);
                yield return element;
            }
        }
    }

    private static void ChangeNamespace(XNamespace ns, XElement entry)
    {
        // Elements only; attributes are left untouched
        foreach (XElement e in entry.DescendantsAndSelf())
        {
            if (e.Name.Namespace != XNamespace.None)
            {
                e.Name = ns.GetName(e.Name.LocalName);
            }
        }
    }
}
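
For the attribute conversion mentioned earlier, one possible approach is sketched below. It is not part of the code above: the two-namespace signature, the urn: namespaces and the sample element are all assumptions made for illustration. Since XAttribute.Name is read-only, qualified attributes are rebuilt rather than renamed, and only names in the given source namespace are touched, so attributes in other namespaces (xsi, for example) are left alone.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class ChangeNamespaceSketch
{
    // Hypothetical variant: moves elements and attributes from one namespace to another.
    public static void ChangeNamespace(XNamespace from, XNamespace to, XElement entry)
    {
        foreach (XElement e in entry.DescendantsAndSelf())
        {
            if (e.Name.Namespace == from)
                e.Name = to.GetName(e.Name.LocalName);

            var rebuilt = new List<XAttribute>();
            foreach (XAttribute a in e.Attributes())
            {
                // Drop declarations of the old namespace so they cannot
                // clash with the renamed element when the tree is saved.
                if (a.IsNamespaceDeclaration && a.Value == from.NamespaceName)
                    continue;

                // XAttribute.Name has no setter, so create a replacement.
                rebuilt.Add(a.Name.Namespace == from
                    ? new XAttribute(to.GetName(a.Name.LocalName), a.Value)
                    : a);
            }
            e.ReplaceAttributes(rebuilt);
        }
    }

    static void Main()
    {
        XNamespace oldNs = "urn:example:old"; // made-up namespaces for the demo
        XNamespace newNs = "urn:example:new";

        XElement entry = XElement.Parse(
            "<o:url xmlns:o='urn:example:old' o:id='1' plain='x'>" +
            "<o:loc>http://example.com/</o:loc></o:url>");

        ChangeNamespace(oldNs, newNs, entry);

        Console.WriteLine(entry.Name);
        foreach (XAttribute a in entry.Attributes())
            Console.WriteLine("{0} = {1}", a.Name, a.Value);
    }
}
```

Unqualified attributes such as plain stay as they are, which matches how the element-only version already treats elements with no namespace.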