Microsoft's documentation is very wrong about CSV.

I've fired up Visual Studio again, to work on a pet project of mine, and I searched the latest and greatest MSDN Library for "CSV" and found two articles explaining how to use it with LINQ. Ah, thought I, this will no doubt demonstrate how to use .NET's CSV parsing library with LINQ...

The article on Reordering the fields of a delimited file helpfully explains:

By using the Split method to separate the fields, it is very easy to query and manipulate CSV files by using LINQ.

It even has a code sample:

// Create the query. Put field 2 first, then
// reverse and combine fields 0 and 1 from the old field
IEnumerable<string> query =
    from line in lines
    let x = line.Split(',')
    orderby x[2]
    select x[2] + ", " + (x[1] + " " + x[0]);

The article on how to compute column values in a CSV text file also splits the string on comma to separate the fields, although it does give the helpful hint to split on tab if the file is tab-delimited.

Now, I'm aware that these are examples of how to use LINQ rather than examples of how to parse CSV, but they are the only remotely relevant results I can find for "is there a standard library in .NET that handles CSV?" (no, it would seem), and as such people might take their advice verbatim. At the very least, add a disclaimer saying "This will do hilarious things in production."

If you want an illustration of why splitting on comma is downright wrong for anything beyond simple problems, where the data is 100% guaranteed to arrive in the precise format you expect, use LINQ and string split to work with this valid CSV data:

Name,Age,Favourite Ungulate
"James, Earl of Stratford, Defender of the Heath, Occasional Plumber",24,Oryx
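To make that concrete, here is a rough sketch of what the split-on-comma approach does to that second line. The SplitDemo class and NaiveFields method are names I've made up for illustration; the only real API involved is String.Split.

```csharp
using System;

static class SplitDemo
{
    // The approach from the MSDN samples: every comma is a field separator.
    public static string[] NaiveFields(string line)
    {
        return line.Split(',');
    }

    static void Main()
    {
        string line =
            "\"James, Earl of Stratford, Defender of the Heath, Occasional Plumber\",24,Oryx";
        string[] x = NaiveFields(line);

        // The header promises 3 columns. Split knows nothing about quoted
        // fields, so it cuts the name into pieces and reports 6.
        Console.WriteLine(x.Length);  // 6
        Console.WriteLine(x[1]);      // " Earl of Stratford", not the age
        Console.WriteLine(x[2]);      // " Defender of the Heath", not the ungulate
    }
}
```

Any query that then orders by x[2] or does arithmetic on x[1] is quietly working on fragments of the name.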

There's a reason that there are CSV parsers.

Admittedly, I haven't yet figured out which one on the .NET platform is best, although this one looks promising. You can also use OleDbConnection to handle CSV, though like a lot of things Microsoftish it seems to get needlessly complex fast, as it can require modifying registry keys to handle mixed data types. As someone who learnt to code in Python, I find this lack of batteries in .NET disturbing. CSV is a relatively common file format.
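To be fair, there does seem to be one battery hiding in the box, oddly enough in the Visual Basic namespace: Microsoft.VisualBasic.FileIO.TextFieldParser. What follows is a minimal sketch rather than a vetted recommendation. It assumes your project references the Microsoft.VisualBasic assembly, and the CsvSketch class and ParseLine method are hypothetical names of my own.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;  // assumes a reference to Microsoft.VisualBasic

static class CsvSketch
{
    // Parse one line of CSV, treating double-quoted text as a single field.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    static void Main()
    {
        string line =
            "\"James, Earl of Stratford, Defender of the Heath, Occasional Plumber\",24,Oryx";
        string[] fields = ParseLine(line);

        Console.WriteLine(fields.Length);  // 3
        Console.WriteLine(fields[0]);      // the full name, commas intact, quotes removed
    }
}
```

For a whole file you would construct the parser once over the file and loop while EndOfData is false, calling ReadFields for each row.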

TL;DR - Microsoft's documentation demonstrates the absolute worst way to handle CSV, and I have a hunch that a lot of MSDN code samples end up copied and pasted into production codebases.

Published: 23rd January, 2009