CSharp MARC

Getting Started

New FileMARCWriter


The new RDA cataloging standard for MARC21 uses the copyright symbol (©) in place of the letter c in the 260 tag. This creates an interesting problem for MARC21 files that contain special characters: they are generally encoded in MARC8, which .Net does not support, rather than in UTF-8/Unicode, which it does. Special characters usually aren't a huge problem unless you work with many large collections of foreign-language books, which is something I have never had to deal with before. The copyright symbol, however, is a problem I needed to solve to make sure my class structures support RDA. Since most of the people I build records for expect them to be encoded in MARC8 rather than UTF-8, I had to make sure these files were being written and read correctly.

If you want to read more about the problem itself, check out Mark Sullivan's blog. Mark is the developer of another set of MARC class structures and has taken a slightly different approach than I have. His blog post does a wonderful job of explaining the problem in detail and how to fix it. While my solution isn't as fully featured as the MARC8 support in Mark's SobekCM MARC Library, it does support writing files back out in MARC8 rather than converting them entirely to UTF-8.
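To make the mismatch concrete, here is a small sketch that uses only .Net's built-in `Encoding.UTF8` on the text "©2013". The class and member names are made up for illustration; the only MARC8 detail assumed is that the ANSEL copyright sign is the single byte 0xC3.

```csharp
using System;
using System.Text;

// Illustration only (hypothetical names): what .Net's built-in UTF-8
// encoding does with MARC8 data, using "©2013" as the sample text.
public static class EncodingMismatchDemo
{
    // "©2013" in MARC8: 0xC3 is the ANSEL copyright sign, the digits are ASCII.
    public static readonly byte[] Marc8Bytes = { 0xC3, 0x32, 0x30, 0x31, 0x33 };

    // Reading MARC8 bytes as UTF-8: the lone 0xC3 is an incomplete UTF-8
    // sequence, so the decoder substitutes U+FFFD (the replacement character).
    public static string ReadMarc8AsUtf8() => Encoding.UTF8.GetString(Marc8Bytes);

    // Writing the same text as UTF-8: the copyright sign becomes the two bytes
    // 0xC2 0xA9, which a MARC8 reader would treat as two unrelated characters.
    public static byte[] WriteAsUtf8() => Encoding.UTF8.GetBytes("\u00A92013");
}
```

Decoding the MARC8 bytes as UTF-8 loses the symbol entirely, and writing the string out as UTF-8 produces two bytes where a MARC8 reader expects one.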

The latest version of CSharp_MARC, released today, has a new class: FileMARCWriter. Not only does this make writing records out to files easier, but it also has the benefit of supporting a select number of special characters. Currently the only characters supported are the visible characters in the A-C rows of the ANSEL MARC8 encoding. More will be coming soon, as it will take some time to add the full MARC8 character set.
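Supporting a row of ANSEL characters essentially means keeping a lookup table between Unicode characters and single MARC8 bytes. The sketch below is a hypothetical, heavily trimmed version of that idea, not the FileMARCWriter code: `Marc8Sketch` maps a few spacing characters from rows A-C, passes ASCII through unchanged, and deliberately ignores MARC8 escape sequences and combining diacritics.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: a tiny Unicode <-> MARC8 (ANSEL) table covering a few
// spacing characters from rows A-C. Escape sequences and combining
// diacritics are deliberately not handled.
public static class Marc8Sketch
{
    private static readonly Dictionary<char, byte> ToMarc8 = new Dictionary<char, byte>
    {
        ['\u00C6'] = 0xA5, // Æ uppercase AE ligature (row A)
        ['\u00AE'] = 0xAA, // ® registered sign (row A)
        ['\u00E6'] = 0xB5, // æ lowercase ae ligature (row B)
        ['\u00A3'] = 0xB9, // £ pound sign (row B)
        ['\u00B0'] = 0xC0, // ° degree sign (row C)
        ['\u2117'] = 0xC2, // ℗ sound recording copyright (row C)
        ['\u00A9'] = 0xC3, // © copyright sign (row C)
        ['\u00BF'] = 0xC5, // ¿ inverted question mark (row C)
        ['\u00A1'] = 0xC6, // ¡ inverted exclamation mark (row C)
    };

    private static readonly Dictionary<byte, char> FromMarc8 = new Dictionary<byte, char>();

    static Marc8Sketch()
    {
        // Build the reverse table once from the forward table.
        foreach (KeyValuePair<char, byte> pair in ToMarc8)
            FromMarc8[pair.Value] = pair.Key;
    }

    public static byte[] Encode(string text)
    {
        var output = new List<byte>(text.Length);
        foreach (char c in text)
        {
            if (c < 0x80)
                output.Add((byte)c); // ASCII is the same in MARC8's default set
            else if (ToMarc8.TryGetValue(c, out byte b))
                output.Add(b);
            else
                throw new ArgumentException($"No MARC8 mapping for U+{(int)c:X4} in this sketch.");
        }
        return output.ToArray();
    }

    public static string Decode(byte[] raw)
    {
        var sb = new StringBuilder(raw.Length);
        foreach (byte b in raw)
        {
            if (b < 0x80)
                sb.Append((char)b);
            else if (FromMarc8.TryGetValue(b, out char c))
                sb.Append(c);
            else
                throw new ArgumentException($"Byte 0x{b:X2} is not handled by this sketch.");
        }
        return sb.ToString();
    }
}
```

For example, `Marc8Sketch.Encode("©2013")` produces the bytes C3 32 30 31 33, and `Decode` reverses it. Throwing on unmapped characters, rather than silently substituting '?', makes gaps in the table easy to spot while it is still incomplete.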

Here's a quick example of the new FileMARCWriter's use:

List<Record> records = new List<Record>();
//Fill your list of records just like you always have

FileMARCWriter writer = new FileMARCWriter(filename);
foreach (Record marc in records)
{
    //Modify each record as you normally would
    writer.Write(marc);
}
writer.WriteEnd(); //Write the end of file (Hex 1A) character
writer.Dispose();
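For anyone wondering what a writer like this does under the hood, here is a hypothetical sketch of the same Write/WriteEnd/Dispose pattern over a plain `Stream`. `RawRecordWriter` and its encoder parameter are my own names for illustration, not the FileMARCWriter API; passing the encoder in is one way to switch between ASCII, UTF-8, or a MARC8 table.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical minimal writer illustrating the pattern: encode each raw
// record, append it to the output, then finish with the 0x1A end-of-file byte.
public sealed class RawRecordWriter : IDisposable
{
    private const byte EndOfFile = 0x1A;
    private readonly Stream output;
    private readonly Func<string, byte[]> encode;

    public RawRecordWriter(Stream output, Func<string, byte[]> encode)
    {
        this.output = output ?? throw new ArgumentNullException(nameof(output));
        this.encode = encode ?? throw new ArgumentNullException(nameof(encode));
    }

    // rawRecord is one complete record, such as the string returned by ToRaw().
    public void Write(string rawRecord)
    {
        byte[] bytes = encode(rawRecord);
        output.Write(bytes, 0, bytes.Length);
    }

    // Marks the end of the file after the last record.
    public void WriteEnd() => output.WriteByte(EndOfFile);

    public void Dispose() => output.Dispose();
}
```

Passing a `MemoryStream` and `Encoding.ASCII.GetBytes` makes it easy to inspect the output before pointing the writer at a real file.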
Last Updated on Wednesday, 10 April 2013 17:01
 

How to edit and save records


There has been some confusion about the process of editing and saving records, which this tutorial will hopefully clear up. You can edit and save records with C# MARC just fine, but it's a little unusual because of how the MARC format is designed. There isn't a good way to edit records "in place", so it's not as simple as just changing the data. It is also complicated by the fact that I designed the class structure to decode a single MARC record only when it is needed, rather than decoding an entire file all at once. This makes reading files quicker but editing them slightly harder. If you're only using the FileMARC object, no changes are saved, because each time you get an individual record it is decoded again from the raw record. The main goal of the FileMARC class is to get from a raw MARC file to actual Record objects easily, but that is all it is good for.
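The reason "in place" editing is awkward is that a raw MARC21 (ISO 2709) record stores lengths and offsets for everything: the 24-character leader holds the total record length (positions 0-4) and the base address of the data (positions 12-16), and each 12-character directory entry holds a tag, a 4-digit field length, and a 5-digit starting position. The following is a minimal sketch, assuming ASCII-only data so that characters and bytes line up, of rebuilding those numbers from a list of fields; `RawRecordSketch` is a hypothetical name, not part of CSharp_MARC.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of the raw MARC21 layout. Assumes ASCII-only data,
// so that one char equals one byte in the computed lengths.
public static class RawRecordSketch
{
    private const char FieldTerminator = '\u001E';
    private const char RecordTerminator = '\u001D';

    // fields: (tag, data), where data is the field content without its terminator,
    // e.g. "123" for a control field or "10\u001FaTitle" for a data field.
    public static string Build(string leader, IList<(string Tag, string Data)> fields)
    {
        if (leader.Length != 24)
            throw new ArgumentException("Leader must be 24 characters.");

        var directory = new StringBuilder();
        var body = new StringBuilder();
        foreach (var (tag, data) in fields)
        {
            string fieldText = data + FieldTerminator;
            // Directory entry: 3-char tag, 4-digit length, 5-digit offset into the data area.
            directory.Append(tag)
                     .Append(fieldText.Length.ToString("D4"))
                     .Append(body.Length.ToString("D5"));
            body.Append(fieldText);
        }
        directory.Append(FieldTerminator);

        int baseAddress = 24 + directory.Length;
        int recordLength = baseAddress + body.Length + 1; // +1 for the record terminator

        // Leader positions 0-4 hold the record length, 12-16 the base address of data.
        string newLeader = recordLength.ToString("D5") + leader.Substring(5, 7)
                         + baseAddress.ToString("D5") + leader.Substring(17);
        return newLeader + directory + body + RecordTerminator;
    }
}
```

Adding or lengthening any field shifts the offsets of every field after it and changes both numbers in the leader, which is why it is simpler to decode, modify the Record object, and write out a fresh raw record than to patch the original bytes.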

What you need to do depends largely on how much memory you have to work with, and how big your record files are. Assuming you have a relatively small number of MARC records that you're working with, such as from a recent order of books that you need customized before putting them into your collection management system, this is what I would do:
//Open the MARC file
FileMARC marcFiles = new FileMARC();
marcFiles.ImportMARC(@"C:\some_order.mrc");
//Open a writer to save the new records. The using block closes and disposes it for you.
using (StreamWriter writer = new StreamWriter(@"C:\edited_order.mrc", false, Encoding.Default))
{
    foreach (Record marc in marcFiles)
    {
        //Create a new field to insert
        DataField newField = new DataField("852");
        //Create some subfields
        Subfield newSubfield = new Subfield('h', "123.45 ABC");
        newField.Subfields.Add(newSubfield);
        newSubfield = new Subfield('p', "123456789");
        newField.Subfields.Add(newSubfield);
        marc.Add(newField);
        //Write the MARC record out to the file
        writer.Write(marc.ToRaw());
    }
    //Write the end of file character. This isn't strictly necessary, but some programs will throw an error if there isn't a clear end of file marker after all the records.
    writer.Write('\x1A');
}
If you're not writing the data back out right away and want to work with more than one record at a time, you can use a List<Record> object and fill it with decoded Records. This is my preferred way of working with records, as the C# List object is extremely versatile. This code lets you loop through the records as many times as you like, manipulating them however you wish:
List<Record> records = new List<Record>();
foreach (Record marc in marcFiles)
{
    records.Add(marc);
}
When you're done making changes to your List<Record>, you can write the records back out as easily as this:
using (StreamWriter writer = new StreamWriter(filename, false, Encoding.Default))
{
    foreach (Record record in records)
    {
        writer.Write(record.ToRaw());
    }
    //Write end of file character.
    writer.Write('\x1A');
}
If you have a huge file, use the newer FileMARCReader class instead; otherwise everything works the same way. The benefit of the FileMARCReader class is that it doesn't blindly read an entire file into memory before it starts decoding. This makes it much more efficient for large files, with the trade-off that you have to go through the records one at a time, and the source file remains open for as long as you're using the reader.
Last Updated on Thursday, 20 September 2012 07:19
 

Now supporting large files


Thanks to some feedback from users, I've implemented an update to CSharp MARC to support large files. In order to keep the current API and not break how the FileMARC class was intended to be used (loading an entire file into memory and then batch processing it), I added a FileMARCReader class.

The FileMARCReader class simply takes a filename and lets you enumerate over it with foreach. It reads the file in small, memory-efficient chunks and returns the records one at a time, in order. This will hopefully help anyone who needs to work with large files.
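As a rough illustration of the chunked approach, here is a hypothetical sketch (not the actual FileMARCReader source) that reads a stream in fixed-size blocks and yields each raw record as soon as its 0x1D record terminator arrives. Because it uses `yield return`, nothing is read until the caller starts enumerating, and the stream has to stay open until enumeration finishes.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch of chunked, record-at-a-time reading.
public static class ChunkedMarcReaderSketch
{
    private const byte RecordTerminator = 0x1D;
    private const byte EndOfFile = 0x1A;

    // Yields each raw record (including its 0x1D terminator) as soon as it is
    // complete, so only one chunk plus one partial record is held in memory.
    public static IEnumerable<byte[]> ReadRawRecords(Stream input, int chunkSize = 4096)
    {
        var buffer = new byte[chunkSize];
        var current = new List<byte>();
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                // Skip an end-of-file marker that appears between records.
                if (b == EndOfFile && current.Count == 0)
                    continue;
                current.Add(b);
                if (b == RecordTerminator)
                {
                    yield return current.ToArray();
                    current.Clear();
                }
            }
        }
        // Anything left over is a truncated record; hand it back so the caller can warn.
        if (current.Count > 0)
            yield return current.ToArray();
    }
}
```

Each yielded byte array could then be decoded into a Record independently of the others.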

Here's a quick example of its use:

string filename = "manyrecords.mrc";
FileMARCReader reader = new FileMARCReader(filename);
foreach (Record marc in reader)
{
    //Do whatever you'd normally do on a Record object
}
reader.Dispose();
Last Updated on Wednesday, 13 June 2012 13:48
 

Basic Functionality


This example can also be found in the source code download from SourceForge as "CSharp_MARC Demo".

//Read a raw MARC record from a file.
//The example .mrc files are in the project's root.
string rawMarc = File.ReadAllText("..\\..\\..\\record.mrc");
//The FileMARC class does the actual decoding of a record and splits a string of multiple records into a list object.
//Decoding is not done until you actually access a single record from the FileMARC object.
//You can import records straight from a string in memory. The string can contain one or many MARC records.
FileMARC marcRecords = new FileMARC(rawMarc);
//Or you can import them straight from a file
marcRecords.ImportMARC("..\\..\\..\\record2.mrc");
//You can get how many records were found by using the Count property
Console.WriteLine("Found " + marcRecords.Count + " records.");
//You can access each individual record in its native MARC format using the RawSource object
Console.WriteLine("Here is the first record:");
Console.WriteLine(marcRecords.RawSource[0]);
//You can access each record manually using array notation.
Record firstRecord = marcRecords[0];
//Or you can loop through them as an Enumerable object
//Note: I recommend retrieving each record from the FileMARC object only once, as it is decoded again every time you do.
int i = 0;
foreach (Record record in marcRecords)
{
    //The Warnings property contains a list of issues that the decoder found with the record.
    //The decoder attempts to return a valid MARC record to the best of its ability.
    Console.WriteLine("Book #" + ++i + " has been decoded with " + record.Warnings.Count + " warnings!");
    //Once decoded you can easily access specific data within the record, as well as make changes.
    //Array notation will get the first requested tag in the record, or null if one does not exist.
    //First we'll get the author. Since there should only be one author tag, array notation is the easiest to use.
    Field authorField = record["100"];
    //Each tag in the record is a Field object. To get the data we have to know if it is a DataField or a ControlField and act accordingly.
    if (authorField == null)
    {
        Console.WriteLine("Book #" + i + " has no author (100) field.");
    }
    else if (authorField.IsDataField())
    {
        DataField authorDataField = (DataField)authorField;
        //The author's name is in subfield a. Once again, since there should only be one, we can use array notation.
        Subfield authorName = authorDataField['a'];
        if (authorName != null)
            Console.WriteLine("The author of this book is " + authorName.Data);
    }
    else if (authorField.IsControlField())
    {
        //This should never happen for a valid record.
        Console.WriteLine("Something went horribly wrong. The author field should never be a Control Field.");
    }
    //Now we will get the subjects for this record. Since a book can have multiple subjects we will use GetFields(), which returns a List<Field> object.
    //Note: Not passing in a tag number to GetFields will return all the tags in the record.
    List<Field> subjects = record.GetFields("650");
    Console.WriteLine("Here are the subjects for Book #" + i);
    //Here we will assume each Field is actually a DataField, since subject (650) fields should always be DataFields.
    foreach (DataField subject in subjects)
    {
        string subjectText = string.Empty;
        //We also want to loop through each subfield.
        //Just like with GetFields() you can either pass in a subfield code, or nothing to get all the subfields
        foreach (Subfield subfield in subject.GetSubfields())
            subjectText += subfield.Data + " ";
        Console.WriteLine(subjectText);
    }
}
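Under the hood, a data field such as the 100 is stored as two indicator characters followed by subfields, each introduced by the 0x1F delimiter and a one-character code, with the field ending in 0x1E. The sketch below is a hypothetical parser for that layout, shown only to illustrate what the decoder has to undo; `DataFieldSketch` is not part of CSharp_MARC.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits the content of a raw MARC21 data field
// (two indicators, then 0x1F-delimited subfields) into its parts.
public static class DataFieldSketch
{
    private const char SubfieldDelimiter = '\u001F';
    private const char FieldTerminator = '\u001E';

    public static (char Ind1, char Ind2, List<(char Code, string Data)> Subfields) Parse(string rawField)
    {
        string field = rawField.TrimEnd(FieldTerminator);
        if (field.Length < 2)
            throw new ArgumentException("A data field needs two indicator characters.");

        var subfields = new List<(char Code, string Data)>();
        string[] pieces = field.Substring(2).Split(SubfieldDelimiter);
        // pieces[0] is whatever precedes the first delimiter, normally empty.
        for (int i = 1; i < pieces.Length; i++)
        {
            string piece = pieces[i];
            if (piece.Length == 0)
                continue; // a delimiter with no code is malformed; skip it
            subfields.Add((piece[0], piece.Substring(1)));
        }
        return (field[0], field[1], subfields);
    }
}
```

For example, parsing "1 \u001FaSmith, John,\u001Fd1950-\u001E" gives indicators '1' and ' ', then subfield a "Smith, John," and subfield d "1950-".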
Last Updated on Wednesday, 13 June 2012 13:49