How does one parse source code?

Source code is just text. Transforming that text into a structured format that software can work with is called parsing.

Damir Bulic

Aug 28, 2021 • 1 min read

Parsing is the process of transforming text into an in-memory structure

It requires a grammar: the rules of the language the text is written in. With a grammar and a parser (the code that transforms the text), we can check that the text actually conforms to those rules. Once we have a structured representation of the source text, all kinds of good things become possible, such as analysing it, transforming it, or executing it.
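To make this concrete, here is a minimal sketch of a grammar and a hand-written parser for it. The toy arithmetic grammar and every name in it (Num, BinOp, tokenize, Parser, parse, evaluate) are assumptions made up for illustration; they have nothing to do with Cascal or with any real parser library.

```python
import operator
import re
from dataclasses import dataclass

# Hypothetical toy grammar, invented for this illustration:
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the source text into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = BinOp(op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is not None and token.isdigit():
            return Num(int(token))
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    """Return the tree for text, or raise SyntaxError if it breaks the grammar."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(node):
    """One of the 'good things' a tree makes possible: computing the result."""
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))
```

Calling parse("1 + 2 * 3") yields a tree in which the multiplication sits below the addition, so evaluate can walk it without looking at the original text again. Text that does not follow the grammar, such as "1 + ", is rejected with a SyntaxError.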

My initial contact with parsing

At university, one of our lab exercises was to build a programming language of our own. Not interested in half measures, my group built a language called Cascal (obviously, its syntax was a mix of C and Pascal) with a full-blown IDE able to compile and execute programs written in Cascal. It worked beautifully.

We didn't get top marks for the project, even though ours was the only one that actually worked. All of the work was done by two guys. Zeljko (@zsvedic) wrote the parser. I wrote the IDE, generated the assembler commands for the high-level statements and expressions, and called the assembler and linker. Everyone else coasted. When one member of the team couldn't say a few coherent sentences about the project, the professor lowered all our grades.

I was pissed.

However, I shouldn't have been upset, because parsing is hard. We could have worked with the other students to involve them more, but had we done that, our project would never have been finished. Not everyone can write parsers.

Parsing is hard

Many years later (but still over 10 years ago), I set out to parse SQL and translate between database dialects. It turned out to be an extremely hard problem!