big parser doesn't want you to know that building parsers is actually incredibly easy. for some reason parsers are scary to programmers.

here's a parser I just built that takes a list of text tokens out of visa pdf statements and converts them to ledger-cli (plain text accounting) format... because apparently a csv export from my bank is too much to ask for.

the basic idea is that you have one large array of strings extracted from the pdf. this is an unreadable stream of unstructured text chunks called tokens.

at each position you see if you can parse the next N tokens into a transaction line item. if it fails you just advance the parser by one token and try again. if it succeeds, the parser has eaten all of the tokens it needed, so it continues from the token right after them.
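a minimal sketch of that loop in python, assuming each transaction shows up as three consecutive tokens (date, payee, amount). the token layout, the regexes, the function names and the ledger account names are all made up for illustration and are not taken from the actual parser.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Transaction(NamedTuple):
    date: str
    payee: str
    amount: str


# Assumed token shapes; a real statement will likely need different patterns.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
AMOUNT_RE = re.compile(r"^-?\d+\.\d{2}$")


def parse_transaction(tokens: List[str], pos: int) -> Optional[Tuple[Transaction, int]]:
    """Try to read (date, payee, amount) starting at pos.

    Returns the transaction and the position after the eaten tokens,
    or None if the tokens at pos do not look like a transaction.
    """
    if pos + 3 > len(tokens):
        return None
    date, payee, amount = tokens[pos:pos + 3]
    if not DATE_RE.match(date) or not AMOUNT_RE.match(amount):
        return None
    return Transaction(date, payee, amount), pos + 3


def parse_statement(tokens: List[str]) -> List[Transaction]:
    """Walk the token stream: on failure skip one token, on success jump ahead."""
    pos = 0
    transactions = []
    while pos < len(tokens):
        result = parse_transaction(tokens, pos)
        if result is None:
            pos += 1  # not a transaction here, advance by one and retry
        else:
            txn, pos = result  # continue after the tokens that were eaten
            transactions.append(txn)
    return transactions


def to_ledger(txn: Transaction) -> str:
    """Format one transaction as a ledger entry (account names are placeholders)."""
    return (
        f"{txn.date} {txn.payee}\n"
        f"    expenses:unknown  ${txn.amount}\n"
        f"    liabilities:visa\n"
    )


tokens = [
    "VISA", "Page 1",
    "2024-01-05", "COFFEE SHOP", "4.50",
    "junk",
    "2024-01-06", "BOOKSTORE", "19.99",
]
for txn in parse_statement(tokens):
    print(to_ledger(txn))
```

the junk tokens get skipped one at a time and the two transactions come out as ledger entries. a real statement probably needs more than one `parse_*` function (multi-line payees, foreign currency lines and so on), but the outer loop stays the same.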

you can build this kind of parser in *any* language, it would look identical. it's just a few simple functions and data.

after doing these kinds of things at a record label for 6 years I'm convinced that accounting is more of a software engineering job. I can't imagine going through pdfs manually each month.

I'm not sure what the point of this post was other than to say I like parsing things and that banks suck.

carry on.

Discussion

The point is a glimpse into your mind when nostr:npub1h50pnxqw9jg7dhr906fvy4mze2yzawf895jhnc3p7qmljdugm6gsrurqev asks you to pass the sauce…

ledger stuff transmitted via relays

lmao

Just wait until he finds out about Yacc.

Now we have LLMs that can extract any information directly through their model, or even write the parser for you.