Replying to Geoffrey Adams

nostr:npub14d70xk632yuqshz7hdrnnj79j3yufrphy4u7ryekmpr7vztwvf5q8zdm4s I'm no expert really, but during my PhD, after the 10th time rewriting my Matlab analysis code to rearrange my data in a slightly different way for a new analysis question, I made the slightly peculiar choice to spin up a MySQL database (might have been MariaDB actually, this was not too long after it forked off of MySQL) to solve my data management problem once and for all. It was using a sledgehammer to drive a nail, but by golly that nail got driven. In retrospect I could and should have solved my problem much more simply using SQLite, which doesn't require a centralized database server and lets you easily pass around your dataset in one (potentially massive) file, which is better for scientific data sharing. So I don't know about the differences between major RDBMS software, but my experience taught me that relational databases (whether implemented with an RDBMS or more ad hoc methods) are amazing and seriously underused in science, and SQLite is a great tool in particular.
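
For anyone who hasn't tried this, here is a rough sketch of what the SQLite route could look like using Python's built-in `sqlite3` module. The `session` and `trial` tables, their columns, the sample values, and the `build_demo_db` / `mean_by` helpers are all made up for illustration; they are not from the analysis described above. The point is only that once the data is modeled relationally, regrouping it for a new question becomes a different query rather than a rewrite of the analysis code.

```python
import sqlite3


def build_demo_db(path=":memory:"):
    """Create a tiny, hypothetical experiment database.

    Pass a filename instead of ":memory:" to get a single file
    that can be shared with collaborators.
    """
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE session (
            session_id INTEGER PRIMARY KEY,
            subject    TEXT NOT NULL,
            condition  TEXT NOT NULL
        );
        CREATE TABLE trial (
            trial_id    INTEGER PRIMARY KEY,
            session_id  INTEGER NOT NULL REFERENCES session(session_id),
            reaction_ms REAL NOT NULL
        );
        """
    )
    # Made-up sample data: three sessions, four trials.
    conn.executemany(
        "INSERT INTO session VALUES (?, ?, ?)",
        [(1, "s01", "control"), (2, "s01", "drug"), (3, "s02", "control")],
    )
    conn.executemany(
        "INSERT INTO trial (session_id, reaction_ms) VALUES (?, ?)",
        [(1, 300.0), (1, 320.0), (2, 250.0), (3, 340.0)],
    )
    conn.commit()
    return conn


def mean_by(conn, column):
    """Average reaction time grouped by one session column."""
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if column not in ("subject", "condition"):
        raise ValueError(f"unsupported grouping column: {column}")
    query = (
        f"SELECT s.{column}, AVG(t.reaction_ms) "
        "FROM trial AS t JOIN session AS s USING (session_id) "
        f"GROUP BY s.{column} ORDER BY s.{column}"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_by(conn, "condition"))  # same data, grouped one way
    print(mean_by(conn, "subject"))    # regrouped without touching the data
```

Swapping the grouping column is the whole "rearrange my data for a new analysis question" step here, and the database itself stays a single file you can hand to someone else.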

nostr:npub1fd0ec26hnd5dxx8ktt5x0fp24hpqxxkgxwrxxyk75zeqtzlzd7fqsmxxus

i am super glad you found something that worked for u :). I think the thing that is mostly missing is data modeling, RDBs being one way of doing that, but they have certain cognitive and practical limitations that make me think there might be much lower ceilings to them than we might want for generalized data infrastructure. extremely good for specific problems, not so good for interoperability

Discussion

nostr:npub1fd0ec26hnd5dxx8ktt5x0fp24hpqxxkgxwrxxyk75zeqtzlzd7fqsmxxus

but you're totally right, scientists waste an enormous amount of time and labor because of the strong disincentives towards the diffusion of infrastructure in labs. everyone reinvents everything from scratch, and so you might not even be exposed to the idea of modeling your data until it's way too late. the bigger labs can limp along like this, compensating for shitty infra by just throwing more labor at the problem, but it can swamp and shutter smaller labs. in both cases it can have long ranging and extremely massive impacts on the work culture that can't be appreciated as infrastructural problems by people within the lab because there is no alternative in sight.