TDC — Issue 2

Better LLMs, more interesting browser tools, and some ideas about data markets and common digital infrastructure.

Sep 23, 2024

👋 Hello there! Welcome to the second edition of The Data Collective newsletter, a curated list of interesting news, ideas, datasets, people, and many other things around the Open Data ecosystem.

📢 News

Google released DataGemma, a model designed to help address the challenges of hallucination by grounding LLMs in the vast, real-world statistical data of Google's Data Commons.
You can now run SQL on top of any Hugging Face dataset directly from the browser with the Hugging Face SQL console. Pick any dataset and click the SQL button to start querying it!
Philip Heltweg interviewed me (David Gasquez) on the topic of Retroactive Public Goods Funding and Open Data in Web3. You can catch the full interview on his blog.

📄 Articles

How Data Markets Fail. This article explores the practical challenges of data monetization and how these challenges relate to failures in information economics. The main issues discussed are the non-exclusionary nature of data and the cold-start problem for data aggregators.
Common Digital Infrastructure. A great article by Andrew Conner on the concept and importance of common digital infrastructure, akin to physical public utilities, that prioritizes broad accessibility and economic benefit over profit. It argues for the development of more open, decentralized digital systems, similar to what I mentioned last week with Barefoot Data Portals.

📚 Open Data Resources

I've been playing with Marimo recently. It's an open-source reactive notebook for Python that is reproducible, git-friendly, executable as a script, and shareable as an app. If this sounds interesting to you, check out the small demo environment I've set up.
Data Commons: The Data Commons Project integrates data from public domain sources into a knowledge graph. We’ll talk more about it in the future, but if you’re looking for a place with lots of datasets, this one has them all homogenized and also has great APIs!

That's all for this issue! If you have any feedback or want to share something, feel free to reply to this email or open an issue in our GitHub repository!

The Data Collective
