Automattic, the parent company of Tumblr and WordPress.com, is reportedly in talks with AI firms Midjourney and OpenAI to supply training data sourced from user posts, according to internal documents obtained by 404 Media.

According to 404 Media, Automattic has already assembled an “initial data dump” of all public Tumblr posts from 2014 to 2023. The dump reportedly also included content that is not visible on public blogs.

In response to the concerns the report raised, Automattic published a post titled “Protecting User Choice” that sets out its stance on AI platform crawlers. The company says it has decided to block these crawlers by default, including those operated by major tech corporations.

Automattic further clarified: “We are also collaborating directly with select AI firms only if their plans align with our community's values: attribution, opt-outs, and control. All opt-out settings will be respected in our partnerships. We also intend to regularly update our partners about users who newly opt out and request that their content be removed from past sources and future training.”

Automattic's move follows a broader trend: several companies, including Reddit, have struck deals to provide AI tool developers with training data, often sourced from publicly available online information.