Mergeflow data sets

Mergeflow currently collects ca. 15,000 new documents every day. Collection is fully automated and runs 24/7. It covers news, patents, scientific publications, blog posts, press releases, clinical trials, and other data sets and sources (see below for details).

Mergeflow collects data from many types of sources, most of them via the web. Some of these sources provide data collection interfaces (APIs) for their contents, but most do not, so we build customized crawlers for them.

All the data collected by Mergeflow are raw, unstructured text data. We do not rely on structured third-party databases, e.g. for market or investment data. Instead, such data are extracted from text via natural language processing, semantic modeling, machine learning, and other methods.

You can use Mergeflow's data sets for better search strategies. We describe how you can do this in our blog article, 'How to get better search results for tech discovery'.

Venture Capital News

Updates on venture capital funding events from around the world. Sources include dedicated information portals as well as blogs and similar outlets run by investors such as venture capitalists.

From these contents, and for each venture funding event, Mergeflow extracts the name of the company that received venture funding, the amount it received, and the names of the investors. If the same venture funding event is reported multiple times across Mergeflow's data sources, these reports are deduplicated into a single event.

Non-USD funding amounts are converted to USD at the daily exchange rate of the funding date; the original non-USD funding amount is available as well.
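The deduplication and currency conversion described above can be illustrated with a minimal Python sketch. This is not Mergeflow's actual implementation. The record fields, the matching rule (same company name and funding date), the company and investor names, and the exchange rate are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical daily rates: USD per one unit of the currency on that day.
# Illustrative numbers only, not real market data.
RATES_TO_USD = {
    ("EUR", date(2024, 3, 1)): 1.08,
}


@dataclass(frozen=True)
class FundingReport:
    company: str
    amount: float
    currency: str
    funding_date: date
    investors: frozenset


def to_usd(amount, currency, day):
    """Convert at the rate of the funding date; USD passes through."""
    if currency == "USD":
        return amount
    return round(amount * RATES_TO_USD[(currency, day)], 2)


def dedupe(reports):
    """Collapse reports of the same event into one record.

    Assumption: two reports describe the same event when the company
    name (case-insensitive) and the funding date match.
    """
    events = {}
    for r in reports:
        key = (r.company.strip().lower(), r.funding_date)
        if key not in events:
            events[key] = {
                "company": r.company,
                "amount_usd": to_usd(r.amount, r.currency, r.funding_date),
                # The original non-USD amount stays available.
                "original_amount": (r.amount, r.currency),
                "investors": set(r.investors),
                "report_count": 1,
            }
        else:
            # Same event seen again: merge investors, count the report.
            events[key]["investors"] |= r.investors
            events[key]["report_count"] += 1
    return list(events.values())


reports = [
    FundingReport("Acme Robotics", 5_000_000, "EUR", date(2024, 3, 1),
                  frozenset({"Alpha Ventures"})),
    FundingReport("ACME Robotics", 5_000_000, "EUR", date(2024, 3, 1),
                  frozenset({"Alpha Ventures", "Beta Capital"})),
    FundingReport("Nimbus Bio", 2_000_000, "USD", date(2024, 3, 1),
                  frozenset({"Gamma Partners"})),
]
events = dedupe(reports)
```

In this sketch the two Acme reports collapse into one event that lists both investors, while the original EUR amount is kept next to the converted USD value.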

  • Ca. 1,500 new documents / week.
  • Updated every 2 hours.

Market News

Mergeflow collects market news from dedicated market and finance news portals, as well as finance sections of many worldwide daily news outlets and press release platforms.

From these contents, Mergeflow extracts the name of the market segment, as well as the market size and growth (CAGR) estimates. If multiple estimates for the same market segment report the same market size and CAGR, they are deduplicated into a single estimate. If estimates for the same market segment differ in either market size or CAGR, Mergeflow reports them as separate estimates.
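As a rough illustration of this rule, the following Python sketch merges estimates only when segment, market size, and CAGR all agree. It is a simplified assumption of how such a rule could look, not Mergeflow's code. The field names and all figures are invented for the example.

```python
def dedupe_estimates(estimates):
    """Keep one estimate per (segment, size, CAGR) combination.

    Estimates that differ in size or CAGR stay separate, even when
    they refer to the same market segment.
    """
    unique = {}
    for est in estimates:
        key = (
            est["segment"].strip().lower(),
            est["size_usd_bn"],
            est["cagr_pct"],
        )
        # setdefault keeps the first report seen for each key.
        unique.setdefault(key, est)
    return list(unique.values())


# Invented example data: A and B agree, C has a different CAGR.
estimates = [
    {"segment": "Solid-state batteries", "size_usd_bn": 8.0,
     "cagr_pct": 30.0, "source": "report A"},
    {"segment": "solid-state batteries", "size_usd_bn": 8.0,
     "cagr_pct": 30.0, "source": "report B"},
    {"segment": "Solid-state batteries", "size_usd_bn": 8.0,
     "cagr_pct": 34.5, "source": "report C"},
]
kept = dedupe_estimates(estimates)
```

Here reports A and B collapse into one estimate, and report C remains as a second, differing estimate for the same segment.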

  • Ca. 12,000 new documents / week.
  • Updated every 2 hours.

Patents

Patent publications from patent offices worldwide, collected and provided by the European Patent Office. This database is the biggest patent database in the world. In Mergeflow, patents are bundled by family, so Mergeflow counts patent families rather than individual patent publications. The number of individual patent publications is considerably higher, because each family can contain several publications. Mergeflow first collects each patent in its original form and language, and then replaces non-English texts with the English version as soon as it becomes available via the European Patent Office.
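The difference between counting publications and counting families can be shown with a short Python sketch. The publication numbers, the `family_id` field, and the rule for picking a representative text are hypothetical and only serve to illustrate the idea.

```python
from collections import defaultdict

# Made-up publications; "family_id" is an assumed field name.
publications = [
    {"publication": "DE-0000001-A1", "family_id": "F1", "lang": "de"},
    {"publication": "EP-0000001-A1", "family_id": "F1", "lang": "en"},
    {"publication": "US-0000001-A1", "family_id": "F1", "lang": "en"},
    {"publication": "JP-0000002-A", "family_id": "F2", "lang": "ja"},
]


def group_by_family(pubs):
    """Bundle individual publications into their patent families."""
    families = defaultdict(list)
    for pub in pubs:
        families[pub["family_id"]].append(pub)
    return dict(families)


def representative(members):
    """Prefer an English text if available, else the first one collected."""
    for member in members:
        if member["lang"] == "en":
            return member
    return members[0]


families = group_by_family(publications)
family_count = len(families)
publication_count = len(publications)
```

In this example, four publications form two families. Family F2 keeps its original-language text until an English version exists.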

  • Ca. 12,500 new documents / week.
  • Updated weekly.

Scientific Publications

Research papers from across different disciplines. This includes peer-reviewed journals and conference proceedings, open access journals, preprint archives such as arXiv and bioRxiv, as well as databases such as PubMed.

  • Ca. 30,000 new documents / week.
  • Updated every 2 hours.

Technology Transfer and Licensing

Technologies available for licensing from universities and R&D organizations worldwide, via their technology transfer offices. This includes, for example, Columbia University, Cornell University, Emory, Hebrew University, MIT, Purdue, Rice, Stanford, University of California, US National Laboratories; Universities of Arizona, Cambridge, Chicago, Delaware, Michigan, Pennsylvania, Tel Aviv, Texas; ESA, NASA, Weizmann Institute, and others.

  • Ca. 250 new documents / week.
  • Updated every 2 hours.

Technology Blogs

Thoughts, ideas, and forecasts from the most respected tech journalists around the world. This includes individual blogs but also selected technology news portals.

  • Ca. 10,000 new documents / week.
  • Updated every 2 hours.

Industry News

Worldwide business news from across various industries. This includes dedicated industry news outlets and PR newswires but also economics and business sections of many worldwide mainstream media outlets.

  • Ca. 15,000 new documents / week.
  • Updated every 2 hours.

Funded Research Projects

Descriptions of US, UK, and EU publicly funded research projects. The funding agencies and programs we monitor include SBIR, NIH, NSF, Innovate UK, and EU CORDIS.

From these contents, Mergeflow extracts the names of the funded companies and research organizations, project names where applicable, and the funding amount. Non-USD funding amounts are converted to USD at the daily exchange rate of the funding date; the original non-USD funding amount is available as well.

  • Ca. 400 new documents / week.
  • Updated every 2 hours.

Clinical Trials

Updates on clinical trials from around the world and across all trial phases, e.g. from NIH (clinicaltrials.gov) and the EU Clinical Trials Register (clinicaltrialsregister.eu).

Mergeflow tags disease names using the ICD-10 nomenclature, a medical classification list published by the World Health Organization (WHO).
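To give a sense of what ICD-10 tagging means, here is a deliberately simplified Python sketch that maps disease mentions to ICD-10 category codes through a tiny hand-written lexicon. Mergeflow's actual tagging method is not described here, so the lexicon lookup and the function name are assumptions. The three codes shown are standard ICD-10 categories.

```python
import re

# Tiny illustrative lexicon: disease term -> ICD-10 category code.
ICD10_LEXICON = {
    "type 2 diabetes": "E11",
    "asthma": "J45",
    "essential hypertension": "I10",
}


def tag_diseases(text):
    """Return (term, ICD-10 code) pairs for lexicon terms found in text.

    Assumption: whole-word, case-insensitive matching is sufficient
    for this illustration; real tagging needs synonyms and context.
    """
    lowered = text.lower()
    tags = []
    for term, code in ICD10_LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            tags.append((term, code))
    return tags


trial_title = "A Phase 2 trial of a new inhaler in adults with asthma and type 2 diabetes"
tags = tag_diseases(trial_title)
```

Tagging with a shared nomenclature means that trials describing the same disease in different words can be found under the same code.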

In Mergeflow you can search clinical trials by phase.

  • Ca. 600 new documents / week.
  • Updated every 2 hours.

Coverage

All data sets have worldwide coverage, except Funded Research Projects (US, UK, EU).

Most data sets have ca. 10 years of history. Patents and Clinical Trials have ca. 25 years of history.
