
Merge Parquet files online

Merge multiple Parquet files into a single file. Combine data from different sources quickly and easily.

Accepts Parquet files (.parquet)
Trusted by over 60,000 users every month

Parquet Merge Features

Multiple File Merging: Combine any number of files into a single consolidated file.
Lightning-Fast Performance: Merge even large files instantly with optimized processing.
Streamlined File Merging: Efficiently combine multiple files into a unified dataset.
SQL-Powered Merging: Use SQL queries for advanced merging with custom join conditions.
AI-Powered Assistance: Describe your merging needs in plain English for complex scenarios.
Export Merged Data: Save your combined results in various formats.

How to merge Parquet files in Python

Here are three effective ways to merge multiple Parquet files in Python using different libraries. Each approach has its own advantages depending on your specific needs and file sizes.

Merging Parquet files with Pandas

Pandas provides a straightforward approach for merging files and works well for most common data tasks.

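First, let's install pandas if you haven't already, for example by running pip install pandas pyarrow in your terminal. Pandas needs a Parquet engine such as pyarrow to read and write Parquet files.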

Now we can load your Parquet files into DataFrames. Import pandas, then call pd.read_parquet once for your first file and once for your second file, keeping each result in its own DataFrame.

Great! Now we can merge the DataFrames using the concat function, which stacks the rows of the second DataFrame under the rows of the first.

Finally, let's save your newly merged data to a file with the DataFrame's to_parquet method.

Need to merge more than two files? No problem! Just add them to the list you pass to the concat function.

Merging Parquet files with DuckDB

DuckDB is an in-process SQL OLAP database that's well suited to larger files and analytical workloads.

Let's start by installing DuckDB for Python, for example by running pip install duckdb in your terminal.

Now we'll import the library and create a connection with duckdb.connect(), which opens an in-memory database when called without arguments.

Here's a simple DuckDB query that will merge your Parquet files using UNION ALL: select everything from each file with read_parquet and join the two SELECT statements with UNION ALL.

Just run this query on the connection to perform the merge.

Got more than two files? Simply add one more UNION ALL branch to the query for each extra file.

What's great about DuckDB is that it's very efficient for large files: it processes data in a columnar format and can handle files that don't fit in memory. That makes it a good choice for those bigger merging jobs!

Merging Parquet files with ClickHouse

ClickHouse is a high-performance column-oriented database system that's well suited to large-scale data processing.

Let's begin by installing the ClickHouse Connect library for Python, for example by running pip install clickhouse-connect in your terminal.

Now we'll import the library and create a client connection to your ClickHouse server with clickhouse_connect.get_client.

Here's how you can merge your files using a single UNION ALL query that reads each file and stores the combined rows in a table.

Then export your merged data from that table to a Parquet file.
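The ClickHouse steps above can be sketched as follows. This is a sketch under several assumptions, not the page's original code: the host, the table name merged_parquet, and all function names are placeholders; the input files are assumed to sit in the server's user_files directory, where the file table function looks for them; and the export uses the client's raw_query method to fetch the table as Parquet bytes and write them locally.

```python
def connect(host="localhost"):
    """Create a ClickHouse Connect client (host is a placeholder)."""
    # Imported here so the helpers below can be used with any
    # client object that offers command() and raw_query().
    import clickhouse_connect

    return clickhouse_connect.get_client(host=host)


def sql_string(value):
    """Quote a value as a ClickHouse string literal."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def merge_into_table(client, first_file, second_file, table="merged_parquet"):
    """Create a table holding the rows of both Parquet files."""
    # The table name is interpolated as-is, so it must be a trusted identifier.
    client.command(
        f"CREATE TABLE {table} ENGINE = MergeTree ORDER BY tuple() AS "
        f"SELECT * FROM file({sql_string(first_file)}, 'Parquet') "
        "UNION ALL "
        f"SELECT * FROM file({sql_string(second_file)}, 'Parquet')"
    )


def export_table(client, table, output_path):
    """Download the table as Parquet bytes and save them to a local file."""
    data = client.raw_query(f"SELECT * FROM {table}", fmt="Parquet")
    with open(output_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

A typical run would be client = connect(), then merge_into_table(client, "file1.parquet", "file2.parquet"), then export_table(client, "merged_parquet", "merged.parquet"). The table stays on the server afterwards, so drop it once the export is done if you no longer need it.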

Need to merge more than two files? Just add one more UNION ALL branch to the query for each extra file.
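With many files, the UNION ALL query can be generated from a list of file names. In this sketch the function names and the table name merged_parquet are illustrative assumptions, and client is assumed to be a ClickHouse Connect client created with clickhouse_connect.get_client.

```python
def sql_string(value):
    """Quote a value as a ClickHouse string literal."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_union_all_query(file_names):
    """Build one SELECT per Parquet file and join them with UNION ALL."""
    if not file_names:
        raise ValueError("at least one Parquet file is required")
    selects = [
        f"SELECT * FROM file({sql_string(name)}, 'Parquet')"
        for name in file_names
    ]
    return " UNION ALL ".join(selects)


def merge_into_table(client, file_names, table="merged_parquet"):
    """Create a table holding the rows of every listed Parquet file."""
    # The table name is interpolated as-is, so it must be a trusted identifier.
    client.command(
        f"CREATE TABLE {table} ENGINE = MergeTree ORDER BY tuple() AS "
        + build_union_all_query(file_names)
    )
```

As with any UNION ALL, every file is assumed to expose the same columns in the same order.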

Want to skip the intermediate table? You can merge directly to a file in one step by writing the result of the UNION ALL query straight to the output file.
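One possible way to skip the table is ClickHouse's INSERT INTO FUNCTION statement, which lets the file table function act as the write target. The sketch below assumes a reasonably recent ClickHouse server and a client created with clickhouse_connect.get_client; the function names are illustrative. The output file is written on the server, inside its user_files directory, not on the machine running Python.

```python
def sql_string(value):
    """Quote a value as a ClickHouse string literal."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_direct_merge_query(input_files, output_file):
    """Build a statement that writes the merged rows to a Parquet file."""
    if not input_files:
        raise ValueError("at least one Parquet file is required")
    selects = [
        f"SELECT * FROM file({sql_string(name)}, 'Parquet')"
        for name in input_files
    ]
    return (
        f"INSERT INTO FUNCTION file({sql_string(output_file)}, 'Parquet') "
        + " UNION ALL ".join(selects)
    )


def merge_directly_to_file(client, input_files, output_file):
    """Run the merge as a single statement, with no intermediate table."""
    client.command(build_direct_merge_query(input_files, output_file))
```

Depending on server settings, the statement may fail if the output file already exists, so choosing a fresh file name for each run is the safer option.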

ClickHouse really shines when you're working with massive datasets: it's a columnar database built to process large volumes of data quickly, which makes it a good fit for merging very large files.
