Method Chaining in Pandas || Python


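      import pandas as pd

      # df_other is a second dataframe (not defined here) that shares 'column_a'
      df = (pd.read_csv('input_path/input_file.csv')
        # inside query(), string values take quotes; backticks are reserved for column names
        .query('column_a == "target value"')
        .rename(columns={'column_y': 'column_b'})
        .groupby(['column_a', 'column_b'], as_index=False).agg({'column_c': 'sum'})
        .assign(new_column=lambda x: x['column_c'] * 100)
        .drop(columns=['column_c'])
        .merge(df_other, on='column_a', how='left')
        .fillna(0)
        .sort_index(axis=1))

      # to_csv() returns None, so it is called on its own; chaining it would leave df set to None
      df.to_csv('output_path/output_file.csv', index=False)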
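
The file paths and df_other in the chain above are placeholders, so it cannot run as written. Here is a minimal sketch that swaps the CSV for a small in-memory dataframe. The rows, the df_other lookup table and its column_d are invented purely for illustration.

```python
import pandas as pd

# Made-up stand-in for input_file.csv (the real file is not available here)
raw = pd.DataFrame({
    'column_a': ['target value', 'target value', 'target value', 'other'],
    'column_y': ['x', 'x', 'y', 'x'],
    'column_c': [1, 2, 3, 4],
})

# Hypothetical lookup table to join on column_a
df_other = pd.DataFrame({
    'column_a': ['target value'],
    'column_d': [10.0],
})

df = (raw
    .query('column_a == "target value"')               # keep only matching rows
    .rename(columns={'column_y': 'column_b'})          # column_y -> column_b
    .groupby(['column_a', 'column_b'], as_index=False)
    .agg({'column_c': 'sum'})                          # one row per (column_a, column_b) pair
    .assign(new_column=lambda x: x['column_c'] * 100)  # derived column
    .drop(columns=['column_c'])
    .merge(df_other, on='column_a', how='left')        # bring in column_d
    .fillna(0)                                         # no-op here: every row found a match
    .sort_index(axis=1))                               # alphabetical column order

print(df)
```

The 'other' row is filtered out first, so the group sums are 1 + 2 = 3 for column_b 'x' and 3 for 'y', giving new_column values of 300 for both rows.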
      

Method chaining is a pattern for performing operations on a pandas dataframe that emphasizes continuity and a logical flow of execution. Rather than naming a new variable at each step, you chain methods one after another, each one called on the dataframe returned by the previous method. This makes the sequence of transformations easy to read from top to bottom. Chaining is primarily a readability pattern: it does not make the underlying pandas operations faster, but it avoids cluttering the namespace with intermediate dataframes.
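
The difference from naming a variable at each step is easiest to see side by side. This sketch uses invented sales data (the sales, region and units names are illustrative only) and runs the same filter, group and assign steps both ways.

```python
import pandas as pd

# Made-up sales data for illustration only
sales = pd.DataFrame({
    'region': ['north', 'south', 'north', 'east'],
    'units': [5, 3, 7, 2],
})

# Conventional style: a named intermediate at every step
filtered = sales.query('units > 2')
grouped = filtered.groupby('region', as_index=False).agg({'units': 'sum'})
step_by_step = grouped.assign(units_x10=lambda x: x['units'] * 10)

# Chained style: each method is called on the dataframe the previous one returned
chained = (sales
    .query('units > 2')
    .groupby('region', as_index=False)
    .agg({'units': 'sum'})
    .assign(units_x10=lambda x: x['units'] * 10))

print(chained)
```

Both versions build the same dataframe. The chained one simply never binds filtered or grouped to names.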


The example above uses my favorite "chainable" pandas methods. Each one is described briefly below.


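query: filters rows using a boolean expression
rename: renames columns (or index labels)
groupby: splits a dataframe into groups so that a function (here, agg with 'sum') can be applied to each group and the results combined
assign: adds a new column (or overwrites an existing one); a lambda receives the dataframe as it stands at that point in the chain
drop: drops the given column(s) from the dataframe
merge: merges dataframes with a database-style join
fillna: fills missing (NaN) values with a given value, here 0
sort_index: with axis=1, reorders the columns alphabetically by label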
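
A few of these methods have options that are easy to trip over. The sketch below uses a made-up orders table and a hypothetical fees lookup table (all names and values are invented for illustration) to show three of them. Inside query(), backticks quote a column name, such as one containing a space, while quotes mark a string value. A left merge leaves NaN wherever the right-hand table has no match, which a following fillna() can replace, as in the chain at the top. And sort_index() sorts rows by index label by default, while axis=1 sorts the columns.

```python
import pandas as pd

# Made-up orders; note the column name containing a space
orders = pd.DataFrame(
    {'zone': ['a', 'b', 'a', 'c'], 'unit price': [2.0, 0.8, 4.0, 0.5]},
    index=[2, 0, 1, 3],
)
# Hypothetical lookup table with no entry for zone 'b'
fees = pd.DataFrame({'zone': ['a', 'c'], 'fee': [1.5, 0.25]})

result = (orders
    # backticks quote a column name; double quotes mark a string value
    .query('`unit price` > 1 or zone == "b"')
    # axis=0 (the default): sort rows by index label
    # (merge below returns a fresh 0..n-1 index, so this fixes the row order)
    .sort_index()
    # left join: zone 'b' has no match in fees, so its fee is NaN
    .merge(fees, on='zone', how='left')
    # replace the missing fee with 0
    .fillna(0)
    # axis=1: sort columns alphabetically by label
    .sort_index(axis=1))

print(result)
```

Zone 'c' is dropped by the query, zone 'b' survives only through the `or` clause, and its unmatched fee becomes 0.0.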