

      import pandas as pd

      # df_other is assumed to be an existing dataframe that contains 'column_a'.
      # The chain ends in to_csv, which returns None, so the result is not assigned.
      (pd.read_csv('input_path/input_file.csv')
        .query('column_a == "target value"')  # quote string values; backticks are for column names
        .rename(columns={'column_y': 'column_b'})
        .groupby(['column_a', 'column_b'], as_index=False).agg({'column_c': 'sum'})
        .assign(new_column=lambda x: x['column_c'] * 100)
        .drop(columns=['column_c'])
        .merge(df_other, on='column_a', how='left')
        .fillna(0)
        .sort_index(axis=1)
        .to_csv('output_path/output_file.csv', index=False))

Method chaining is a pattern for performing operations on a pandas dataframe that emphasizes continuity and a logical flow of execution. Rather than naming a variable at each step, you chain methods one after another, with each method called on the dataframe returned by the previous one. This can make the code considerably easier to read and avoids cluttering the namespace with intermediate variables. Performance is usually about the same as the step-by-step equivalent, since the same operations run either way.
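To make the contrast concrete, here is a minimal sketch that runs the same three operations twice, once with named intermediate variables and once as a chain. The `sales` dataframe and its `region` and `units` columns are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units": [3, 5, 2, 7],
})

# Step-by-step version: every intermediate result gets its own name.
filtered = sales[sales["units"] > 2]
totals = filtered.groupby("region", as_index=False).agg({"units": "sum"})
step_by_step = totals.rename(columns={"units": "total_units"})

# Chained version: each method is called on the dataframe
# returned by the previous method.
chained = (
    sales
    .query("units > 2")
    .groupby("region", as_index=False).agg({"units": "sum"})
    .rename(columns={"units": "total_units"})
)

# Both approaches should produce the same dataframe.
print(chained.equals(step_by_step))
```

Wrapping the chain in parentheses is what lets each method sit on its own line without backslashes.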

The example above includes my favorite "chainable" methods in pandas. A short description of each one follows.

query

filters rows

rename

renames columns

groupby

splits a dataframe, applies a function and combines the results

assign

creates a new column

drop

drops column(s) from the dataframe

merge

merges dataframes with a database-style join

fillna

fills missing (NA) values, here with 0

sort_index

sorts by index labels; with axis=1, it reorders the columns alphabetically
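The methods described above can be tried without any files on disk. The sketch below uses two small in-memory dataframes as stand-ins for the CSV input and `df_other`; the values, the extra `column_z` column, and the `!= "z"` filter are assumptions made up for this illustration, and the final `to_csv` step is left out so the result can be inspected.

```python
import pandas as pd

# Hypothetical in-memory data standing in for the CSV input and df_other.
df_input = pd.DataFrame({
    "column_a": ["x", "x", "y", "z"],
    "column_y": ["p", "p", "q", "r"],
    "column_c": [1.0, 2.0, 5.0, 4.0],
})
df_other = pd.DataFrame({
    "column_a": ["x"],
    "column_z": [10.0],
})

result = (
    df_input
    .query('column_a != "z"')                           # filter rows
    .rename(columns={"column_y": "column_b"})           # rename a column
    .groupby(["column_a", "column_b"], as_index=False)
    .agg({"column_c": "sum"})                           # sum column_c per group
    .assign(new_column=lambda x: x["column_c"] * 100)   # create a new column
    .drop(columns=["column_c"])                         # drop the original column
    .merge(df_other, on="column_a", how="left")         # left join on column_a
    .fillna(0)                                          # fill unmatched values with 0
    .sort_index(axis=1)                                 # order columns alphabetically
)

print(result)
```

Only `x` has a match in `df_other`, so the left join leaves `column_z` empty for `y`, and `fillna(0)` then fills it with 0. The final `sort_index(axis=1)` moves `column_z` ahead of `new_column`.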

Method Chaining in Pandas || Python