Transform → My Favorite Pandas Function || Python


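      import pandas as pd

      # Load customer-level sales, then add each region's total and each customer's share of it
      df = (pd.read_csv('sales_by_customer.csv')
        # transform('sum') broadcasts the regional total back onto every row in that region
        .assign(reg_sales = lambda x : x.groupby('region')['sales'].transform('sum'))
        .assign(customer_pct_region = lambda x : x['sales'] / x['reg_sales']))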
      

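Pairing the groupby and transform methods in pandas lets you calculate values at several levels of granularity, all within one dataframe! Unlike a plain aggregation, transform returns one value per original row, so the group-level result lines up with the row-level data.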
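To make the difference concrete, here is a minimal sketch on made-up data (the three rows below are hypothetical, not from the sales file). It compares a regular groupby sum, which collapses to one row per region, with transform, which keeps the original row count and index.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "region": ["East", "East", "West"],
    "sales": [100, 300, 50],
})

# Regular aggregation: one row per region
agg_result = df.groupby("region")["sales"].sum()

# transform: one value per original row, aligned to df's index
transform_result = df.groupby("region")["sales"].transform("sum")

print(len(agg_result), len(transform_result))
```

Because the transformed Series shares the dataframe's index, it can be assigned straight back as a new column.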


In this example, I have a table of sales by customer, but want to see how much each customer's sales contribute to their region's total. This technique avoids the hassle of creating a separate dataframe that sums sales by region, only to join it back to the original customer-level dataframe on region.
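For comparison, here is a rough sketch of both routes side by side. The customers, regions, and sales figures are invented stand-ins for sales_by_customer.csv, and the names reg_totals, merged, and transformed are just illustrative.

```python
import pandas as pd

# Hypothetical sample data standing in for sales_by_customer.csv
df = pd.DataFrame({
    "customer": ["A", "B", "C", "D"],
    "region": ["East", "East", "West", "West"],
    "sales": [100.0, 300.0, 50.0, 150.0],
})

# The longer route: aggregate into a separate dataframe, then merge back on region
reg_totals = (df.groupby("region", as_index=False)["sales"].sum()
    .rename(columns={"sales": "reg_sales"}))
merged = df.merge(reg_totals, on="region", how="left")
merged["customer_pct_region"] = merged["sales"] / merged["reg_sales"]

# The transform route: no intermediate dataframe, no join
transformed = (df
    .assign(reg_sales=lambda x: x.groupby("region")["sales"].transform("sum"))
    .assign(customer_pct_region=lambda x: x["sales"] / x["reg_sales"]))

print(transformed)
```

Both routes should give the same columns and values here; the transform version simply skips the extra dataframe and the merge step.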