Python Automation and Machine Learning for EM and ICs

An Online Book, Second Edition by Dr. Yougui Liao (2024)


.groupby('...')['...']

The groupby method in pandas is a powerful tool for data analysis: it splits the data into groups based on one or more keys and lets us perform operations on each group independently.

  • Aggregating with Custom Functions
    We can apply our own functions to groups. This is particularly useful for more complex aggregations beyond the standard sum, mean, etc.

    grouped = df.groupby('column_name')
    grouped.agg(lambda x: (x.max() - x.min())) # Custom range function
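As a minimal runnable sketch of the custom aggregation above, the example below uses a small made-up DataFrame. The names `df_range`, `group`, `value`, and `value_range` are illustrative assumptions, not part of the original text.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df_range = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 4.0, 10.0, 12.0, 17.0],
})

def value_range(x):
    """Return the spread between the largest and smallest value in a group."""
    return x.max() - x.min()

# Select the numeric column first so the function is only applied to numbers.
ranges = df_range.groupby("group")["value"].agg(value_range)
print(ranges)
```

Passing a named function instead of a lambda gives the result a readable label when several custom functions are combined in one agg() call.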

  • Transforming Data
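    The transform method returns an object indexed like the input, so the result aligns row by row with the original DataFrame. It is useful for normalization or for filling NA values within groups.

    normalize = lambda x: (x - x.mean()) / x.std()
    # Select one column ('data' is assumed here) so the result is a Series that fits a single new column
    df['normalized'] = df.groupby('group')['data'].transform(normalize)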
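The two uses mentioned above, filling NA values and normalizing within groups, might look like the following sketch. The DataFrame `df_t` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing value in group 'b'.
df_t = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "data": [1.0, 2.0, 3.0, 10.0, None, 14.0],
})

# Fill each missing value with the mean of its own group.
df_t["filled"] = df_t.groupby("group")["data"].transform(
    lambda x: x.fillna(x.mean())
)

# Z-score within each group; std() is the sample standard deviation (ddof=1).
df_t["zscore"] = df_t.groupby("group")["filled"].transform(
    lambda x: (x - x.mean()) / x.std()
)
print(df_t)
```

Because transform keeps the original index, both results can be assigned straight back as new columns.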

 

For instance, groupby can be used to aggregate rows that share duplicate values in a column.

  • Filtering Groups
    Using filter, we can remove data that does not meet certain criteria. For example, keeping groups with a certain number of elements.

    df_filtered = df.groupby('group').filter(lambda x: len(x) > 1)
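A small sketch of group filtering on made-up data follows. The names `df_f`, `multi`, and `high` are illustrative assumptions. The function passed to filter receives each group as a DataFrame and must return a single True or False.

```python
import pandas as pd

# Hypothetical sample data: group 'b' has only one row.
df_f = pd.DataFrame({
    "group": ["a", "a", "b", "c", "c", "c"],
    "value": [1, 2, 3, 4, 5, 6],
})

# Keep only groups with more than one row; the single-row group 'b' is dropped.
multi = df_f.groupby("group").filter(lambda g: len(g) > 1)

# The condition can also depend on the data: keep groups whose mean exceeds 2.
high = df_f.groupby("group").filter(lambda g: g["value"].mean() > 2)

print(multi)
print(high)
```

Unlike agg, filter returns the original rows of the surviving groups rather than one row per group.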

  • Applying Multiple Aggregation Functions
    We can perform multiple aggregation operations in a single step using agg() with a list of operations or a dictionary mapping columns to operations.

    df.groupby('group').agg({'data1': ['mean', 'min'], 'data2': 'sum'})
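To see what the dictionary form returns, here is a sketch on made-up data (`df_m` and its values are assumptions). When a column maps to a list of functions, the result has two-level column labels, which can be flattened afterwards if preferred.

```python
import pandas as pd

# Hypothetical sample data using the column names from the snippet above.
df_m = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "data1": [1.0, 3.0, 5.0, 9.0],
    "data2": [10, 20, 30, 40],
})

summary = df_m.groupby("group").agg({"data1": ["mean", "min"], "data2": "sum"})

# The columns are tuples such as ('data1', 'mean'); select one with a tuple key.
mean_a = summary.loc["a", ("data1", "mean")]

# One possible way to flatten the two-level labels into plain strings.
flat = summary.copy()
flat.columns = ["_".join(col) for col in flat.columns]
print(flat)
```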

  • Grouping by Index Levels
    If the DataFrame has a MultiIndex, we can group by one of the levels.

    df.groupby(level=0).sum()
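The sketch below builds a small two-level index to show grouping by level, either by position or by name. The level names `wafer` and `die` and the `defects` column are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (wafer, die).
index = pd.MultiIndex.from_tuples(
    [("w1", "die1"), ("w1", "die2"), ("w2", "die1"), ("w2", "die2")],
    names=["wafer", "die"],
)
df_l = pd.DataFrame({"defects": [3, 5, 2, 4]}, index=index)

# Group by the first index level, referenced by position.
by_wafer = df_l.groupby(level=0).sum()

# Group by the second index level, referenced by name.
by_die = df_l.groupby(level="die").sum()

print(by_wafer)
print(by_die)
```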

  • Named Aggregation
    To avoid MultiIndex on columns after aggregation, we can use named aggregation to produce flat columns.

    df.groupby('group').agg(
        mean_data1=('data1', 'mean'),
        sum_data2=('data2', 'sum')
    )

  • Grouping by Functions or Mappings
    Grouping is not limited to columns. We can also group by a function applied to the index labels, or by a mapping (such as a dictionary) from index labels to group labels.

    df.groupby(lambda x: 'odd' if x % 2 else 'even')
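Both variants, a function of the index and a dictionary mapping, are sketched below on made-up data. The snippet assumes a default integer index; `df_g`, `label_map`, and the group labels are illustrative names.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..4.
df_g = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# Group by a function that is called once per index label.
parity = df_g.groupby(lambda i: "odd" if i % 2 else "even")["value"].sum()

# Group by a dictionary that maps index labels to group labels.
label_map = {0: "low", 1: "low", 2: "mid", 3: "high", 4: "high"}
by_label = df_g.groupby(label_map)["value"].sum()

print(parity)
print(by_label)
```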

  • Grouping by Time Periods
    When working with time series data, we can group by time periods (like month, quarter) using Grouper.

    # Group by month; pandas 2.2 and later use 'ME' (month end) in place of 'M'
    df.groupby(pd.Grouper(key='date', freq='M')).sum()
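A runnable sketch of monthly grouping on made-up data follows. It uses the month-start alias 'MS', which labels each group by the first day of the month and keeps the same spelling across pandas versions; `df_d` and its values are assumptions.

```python
import pandas as pd

# Hypothetical time series with a datetime column named 'date'.
df_d = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-03-15"]
    ),
    "value": [1, 2, 3, 4],
})

# 'MS' labels each monthly group by the first day of that month.
monthly = df_d.groupby(pd.Grouper(key="date", freq="MS"))["value"].sum()
print(monthly)
```

Selecting the 'value' column before calling sum() keeps non-numeric columns out of the aggregation.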

  • Categorical Data Optimization
    Grouping by categorical columns is more efficient than grouping by object columns, so consider converting strings to categories if grouping by string fields.

    df['column'] = df['column'].astype('category')
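One point worth knowing when grouping by a categorical column is the `observed` parameter of groupby, which controls whether categories that never occur in the data still appear in the result. The sketch below uses made-up data; `df_c` and the unused category 'z' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; 'column' starts out as plain strings.
df_c = pd.DataFrame({
    "column": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})
df_c["column"] = df_c["column"].astype("category")

# Register an extra category 'z' that has no rows, to show the difference.
df_c["column"] = df_c["column"].cat.add_categories(["z"])

# observed=True reports only the categories that actually occur.
totals = df_c.groupby("column", observed=True)["value"].sum()

# observed=False also reports unused categories (their sum is 0).
totals_all = df_c.groupby("column", observed=False)["value"].sum()

print(totals)
print(totals_all)
```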

  • SQL-like Window Functions
    Pandas supports window functions, allowing for operations like running totals or moving averages within groups.
     

    df.groupby('group')['data'].expanding().mean()
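The sketch below computes a running total, a running mean, and a two-row moving average within each group on made-up data. Note that expanding() and rolling() on a groupby return a result indexed by (group, original index), so the group level is dropped here before assigning back. The new column names are illustrative.

```python
import pandas as pd

# Hypothetical sample data using the column names from the snippet above.
df_w = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "data": [1.0, 2.0, 3.0, 10.0, 20.0],
})

# Running total within each group; cumsum keeps the original index.
df_w["running_total"] = df_w.groupby("group")["data"].cumsum()

# Running mean: drop the added group level so the result aligns with df_w.
exp_mean = df_w.groupby("group")["data"].expanding().mean()
df_w["running_mean"] = exp_mean.reset_index(level=0, drop=True)

# Moving average over 2 rows; the first row of each group has no full window.
roll_mean = df_w.groupby("group")["data"].rolling(window=2).mean()
df_w["moving_avg"] = roll_mean.reset_index(level=0, drop=True)

print(df_w)
```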

  • Grouping Rows with Empty Columns
    We can group the rows in which the columns 'Name', 'Z', and 'Grades' are empty.
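The original code for this item is not shown, so the following is only one possible reading: mark the rows in which 'Name', 'Z', and 'Grades' are all missing, then group on that marker. The 'ID' column, the sample values, and the labels 'empty' and 'filled' are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'ID' is an assumed extra column.
df_e = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Ann", None, None, "Bob"],
    "Z": [0.5, np.nan, np.nan, np.nan],
    "Grades": [90.0, np.nan, np.nan, 85.0],
})

# True for rows in which all three columns are missing.
all_empty = df_e[["Name", "Z", "Grades"]].isna().all(axis=1)

# Turn the boolean marker into readable group labels and group on it.
key = all_empty.map({True: "empty", False: "filled"})
counts = df_e.groupby(key).size()

# The rows belonging to the 'empty' group.
empty_rows = df_e[all_empty]

print(counts)
print(empty_rows)
```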
     

===========================================

Group the DataFrame by columns "X", "Y", and "Z", with a special rule for column "X": group on the portion of each string before the dot (".").

The script first preprocesses the "X" column by extracting the portion before the dot and storing it in a new column. It then uses this new column together with "Y" and "Z" for the grouping.
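A minimal sketch of this preprocessing and grouping is given below. The sample data, the helper column name `X_prefix`, and the aggregated column `value` are assumptions made for illustration; only the columns "X", "Y", and "Z" come from the description.

```python
import pandas as pd

# Hypothetical data: "X" holds strings of the form "<prefix>.<suffix>".
df_x = pd.DataFrame({
    "X": ["A.1", "A.2", "B.1", "B.7", "A.3"],
    "Y": ["p", "p", "q", "q", "r"],
    "Z": [1, 1, 2, 2, 1],
    "value": [10, 20, 30, 40, 50],
})

# Preprocess: keep the portion of "X" before the first dot in a new column.
df_x["X_prefix"] = df_x["X"].str.split(".", n=1).str[0]

# Group by the new column together with "Y" and "Z".
result = df_x.groupby(["X_prefix", "Y", "Z"], as_index=False)["value"].sum()
print(result)
```

Keeping the prefix in a separate column leaves the original "X" values untouched, so they remain available after grouping.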