=================================================================================
A waterfall plot in data analysis is a visual representation of numerical data that illustrates the cumulative effect of sequentially introduced positive or negative values. It is commonly used in finance, project management, and other fields to demonstrate how different factors contribute to a final result.
In a waterfall plot:
- Starting Point: The plot begins with a starting value or baseline.
- Sequential Bars: Bars are then drawn representing each sequential change or contribution to the data. These bars can be either positive or negative depending on whether they add to or subtract from the starting value.
- Cumulative Total: As each bar is added or subtracted, the cumulative total is updated, resulting in a step-like appearance resembling a waterfall.
- End Point: The plot ends with the final cumulative total, which represents the ultimate result or outcome.
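The four steps above can be sketched in a few lines. This is a minimal illustration, not the code from these notes: the labels and values are made up, and `waterfall_bars` is a hypothetical helper name. It computes where each bar starts and how tall it is; the commented lines at the end show one way to draw the result with matplotlib's `bar` and its `bottom` argument.

```python
from itertools import accumulate


def waterfall_bars(start, changes):
    """Return (bottoms, heights, totals) for the change bars of a waterfall plot.

    start   -- the baseline value
    changes -- sequential positive or negative contributions
    """
    # Running total: the start value, then the total after each change.
    totals = list(accumulate(changes, initial=start))
    # Each change bar floats: it begins at the total reached before the change.
    bottoms = totals[:-1]
    heights = list(changes)
    return bottoms, heights, totals


# Made-up example data (assumption, for illustration only).
labels = ["Start", "Sales", "Refunds", "Costs", "End"]
start, changes = 100, [30, -10, -40]
bottoms, heights, totals = waterfall_bars(start, changes)

# The first and last bars are drawn from zero; the change bars float.
bar_bottoms = [0] + bottoms + [0]
bar_heights = [start] + heights + [totals[-1]]
print(list(zip(labels, bar_bottoms, bar_heights)))

# To draw it (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.bar(labels, bar_heights, bottom=bar_bottoms)
# plt.show()
```

A negative height with a positive `bottom` draws the bar downward from the previous total, which is what gives the plot its step-like shape.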
===========================================
Waterfall. Code:
Output:
===========================================
This script counts how many times each letter occurs in the column after removing the last letter (the counted letters are 'AY', 'A', and 'B' in this code). code1, code2, and code3:
Input:
Output:
Options:
- Counts the occurrences of a specific substring within each string in the column:
ay_count = column2_df[df['Class'] == the_class].str.count(i).sum()
  - This line filters column2_df with the condition df['Class'] == the_class, so it keeps only the rows whose 'Class' value matches the_class. Because the .str accessor exists only on a Series, column2_df must be a single column here.
  - Then .str.count(i) is applied to each element of the filtered column. It counts the occurrences of the substring i within each string.
  - Finally, .sum() adds up the per-string counts, giving the total number of occurrences of the substring i in the filtered column.
- Counts the occurrences of an exact string match within each cell of the DataFrame:
ay_count = (column2_df[df['Class'] == the_class] == i).sum().sum()
  - This line of code filters the DataFrame column2_df based on the condition df['Class'] == the_class.
  - Then, (column2_df[df['Class'] == the_class] == i) compares each element in the filtered DataFrame with the string i, resulting in a DataFrame of boolean values where True indicates that the element matches i, and False otherwise.
  - The first .sum() sums up the boolean values along each column, counting the number of occurrences where the element matches i in each column.
  - The second .sum() sums up these counts across all columns, giving the total count of occurrences of the string i in the filtered DataFrame.
- Counts the occurrences of an exact string (mixture of letters and numbers):
ay_count = column2_df[df['Class'] == the_class].apply(lambda x: x == i).sum()
- Sometimes the counting is not accurate because the strings vary in whitespace or letter case. To handle such variations, we can preprocess the strings before comparison.
# Preprocess strings to remove leading and trailing whitespaces and make them lowercase
column2_df_processed = column2_df.str.strip().str.lower()
i_processed = i.strip().lower()
# Count occurrences of the exact processed string
ay_count = (column2_df_processed[df['Class'] == the_class] == i_processed).sum()
With this modification, both the strings in the column and the target string i are stripped of leading and trailing whitespace and converted to lowercase before comparison, so occurrences are counted correctly even when whitespace or letter case differs.
- The length of the x-axis can be set by removing some strings from the end of the list that is used for the x-axis. Code:
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Keep only the rows whose index label is 'AY', 'A', or 'B'
df_filtered = df[df.index.isin(['AY', 'A', 'B'])]
# Set font to serif
rcParams['font.family'] = 'serif'
# Plotting: one line per column
plt.figure(figsize=(10, 6))
for col in df_filtered.columns:
    plt.plot(df_filtered.index, df_filtered[col], marker='o', label=col)
plt.legend()
plt.show()
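Returning to the counting options listed above, the sketch below runs them on a small made-up table to show how they differ. The data, the column name `Code`, and the values of `the_class` and `i` are assumptions for illustration; `column2_df` is taken to be a single column (a pandas Series), which is what the `.str` accessor requires.

```python
import pandas as pd

# Made-up data; only the 'Class' column name comes from the notes above.
df = pd.DataFrame({
    "Class": ["c1", "c1", "c1", "c2", "c1"],
    "Code": ["AY", "AYA", " ay ", "AY", "B"],
})
column2_df = df["Code"]  # assumed to be a single column (a Series)
the_class = "c1"         # hypothetical class label
i = "AY"                 # hypothetical target string

mask = df["Class"] == the_class

# Substring occurrences: 'AYA' also contributes one 'AY'.
substring_count = column2_df[mask].str.count(i).sum()

# Exact matches only: 'AYA' and ' ay ' are not counted.
exact_count = (column2_df[mask] == i).sum()

# Exact matches after stripping whitespace and lowercasing both sides.
processed = column2_df.str.strip().str.lower()
normalized_count = (processed[mask] == i.strip().lower()).sum()

print(substring_count, exact_count, normalized_count)
```

With this data the substring count is 2, the exact count is 1, and the normalized count is 2. Note that `.str.count` treats its argument as a regular expression, so a target containing characters such as `.` or `+` should be escaped with `re.escape` first.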
===========================================
Count the occurrences of each element of your list in the "X" column of the dataframe, then plot these counts against the list elements. Code:
Input:
Output:
The script above focuses on counting specific values in a column and plotting these counts. The two lines of code involve data handling and aggregation rather than mathematical equations:
- counts = df['X'].value_counts():
  - This line of code counts the occurrences of each unique value in the column 'X' of the DataFrame df. The method value_counts() is used, which returns a Series where the index consists of the unique values from the column, and the corresponding values in the Series represent the count of each unique value.
  - Mathematical Aspect: This line performs an aggregation operation, specifically counting. While not a mathematical equation like y = mx + b, it's a form of summation where each unique entry is summed to provide its total occurrences in the dataset.
- y_values = [counts.get(item, 0) for item in my_list]:
  - This line creates a list y_values where each element is the count of occurrences of each item in my_list from the counts Series obtained in the previous line. If an item from my_list does not appear in counts, the method .get() returns 0 for that item.
  - Mathematical Aspect: Here, the line uses list comprehension to map each item in my_list to its corresponding count in counts, with a default of 0 if the item isn't found. This is an application of a conditional retrieval or mapping, which can be seen as a function f(x) from a set of predefined keys (my_list) to their values (counts or zero).
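The two lines explained above can be tried on a small example. The data and the contents of `my_list` below are made up for illustration; the names `df`, `'X'`, `counts`, `my_list`, and `y_values` follow the text.

```python
import pandas as pd

# Made-up data (assumption, for illustration only).
df = pd.DataFrame({"X": ["a", "b", "a", "c", "a", "b"]})
my_list = ["a", "b", "d"]  # 'd' never appears in column 'X'

# Count how often each unique value occurs in column 'X'.
counts = df["X"].value_counts()

# Look up each list element; elements missing from the column get 0.
y_values = [counts.get(item, 0) for item in my_list]
print(list(zip(my_list, y_values)))

# To plot the counts (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.bar(my_list, y_values)
# plt.show()
```

Here 'a' maps to 3, 'b' to 2, and 'd' to the default 0, while 'c' is ignored because it is not in `my_list`.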
===========================================
AAA. Count the occurrences of each element of your list in the "X" column of the dataframe, then plot these counts against the list elements. In addition, insert an element "Begin" at the beginning of the list and set its count to the total number of rows in the dataframe. Code:
Input:
Output:
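Since the original code is not shown here, the following is only a sketch of the step described above, with made-up data. The names `x_labels` and `y_values` are assumptions; `len(df)` is used as the total number of rows.

```python
import pandas as pd

# Made-up data (assumption, for illustration only).
df = pd.DataFrame({"X": ["a", "b", "a", "c", "a", "b"]})
my_list = ["a", "b", "d"]

counts = df["X"].value_counts()
y_values = [counts.get(item, 0) for item in my_list]

# Prepend "Begin", whose count is the total number of rows in the dataframe.
x_labels = ["Begin"] + my_list
y_values = [len(df)] + y_values
print(list(zip(x_labels, y_values)))

# To plot (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.bar(x_labels, y_values)
# plt.show()
```

Building new lists instead of calling `my_list.insert(0, "Begin")` leaves the original list unchanged, which helps if the same list is reused for other plots.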
===========================================
AAA. Count the occurrences of each element of your list in the "X" column of the dataframe, then plot the percentages against the list elements. Code:
Input:
Output:
The plot above follows this rule: given the original y_values = [y0_0, y0_1, y0_2, ..., y0_n], the new y_values list is computed as y1_0 = 100*y0_0/y0_0 = 100 and y1_k = y1_(k-1) - 100*y0_k/y0_0 for k = 1, 2, ..., n. Each new value is therefore the percentage of y0_0 that remains after subtracting the counts up to and including y0_k.
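The rule can be written as a short function. This is a sketch under the assumption that the first element of the list is the reference value (for example the "Begin" count); `to_remaining_percent` is a hypothetical name and the input numbers are made up.

```python
def to_remaining_percent(y_values):
    """Convert counts to percentages using the rule described above.

    The first element is the reference (100 %); each later element
    subtracts its share of the reference from the running value.
    """
    base = y_values[0]
    result = [100 * y_values[0] / base]  # y1_0 = 100
    for y in y_values[1:]:
        # y1_k = y1_(k-1) - 100 * y0_k / y0_0
        result.append(result[-1] - 100 * y / base)
    return result


# Made-up counts (assumption, for illustration only).
y0 = [200, 50, 30, 20]
y1 = to_remaining_percent(y0)
print(y1)  # [100.0, 75.0, 60.0, 50.0]
```

The function assumes the first element is non-zero, because every term is divided by it.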
If there are multiple CSV files with the same data structure, this code can be used to draw a bar plot. It can also be used with multiple dataframes instead of multiple CSV files.
===========================================
AAA. Count the occurrences of each element of your list in the "X" column of the dataframe, then plot these counts against the list elements. In addition, insert an element "Begin" at the beginning of the list and set its count to the total number of rows in the dataframe. Then use the differences of the counts. Code:
Input:
Output:
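The original code is not shown, so the exact meaning of "difference of the counts" is an assumption here. One plausible reading, consistent with the percentage rule used elsewhere in these notes, is a running remainder: start from the total number of rows and subtract each element's count in turn. The numbers and names below are made up for illustration.

```python
from itertools import accumulate

# Made-up inputs (assumption): total rows and the count of each list element.
total_rows = 6
my_list = ["a", "b", "d"]
item_counts = [3, 2, 0]

# Running remainder: each value is the previous value minus the next count.
y_values = list(
    accumulate(item_counts, lambda remaining, c: remaining - c, initial=total_rows)
)
x_labels = ["Begin"] + my_list
print(list(zip(x_labels, y_values)))
```

If the intended difference is instead between neighbouring counts, the lambda would need to change accordingly.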
===========================================
AAA. Plot the percentage of counts. Code:
Input:
Output:
The code can be modified for a line plot of a single column, a line plot of multiple columns, a line plot of multiple columns with the same data structure, or a line plot of multiple columns with different data structures.
===========================================