Python数据分析 知识量:13 - 56 - 232
数据分组是在行方向上不断拆分;而数据透视表则是在行和列方向上同时拆分。
在Python中,通过pivot_table()函数来实现数据透视表的功能,其主要的参数说明如下:
data 用于设定需要操作的DataFrame对象。
values 用于设置被计算的数据项,是可选的。
index 行分组键。
columns 列分组键。
aggfunc 用于设置对values的计算类型。
fill_value 用于设定缺省值,默认为None。
dropna 用于设置是否删除缺失值,默认为True,不删除。
margins 用于设置是否添加行/列的总计,默认为False,不添加。
margins_name 用于设置margins 行/列的名称,默认为“All”。
接下来看一个简单的示例:
import pandas as pd df=pd.read_excel(r"D:\PythonTestFile\exam_new.xlsx") print(df,'\n') print(pd.pivot_table(df,values='Chinese',columns='Sex',index='Age',aggfunc='mean'))
运行结果为:
Name Sex Age Chinese English Math 0 Noah male 15 90 50 66 1 Emma female 16 56 56 55 2 Bob male 15 99 84 89 3 Olivia female 17 86 87 44 4 Jeff male 16 48 87 65 5 Liam male 14 55 88 69 6 Sophia female 15 90 66 96 7 Isabella female 16 66 85 55 Sex female male Age 14 NaN 55.0 15 90.0 94.5 16 61.0 48.0 17 86.0 NaN
默认情况下,数据透视表没有合计列,可以将margins设为True来添加合计列。
import pandas as pd df=pd.read_excel(r"D:\PythonTestFile\exam_new.xlsx") print(df,'\n') print(pd.pivot_table(df,values='Chinese',columns='Sex',index='Age',aggfunc='mean',margins=True))
运行结果为:
Name Sex Age Chinese English Math 0 Noah male 15 90 50 66 1 Emma female 16 56 56 55 2 Bob male 15 99 84 89 3 Olivia female 17 86 87 44 4 Jeff male 16 48 87 65 5 Liam male 14 55 88 69 6 Sophia female 15 90 66 96 7 Isabella female 16 66 85 55 Sex female male All Age 14 NaN 55.0 55.000000 15 90.0 94.5 93.000000 16 61.0 48.0 56.666667 17 86.0 NaN 86.000000 All 74.5 73.0 73.750000
合计列的名称默认是All,可以通过参数margins_name进行重新命名。
import pandas as pd df=pd.read_excel(r"D:\PythonTestFile\exam_new.xlsx") print(df,'\n') print(pd.pivot_table(df,values='Chinese',columns='Sex',index='Age',aggfunc='mean',margins=True, margins_name='合计'))
运行结果为:
Name Sex Age Chinese English Math 0 Noah male 15 90 50 66 1 Emma female 16 56 56 55 2 Bob male 15 99 84 89 3 Olivia female 17 86 87 44 4 Jeff male 16 48 87 65 5 Liam male 14 55 88 69 6 Sophia female 15 90 66 96 7 Isabella female 16 66 85 55 Sex female male 合计 Age 14 NaN 55.0 55.000000 15 90.0 94.5 93.000000 16 61.0 48.0 56.666667 17 86.0 NaN 86.000000 合计 74.5 73.0 73.750000
NaN表示缺失值,可以通过参数fill_value对缺失值进行填充(替换)。
import pandas as pd df=pd.read_excel(r"D:\PythonTestFile\exam_new.xlsx") print(df,'\n') print(pd.pivot_table(df,values='Chinese',columns='Sex',index='Age',aggfunc='mean',margins=True, margins_name='合计',fill_value=0))
运行结果为:
Name Sex Age Chinese English Math 0 Noah male 15 90 50 66 1 Emma female 16 56 56 55 2 Bob male 15 99 84 89 3 Olivia female 17 86 87 44 4 Jeff male 16 48 87 65 5 Liam male 14 55 88 69 6 Sophia female 15 90 66 96 7 Isabella female 16 66 85 55 Sex female male 合计 Age 14 0.0 55.0 55.000000 15 90.0 94.5 93.000000 16 61.0 48.0 56.666667 17 86.0 0.0 86.000000 合计 74.5 73.0 73.750000
可以通过向参数aggfunc传递一个字典类型数据,来同时对数据进行不同的计算。
import pandas as pd df=pd.read_excel(r"D:\PythonTestFile\exam_new.xlsx") print(df,'\n') print(pd.pivot_table(df,values=['Chinese','English','Math'],columns='Sex',index='Age', aggfunc={'Chinese':'mean','English':'max','Math':'min'}))
运行结果为:
Name Sex Age Chinese English Math 0 Noah male 15 90 50 66 1 Emma female 16 56 56 55 2 Bob male 15 99 84 89 3 Olivia female 17 86 87 44 4 Jeff male 16 48 87 65 5 Liam male 14 55 88 69 6 Sophia female 15 90 66 96 7 Isabella female 16 66 85 55 Chinese English Math Sex female male female male female male Age 14 NaN 55.0 NaN 88.0 NaN 69.0 15 90.0 94.5 66.0 84.0 96.0 66.0 16 61.0 48.0 85.0 87.0 55.0 65.0 17 86.0 NaN 87.0 NaN 44.0 NaN
以上示例中,实现了对Chinese列求平均值;对English列求最大值;对Math列求最小值。
Copyright © 2017-Now pnotes.cn. All Rights Reserved.
编程学习笔记 保留所有权利
MARK:3.0.0.20240214.P35
From 2017.2.6