Python Data Analysis
Data counting determines how many times each value occurs in a data object. It is done with the value_counts() function.
import pandas as pd
df=pd.read_excel(r"D:\PythonTestFile\exam.xlsx")
print(df,'\n')
print(df['Chinese'].value_counts())  # how often each Chinese score occurs
Output:
       Name     Sex  Chinese  English  Math
0      Noah    male       90       50    66
1      Emma  female       56       56    55
2      Noah    male       90       50    66
3    Olivia  female       86       87    44
4      Liam    male       55       88    69
5    Sophia  female       90       66    96
6      Liam    male       55       88    69
7  Isabella  female       66       85    55

90    3
55    2
56    1
66    1
86    1
Name: Chinese, dtype: int64
In this example, the occurrences of each value in the Chinese column are counted: 90 appears 3 times, 55 appears twice, and every other value appears once.
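The counts above can be reproduced without the Excel file. The sketch below is an illustration under one assumption: exam.xlsx contains exactly the eight rows shown in the printed output, so the DataFrame is rebuilt in memory from them. The names counts and sex_counts are only illustrative. It also shows that value_counts() works the same way on a text column such as Sex.

```python
import pandas as pd

# Assumption: rebuild the table from the printed output instead of reading
# D:\PythonTestFile\exam.xlsx, so the sketch runs without the file.
df = pd.DataFrame({
    "Name": ["Noah", "Emma", "Noah", "Olivia", "Liam", "Sophia", "Liam", "Isabella"],
    "Sex": ["male", "female", "male", "female", "male", "female", "male", "female"],
    "Chinese": [90, 56, 90, 86, 55, 90, 55, 66],
    "English": [50, 56, 50, 87, 88, 66, 88, 85],
    "Math": [66, 55, 66, 44, 69, 96, 69, 55],
})

counts = df["Chinese"].value_counts()   # one row per distinct score
print(counts[90])                        # 3: the score 90 occurs three times
print(counts.sum() == len(df))           # True: every row is counted once

# value_counts() works the same way on a text column
sex_counts = df["Sex"].value_counts()
print(sex_counts["male"])                # 4
```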
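By default, the result is sorted by count in descending order. This is governed by the sort parameter of value_counts(), which determines whether the result is sorted at all; its default is sort=True. Setting sort=False returns the counts without sorting them by frequency.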
import pandas as pd
df=pd.read_excel(r"D:\PythonTestFile\exam.xlsx")
print(df,'\n')
print(df['Chinese'].value_counts(sort=False))  # counts are not sorted
Output:
       Name     Sex  Chinese  English  Math
0      Noah    male       90       50    66
1      Emma  female       56       56    55
2      Noah    male       90       50    66
3    Olivia  female       86       87    44
4      Liam    male       55       88    69
5    Sophia  female       90       66    96
6      Liam    male       55       88    69
7  Isabella  female       66       85    55

56    1
90    3
66    1
86    1
55    2
Name: Chinese, dtype: int64
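Only the order changes with sort=False; the counts themselves are identical. A minimal check, assuming the Chinese scores are the eight values from the printed table (the variable names are hypothetical), compares the two results after ordering both by their value labels with sort_index():

```python
import pandas as pd

# Assumption: the Chinese scores rebuilt from the printed output above
chinese = pd.Series([90, 56, 90, 86, 55, 90, 55, 66], name="Chinese")

sorted_counts = chinese.value_counts()              # sort=True (default)
unsorted_counts = chinese.value_counts(sort=False)  # not sorted by frequency

# Default result: counts never increase from one row to the next
print(sorted_counts.is_monotonic_decreasing)        # True

# Same (value, count) pairs either way; ordering both by their value labels
# makes them directly comparable
same = sorted_counts.sort_index().equals(unsorted_counts.sort_index())
print(same)                                         # True
```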
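When the result is sorted (sort=True, or sort omitted), the ascending parameter of value_counts() decides whether the counts are listed in ascending or descending order. The default is ascending=False (descending); set it to True to list the counts in ascending order.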
import pandas as pd
df=pd.read_excel(r"D:\PythonTestFile\exam.xlsx")
print(df,'\n')
print(df['Chinese'].value_counts(ascending=True))  # smallest counts first
Output:
       Name     Sex  Chinese  English  Math
0      Noah    male       90       50    66
1      Emma  female       56       56    55
2      Noah    male       90       50    66
3    Olivia  female       86       87    44
4      Liam    male       55       88    69
5    Sophia  female       90       66    96
6      Liam    male       55       88    69
7  Isabella  female       66       85    55

56    1
66    1
86    1
55    2
90    3
Name: Chinese, dtype: int64
Clearly, when sort=False (no sorting), the ascending setting has no effect.
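Both points about ascending can be checked in a short sketch. It assumes the same eight Chinese scores as in the printed table; a and b are hypothetical names for the two sort=False results, which should be identical whether or not ascending=True is passed.

```python
import pandas as pd

# Assumption: the Chinese scores rebuilt from the printed output above
chinese = pd.Series([90, 56, 90, 86, 55, 90, 55, 66], name="Chinese")

asc = chinese.value_counts(ascending=True)   # smallest counts first
print(asc.is_monotonic_increasing)           # True

# With sort=False nothing is sorted, so ascending has nothing to act on
a = chinese.value_counts(sort=False)
b = chinese.value_counts(sort=False, ascending=True)
print(a.equals(b))                           # True
```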
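Besides plain counts, value_counts() can also report the relative frequency of each value, that is, the number of times a value occurs divided by the total number of values counted. To get this, pass normalize=True to value_counts().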
import pandas as pd
df=pd.read_excel(r"D:\PythonTestFile\exam.xlsx")
print(df,'\n')
print(df['Chinese'].value_counts(normalize=True))  # share of each value
Output:
       Name     Sex  Chinese  English  Math
0      Noah    male       90       50    66
1      Emma  female       56       56    55
2      Noah    male       90       50    66
3    Olivia  female       86       87    44
4      Liam    male       55       88    69
5    Sophia  female       90       66    96
6      Liam    male       55       88    69
7  Isabella  female       66       85    55

90    0.375
55    0.250
56    0.125
66    0.125
86    0.125
Name: Chinese, dtype: float64
In this example, 90 occurs 3 times out of 8 values in total (one per row), so its share is 3/8 = 0.375.
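The 3/8 arithmetic generalizes: normalize=True gives the same numbers as dividing the plain counts by their sum. A minimal sketch, assuming the eight Chinese scores from the printed table (shares and manual are illustrative names):

```python
import pandas as pd

# Assumption: the Chinese scores rebuilt from the printed output above
chinese = pd.Series([90, 56, 90, 86, 55, 90, 55, 66], name="Chinese")

counts = chinese.value_counts()                  # absolute counts
shares = chinese.value_counts(normalize=True)    # relative frequencies

# Manual equivalent: divide each count by the total number of counted values
manual = counts / counts.sum()                   # counts.sum() is 8 here

print(shares[90])                                # 0.375, i.e. 3/8
# Subtraction aligns on the index labels, so tie order does not matter
print((shares - manual).abs().max() < 1e-12)     # True
```

Because the shares all come from the same total, they add up to 1.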
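The dropna parameter of value_counts() determines whether missing values are excluded from the count; by default dropna=True (they are excluded). The following example shows the count both with and without missing values: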
import pandas as pd
df=pd.read_excel(r"D:\PythonTestFile\exam_nan.xlsx")
print(df,'\n')
print(df['Chinese'].value_counts(),'\n')  # missing values are dropped by default
print(df['Chinese'].value_counts(dropna=False))  # missing values are kept
Output:
       Name     Sex  Chinese  English  Math
0      Noah    male     90.0     50.0  66.0
1      Emma     NaN     56.0     56.0  55.0
2       NaN     NaN      NaN      NaN   NaN
3    Olivia  female     86.0     87.0   NaN
4      Liam    male     55.0      NaN  69.0
5    Sophia  female     90.0     66.0  96.0
6      Liam    male     55.0      NaN  69.0
7  Isabella  female      NaN     85.0  55.0

90.0    2
55.0    2
56.0    1
86.0    1
Name: Chinese, dtype: int64

90.0    2
55.0    2
NaN     2
56.0    1
86.0    1
Name: Chinese, dtype: int64
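The effect of dropna shows up in the totals: with the default, only the non-missing values are counted, while dropna=False adds a NaN row so every row is accounted for. The sketch assumes the Chinese column of exam_nan.xlsx holds the eight values in the printed table (two of them missing); the variable names are hypothetical. The last line combines dropna with normalize: with the default dropna=True, the shares are taken over the 6 non-missing values only.

```python
import pandas as pd

# Assumption: the Chinese column of exam_nan.xlsx rebuilt from the printed
# output above, with two missing entries
chinese = pd.Series([90, 56, float("nan"), 86, 55, 90, 55, float("nan")],
                    name="Chinese")

without_nan = chinese.value_counts()             # dropna=True: NaN not counted
with_nan = chinese.value_counts(dropna=False)    # NaN gets its own row

print(without_nan.sum())                         # 6 non-missing values
print(with_nan.sum())                            # 8, i.e. every row
print(with_nan[with_nan.index.isna()].iloc[0])   # 2 missing values

# normalize=True with the default dropna=True divides by the 6 non-missing values
print(round(chinese.value_counts(normalize=True)[90.0], 4))  # 0.3333
```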
Copyright © 2017-Now pnotes.cn. All Rights Reserved.
Programming Study Notes. All rights reserved.
From 2017.2.6