pandas pivot_table: pivot tables

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

# Sum D for each (A, B) pair, with one column per value of C
t1 = pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc='sum')

# Same table, with missing combinations filled with 0 instead of NaN
t2 = pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc='sum', fill_value=0)

# Mean of D and of E for each (A, C) pair
t3 = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], aggfunc={'D': 'mean', 'E': 'mean'})

# A different set of aggregations per column
t4 = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'], aggfunc={'D': 'mean', 'E': ['min', 'max', 'mean']})

t1 (sum of D; combinations with no rows show as NaN):
C        large  small
A   B                
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

t2 (same as t1, with fill_value=0):
C        large  small
A   B                
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6

t3 (mean of D and E):
                  D         E
A   C                        
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

t4 (mean of D; max, mean and min of E):
                  D   E              
               mean max      mean min
A   C                                
bar large  5.500000   9  7.500000   6
    small  5.500000   9  8.500000   8
foo large  2.000000   5  4.500000   4
    small  2.333333   6  4.333333   2




Key parameters
values, index and columns correspond to the values, rows and columns of the pivot table, respectively.
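As a quick illustration of that mapping, here is a minimal sketch on a made-up four-row table. The sales frame and its column names (city, category, qty) are assumptions for illustration only and are not the order data used in the next example.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
sales = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Rome"],
    "category": ["book", "toy", "book", "toy"],
    "qty": [3, 1, 2, 5],
})

# index   -> row labels of the result
# columns -> column labels of the result
# values  -> the column whose entries are aggregated into the cells
table = pd.pivot_table(sales, values="qty", index="city",
                       columns="category", aggfunc="sum")

row_labels = list(table.index)      # distinct values of "city"
col_labels = list(table.columns)    # distinct values of "category"
paris_toys = table.loc["Paris", "toy"]

print(table)
```

Each cell holds the aggregate of qty over the rows that share that city and category.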

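Example: customer order data with dimensions such as order date (订单日期), customer name (客户姓名), city (城市), region (洲), product category (商品类别), price (价格), sales (销售额) and profit (利润). The calls below also use a quantity column (数量).

data = pd.read_excel("orders.xlsx")

Total sales and total profit for each region:
result1 = pd.pivot_table(data, index='洲', values=['销售额', '利润'], aggfunc='sum')

Average quantity per order for each region and city:
result2 = pd.pivot_table(data, index=['洲', '城市'], values=['数量'], aggfunc='mean')
result2.head(20)

Total quantity and average quantity per order for each region:
result3 = pd.pivot_table(data, index=['洲'], values=['数量'], aggfunc=['sum', 'mean'])
result3.head()

Total quantity for each city (rows) and product category (columns), with row and column totals (margins=True):
result4 = pd.pivot_table(data, index=['城市'], columns=['商品类别'], values=['数量'], aggfunc=['sum'], margins=True)
result4.head()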
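
orders.xlsx is not included with the article, so the calls above cannot be run as they stand. The sketch below uses a small hand-made stand-in to show roughly what the result3-style and result4-style tables look like. The English column names (region, city, category, quantity) and all the numbers are invented, and the second call is simplified to a single aggregation.

```python
import pandas as pd

# Hypothetical stand-in for orders.xlsx; names and numbers are invented.
orders = pd.DataFrame({
    "region":   ["East", "East", "East", "West", "West"],
    "city":     ["Boston", "Boston", "Albany", "Denver", "Denver"],
    "category": ["office", "tech", "office", "office", "tech"],
    "quantity": [2, 3, 4, 1, 5],
})

# Like result3: a list of aggregations gives two-level columns (function, value).
by_region = pd.pivot_table(orders, index=["region"], values=["quantity"],
                           aggfunc=["sum", "mean"])
east_total = by_region.loc["East", ("sum", "quantity")]   # 2 + 3 + 4
east_mean = by_region.loc["East", ("mean", "quantity")]   # 9 / 3

# Like result4, but with one aggregation: margins=True appends an "All" row
# and an "All" column that hold the totals.
by_city = pd.pivot_table(orders, index="city", columns="category",
                         values="quantity", aggfunc="sum", margins=True)
boston_total = by_city.loc["Boston", "All"]   # 2 + 3
grand_total = by_city.loc["All", "All"]       # sum of every quantity

print(by_region)
print(by_city)
```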




pd.pivot_table?

Signature:
pd.pivot_table(
    data: 'DataFrame',
    values=None,
    index=None,
    columns=None,
    aggfunc: 'AggFuncType' = 'mean',
    fill_value=None,
    margins: 'bool' = False,
    dropna: 'bool' = True,
    margins_name: 'str' = 'All',
    observed: 'bool' = False,
    sort: 'bool' = True,
) -> 'DataFrame'

Docstring:
Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) 
on the index and columns of the result DataFrame.

Parameters
----------
data : DataFrame

values : column or columns to aggregate, optional

index : 
column, Grouper, array, or list of the previous
Keys to group by on the pivot table index.
If an array is passed, it must be the same length as the data, and it is used in the same manner as column values.
The list can contain any of the other types (except list).


columns : 
column, Grouper, array, or list of the previous
Keys to group by on the pivot table column.
If an array is passed, it must be the same length as the data, and it is used in the same manner as column values.
The list can contain any of the other types (except list).

aggfunc : 
function, list of functions, dict, default 'mean'

If a list of functions is passed, the resulting pivot table has hierarchical columns whose top level holds the function names (inferred from the function objects themselves).
If a dict is passed, each key is a column to aggregate and each value is a function or a list of functions.
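
To make the list and dict forms concrete, here is a small sketch on invented data (the frame below is a cut-down toy, not the one from the start of the article). String names such as "sum" are used in place of function objects; pandas accepts both.

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar"],
    "D": [1, 3, 4, 6],
    "E": [2, 5, 6, 9],
})

# List of functions: the top column level holds the function names.
by_list = pd.pivot_table(df, values=["D"], index="A", aggfunc=["sum", "mean"])
top_level = set(by_list.columns.get_level_values(0))
foo_d_sum = by_list.loc["foo", ("sum", "D")]      # 1 + 3

# Dict: choose the aggregation(s) separately for each column.
by_dict = pd.pivot_table(df, values=["D", "E"], index="A",
                         aggfunc={"D": "mean", "E": ["min", "max"]})
foo_d_mean = by_dict.loc["foo", ("D", "mean")]    # (1 + 3) / 2
bar_e_max = by_dict.loc["bar", ("E", "max")]      # max(6, 9)

print(by_list)
print(by_dict)
```

Note that the level order differs: with a list the function name comes first, with a dict the column name comes first.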

fill_value : 
scalar, default None
Value to replace missing values with (in the resulting pivot table, after aggregation).

margins : bool, default False
Add a totals row and column (e.g. for subtotals / grand totals).

dropna : bool, default True
Do not include columns whose entries are all NaN.
If True, rows with a NaN value in any column are omitted before computing margins.

margins_name : str, default 'All'
Name of the row / column that will contain the totals when margins is True.
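
A short sketch of margins together with margins_name, on invented data (the scores frame and the label "Total" are assumptions for illustration). It also shows a detail that is easy to miss: the margin cells apply the aggregation to the underlying rows, so with a mean they are not the mean of the cell means.

```python
import pandas as pd

# Toy data, invented for illustration.
scores = pd.DataFrame({
    "shop":  ["north", "north", "north", "south", "south"],
    "kind":  ["x", "x", "y", "x", "y"],
    "score": [1, 2, 6, 3, 8],
})

table = pd.pivot_table(scores, values="score", index="shop", columns="kind",
                       aggfunc="mean", margins=True, margins_name="Total")

north_total = table.loc["north", "Total"]   # mean of 1, 2, 6
x_total = table.loc["Total", "x"]           # mean of 1, 2, 3
grand_total = table.loc["Total", "Total"]   # mean of all five scores

print(table)
```

Here the four cell means are 1.5, 6, 3 and 8, whose average is 4.625, while the grand total is the mean of the five raw scores.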

observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
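
The effect of observed is easiest to see with a categorical column that declares a level which never occurs. The sketch below is a minimal example on invented data (the stock frame and its columns are assumptions); sum is used as the aggregation so that the unobserved level shows up with a value of 0.

```python
import pandas as pd

stock = pd.DataFrame({
    # "L" is a declared category that never occurs in the data.
    "size": pd.Categorical(["S", "M", "S"], categories=["S", "M", "L"]),
    "qty": [1, 2, 3],
})

all_levels = pd.pivot_table(stock, values="qty", index="size",
                            aggfunc="sum", observed=False)
seen_only = pd.pivot_table(stock, values="qty", index="size",
                           aggfunc="sum", observed=True)

all_labels = list(all_levels.index)    # includes the unobserved "L"
seen_labels = list(seen_only.index)    # only levels present in the data

print(all_levels)
print(seen_only)
```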

sort : bool, default True
Specifies if the result should be sorted.
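
A minimal sketch of what sort changes, on invented data: with the default sort=True the row labels come out in sorted order, while with sort=False they appear to keep the order in which the keys first occur in the data.

```python
import pandas as pd

# Toy data, invented for illustration; "foo" appears before "bar".
df = pd.DataFrame({"A": ["foo", "bar", "foo"], "D": [1, 2, 3]})

sorted_tbl = pd.pivot_table(df, values="D", index="A", aggfunc="sum")
unsorted_tbl = pd.pivot_table(df, values="D", index="A", aggfunc="sum",
                              sort=False)

sorted_labels = list(sorted_tbl.index)       # sorted order
unsorted_labels = list(unsorted_tbl.index)   # order of first appearance

print(sorted_tbl)
print(unsorted_tbl)
```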

Returns
-------
DataFrame
    An Excel style pivot table.

See Also
--------
DataFrame.pivot : Pivot without aggregation that can handle non-numeric data.
DataFrame.melt : Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
wide_to_long : Wide panel to long format. Less flexible but more user-friendly than melt.