
Joining price data for multiple ETFs and computing covariance and correlation coefficients     Category: quant
Data format
time,open,close,low,high,volume
2019-08-12,3.708,3.756,3.701,3.758,180115577



import pandas as pd
sh510300file = "/Users/dugang/data/sh510300.csv"
sh510900file = "/Users/dugang/data/sh510900.csv"
sh513180file = "/Users/dugang/data/sh513180.csv"
sh588000file = "/Users/dugang/data/sh588000.csv"
sz159949file = "/Users/dugang/data/sz159949.csv"
sh518880file = "/Users/dugang/data/sh518880.csv"

sh510300df = pd.read_csv(sh510300file)
sh510900df = pd.read_csv(sh510900file)
sh513180df = pd.read_csv(sh513180file)
sh588000df = pd.read_csv(sh588000file)
sz159949df = pd.read_csv(sz159949file)
sh518880df = pd.read_csv(sh518880file)


Merge the frames on the time column (pd.merge defaults to an inner join), keeping only the time and close columns

df = pd.merge(sh510300df[["time","close"]],sh588000df[["time","close"]],on='time')
df.columns=["time","sh510300","sh588000"]
df = pd.merge(df,sh510900df[["time","close"]],on='time')
df.columns=["time","sh510300","sh588000","sh510900"]

df = pd.merge(df,sh513180df[["time","close"]],on='time')
df.columns=["time","sh510300","sh588000","sh510900","sh513180"]

df = pd.merge(df,sz159949df[["time","close"]],on='time')
df.columns=["time","sh510300","sh588000","sh510900","sh513180","sz159949"]

df = pd.merge(df,sh518880df[["time","close"]],on='time')
df.columns=["time","sh510300","sh588000","sh510900","sh513180","sz159949","sh518880"]

df.head()

    time	sh510300	sh588000	sh510900	sh513180	sz159949	sh518880
0	2021-05-25	5.327	1.408	1.150	1.003	1.398	3.812
1	2021-05-26	5.328	1.404	1.156	1.004	1.385	3.852
2	2021-05-27	5.347	1.437	1.153	1.005	1.396	3.829
3	2021-05-28	5.332	1.430	1.150	0.991	1.403	3.800
4	2021-05-31	5.338	1.484	1.149	0.998	1.443	3.828
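
The chain of merges above can also be written as a loop. The following is only a minimal sketch under stated assumptions: the three small frames and the names etf_a, etf_b and etf_c are made up for illustration and stand in for the CSV files. Each close column is renamed to its ticker before merging, so the column names do not have to be reassigned after every step.

```python
from functools import reduce

import pandas as pd

# Hypothetical in-memory frames standing in for the CSV files; values are made up.
frames = {
    "etf_a": pd.DataFrame({"time": ["2021-05-25", "2021-05-26", "2021-05-27"],
                           "close": [5.0, 5.1, 5.2]}),
    "etf_b": pd.DataFrame({"time": ["2021-05-26", "2021-05-27", "2021-05-28"],
                           "close": [1.0, 1.1, 1.2]}),
    "etf_c": pd.DataFrame({"time": ["2021-05-25", "2021-05-26", "2021-05-27"],
                           "close": [3.0, 2.9, 3.1]}),
}

# Rename close to the ticker name before merging, so no _x/_y suffixes appear.
renamed = [frame[["time", "close"]].rename(columns={"close": name})
           for name, frame in frames.items()]

# Inner-join all frames on time, one after another.
merged = reduce(lambda left, right: pd.merge(left, right, on="time"), renamed)
print(merged)
```

Because the join is an inner join, only dates present in every frame survive, which is why the merged data above starts at the first date shared by all the ETFs.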



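df.describe()

# Compute the covariance matrix (drop the non-numeric time column first;
# pandas 2.0+ raises an error if a string column is left in)
df.drop(columns="time").cov()

# Compute the correlation coefficient matrix
df.drop(columns="time").corr()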

            sh510300	sh588000	sh510900	sh513180	sz159949	sh518880
sh510300	1.000000	0.943971	0.905710	0.930391	0.955516	-0.703400

The correlation coefficient between the closing prices of sh510300 and sh518880 (gold ETF) is -0.703400, so the two are negatively correlated.
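
How the two matrices relate can be checked by hand: the correlation coefficient is the covariance divided by the product of the two standard deviations. The sketch below uses made-up prices and the placeholder names etf_a and etf_b, not the data from this post.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for two hypothetical tickers.
prices = pd.DataFrame({
    "etf_a": [1.00, 1.02, 1.05, 1.03, 1.08],
    "etf_b": [3.80, 3.78, 3.70, 3.74, 3.65],
})

cov = prices.cov()  # sample covariance (ddof=1)
std = prices.std()  # sample standard deviation (ddof=1)

# corr(i, j) = cov(i, j) / (std_i * std_j)
manual_corr = cov / np.outer(std, std)

print(manual_corr)
print(prices.corr())  # should match manual_corr up to rounding
```

Unlike covariance, the result does not depend on the price scale of each ETF, which makes it easier to compare pairs.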

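# Keep only data from the start of the year (2023-01-01) onward
df2 = df[df['time'] >= '2023-01-01']

# Select the second through the last column, i.e. drop the time column
df3 = df2.iloc[:,1:]
# Take the first row
firstRow = df3.iloc[0]

# Divide every column by its first value, so each series starts at 1.0
df4 = df3.div(firstRow)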

    sh510300	sh588000	sh510900	sh513180	sz159949	sh518880
393	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
394	1.001266	0.993157	1.027127	1.043321	0.988338	1.003762
395	1.020258	1.002933	1.043157	1.059567	1.023324	1.003511
396	1.024310	1.008798	1.038224	1.043321	1.034985	0.996238
397	1.030641	1.008798	1.048089	1.061372	1.040816	1.007023

# Plot the normalized series for comparison
df4.plot()
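
One possible variation, shown only as a sketch with made-up prices and placeholder ticker names: parsing time into real dates and using it as the index keeps the dates attached to the normalized values, whereas df4 above is indexed by row number (393, 394, ...).

```python
import pandas as pd

# Made-up closing prices; ticker names are placeholders.
df = pd.DataFrame({
    "time": ["2022-12-29", "2022-12-30", "2023-01-03", "2023-01-04", "2023-01-05"],
    "etf_a": [3.9, 4.0, 4.0, 4.2, 4.4],
    "etf_b": [2.1, 2.0, 2.0, 1.9, 2.1],
})

# Parse dates and use them as the index so slicing is date-aware.
df["time"] = pd.to_datetime(df["time"])
prices = df.set_index("time")

# Keep rows from 2023-01-01 onward, then rebase each column to 1.0 at its first row.
recent = prices.loc["2023-01-01":]
rebased = recent / recent.iloc[0]
print(rebased)
```

Calling rebased.plot() on this frame (with matplotlib installed) would then put dates rather than row numbers on the x-axis.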
