Is there a more concise way of taking the mean of multiple variables based on a specific sub-string from a string?
I have variables associated with a name, and I want to take the mean of each variable based on its MainName. Note that I have more than two MainNames, unlike the example below, so writing each one out would get messy. Could anyone make this more concise? Thanks in advance!
import numpy as np
import pandas as pd

fullname = ['MainName1,subname1','MainName1,subname2','MainName2,subname1','MainName2,subname2']
var1 = [1,5,9,4]
var2 = [2,6,1,5]
var3 = [3,7,2,6]
var4 = [4,8,3,7]
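# Use the full names as the index so the rows can be filtered by name
allvars = pd.DataFrame(np.column_stack([var1,var2,var3,var4]), index=fullname)
# One filtered mean per MainName (this is the part that gets repetitive)
meanvars = [(allvars[allvars.index.str.contains('MainName1')]).mean(),
            (allvars[allvars.index.str.contains('MainName2')]).mean()]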
MainName = ['MainName1','MainName2']
Final = pd.DataFrame(np.column_stack([MainName,meanvars]))
You can use str.extract to pull the main name out of each index value, with the names from the list joined by | (regex OR). Pass the result to groupby and aggregate with mean:
import numpy as np
import pandas as pd

fullname = ['MainName1,subname1','MainName1,subname2',
            'MainName2,subname1','MainName2,subname2']
var1 = [1,5,9,4]
var2 = [2,6,1,5]
var3 = [3,7,2,6]
var4 = [4,8,3,7]
df = pd.DataFrame(np.column_stack([var1,var2,var3,var4]), index=fullname)
print (df)
                    0  1  2  3
MainName1,subname1  1  2  3  4
MainName1,subname2  5  6  7  8
MainName2,subname1  9  1  2  3
MainName2,subname2  4  5  6  7
L = ['MainName1','MainName2']
idx = df.index.str.extract('('+ '|'.join(L) + ')', expand=False)
print (idx)
Index(['MainName1', 'MainName1', 'MainName2', 'MainName2'], dtype='object')
df = df.groupby(idx).mean()
print (df)
             0    1    2    3
MainName1  3.0  4.0  5.0  6.0
MainName2  6.5  3.0  4.0  5.0
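If listing every main name by hand is the part that feels messy, a possible variation is to derive the group key from the index itself. The sketch below assumes that the main name is always the text before the first comma, which holds for the sample data but is not stated in the question. The names `values`, `main` and `means` are only illustrative.

```python
import numpy as np
import pandas as pd

fullname = ['MainName1,subname1', 'MainName1,subname2',
            'MainName2,subname1', 'MainName2,subname2']
values = np.column_stack([[1, 5, 9, 4], [2, 6, 1, 5],
                          [3, 7, 2, 6], [4, 8, 3, 7]])
df = pd.DataFrame(values, index=fullname)

# Assumption: the main name is everything before the first comma.
main = df.index.str.split(',').str[0]

# Group the rows by that key and average each column.
means = df.groupby(main).mean()
print(means)
```

With this approach no list of main names is needed, so it scales to any number of them. If the names do not follow the comma pattern, the str.extract version with an explicit list is the safer choice.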