Pandas: Apply search function

I am trying to reproduce some neat forest plots on my own data. However, I'm stuck at this function, and can't for the life of me figure out what it's supposed to do.

I am trying to make the following code work on my data:

def create_smry(trc, data, pname='subject'):
    ''' Conv fn: create trace summary for sorted forestplot '''

    dfsm = pm.df_summary(trc).reset_index()
    dfsm.rename(columns={'index':'featval'}, inplace=True)

    print(dfsm.head(n=5))

    dfsm = dfsm.loc[dfsm['featval'].apply(
        lambda x: re.search('{}__[0-9]+'.format(pname), x) is not None)]

    dfsm.set_index(dfs[pname].unique(), inplace=True)
    dfsm.sort('mean', ascending=True, inplace=True)
    dfsm['ypos'] = np.arange(len(dfsm))

    return dfsm

where the print returns:

  featval      mean        sd  mc_error   hpd_2.5  hpd_97.5
0    mu_a -0.008913  0.011715  0.000613 -0.029139  0.014329
1    mu_b  0.003252  0.000271  0.000015  0.002698  0.003765
2    a__0 -0.065255  0.024315  0.001168 -0.113708 -0.018885
3    a__1 -0.081748  0.023247  0.001114 -0.124560 -0.036777
4    a__2  0.025326  0.021661  0.001024 -0.019744  0.065263

The error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-125-2465af1d68b8> in <module>()
----> 1 dfsm_unpl_mfr = create_smry(hierarchical_trace[-333:], data, 'subject')
      2 custom_forestplot(dfsm_unpl_mfr)

<ipython-input-123-5f6828d6cf8e> in create_smry(trc, data, pname)
      8 
      9     dfsm = dfsm.loc[dfsm['featval'].apply(
---> 10         lambda x: re.search('{}__[0-9]+'.format(pname), x) is not None)]
     11 
     12     dfsm.set_index(dfs[pname].unique(), inplace=True)

~/anaconda/envs/py35/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer (pandas/_libs/lib.c:66645)()

<ipython-input-123-5f6828d6cf8e> in <lambda>(x)
      8 
      9     dfsm = dfsm.loc[dfsm['featval'].apply(
---> 10         lambda x: re.search('{}__[0-9]+'.format(pname), x) is not None)]
     11 
     12     dfsm.set_index(dfs[pname].unique(), inplace=True)

NameError: name 're' is not defined

  1. I can't figure out what re.search is, since re is not a df.
  2. What does {}__[0-9]+ mean in this context?

Since the input is quite complicated, I can't provide a minimal working example.

After importing re:

import re

def create_smry(trc, data, pname='subject'):
    ''' Conv fn: create trace summary for sorted forestplot '''

    dfsm = pm.df_summary(trc).reset_index()
    dfsm.rename(columns={'index':'featval'}, inplace=True)

    print(dfsm.head(n=10))

    dfsm = dfsm.loc[dfsm['featval'].apply(
        lambda x: re.search('{}__[0-90]+'.format(pname), x) is not None)]

    print(dfsm.head(n=10))

    dfsm.set_index(data[pname].unique(), inplace=True)
    dfsm.sort_values('mean', ascending=True, inplace=True)
    dfsm['ypos'] = np.arange(len(dfsm))

    print(dfsm.head(n=15))

    return dfsm

which returns

  featval      mean        sd  mc_error   hpd_2.5  hpd_97.5
0   b0_mu -0.022521  0.010266  0.000597 -0.042222 -0.003072
1   b1_mu  0.003220  0.000256  0.000014  0.002742  0.003700
2   b2_mu  0.024366  0.005288  0.000292  0.014786  0.035139
3   b3_mu  0.008563  0.004393  0.000243  0.000634  0.017385
4   b0__0 -0.078060  0.025093  0.001208 -0.121480 -0.024921
5   b0__1 -0.097636  0.024500  0.001413 -0.144801 -0.052600
6   b0__2  0.009216  0.024381  0.001229 -0.038927  0.052254
7   b0__3  0.024541  0.025525  0.001399 -0.025824  0.070295
8   b0__4 -0.069331  0.020887  0.001057 -0.106392 -0.024169
9   b0__5 -0.065629  0.024787  0.001178 -0.111582 -0.019849
Empty DataFrame
Columns: [featval, mean, sd, mc_error, hpd_2.5, hpd_97.5]
Index: []

If I comment out the re.search filter and simply plot (without trying to change the index either), I get a plot:

[image: the resulting plot]

However, re.search was not applied correctly, so all y-values from trc are plotted.

EDIT: Ended up using

dfsm['featidx'] = dfsm['featval'].apply(lambda x: any(pd.Series(x).str.contains(feat)))

since I could not figure out regex.

I can't figure out what re.search is, since re is not a df.

re is Python's standard-library regular expression module, used to match patterns against strings. It is not a DataFrame or part of pandas. You need to put import re at the top of your Python file or notebook to use it.
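As a minimal sketch of what re.search does on its own, outside pandas (the two sample strings are taken from the featval column printed in the question; the variable names are only illustrative):

```python
import re

# re.search scans the whole string and returns a match object
# at the first location where the pattern matches, or None if it never does.
found = re.search('__[0-9]+', 'a__12')
missing = re.search('__[0-9]+', 'mu_a')

print(found.group())  # the text that matched: __12
print(missing)        # None, because 'mu_a' has no double underscore followed by digits
```

This is why the lambda in create_smry tests `is not None`: it turns the match object or None into a True/False value per row.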

What does {}__[0-9]+ mean in this context?

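It is a regex pattern. re.search scans through the string looking for the first location where this regular expression ( {}__[0-9]+ ) produces a match, and returns a corresponding match object, or None if there is no match. The {} is a str.format placeholder that is filled with pname before matching, so with pname='subject' the pattern becomes subject__[0-9]+, that is, the literal text subject__ followed by one or more digits.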
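The following sketch applies the formatted pattern to a few of the featval names printed in the question. The helper name matching is hypothetical and only for illustration. It suggests why the second printout is an empty DataFrame: none of the names contain subject__, whereas a prefix such as b0 (an assumption about which parameter is wanted) does match.

```python
import re

# A few names copied from the featval column in the question's output.
featvals = ['b0_mu', 'b1_mu', 'b0__0', 'b0__1', 'b0__2']

def matching(pname, names):
    """Hypothetical helper: keep names containing '<pname>__<digits>'."""
    pattern = '{}__[0-9]+'.format(pname)  # e.g. 'b0__[0-9]+'
    return [n for n in names if re.search(pattern, n) is not None]

print(matching('subject', featvals))  # []
print(matching('b0', featvals))       # ['b0__0', 'b0__1', 'b0__2']
```

If pname could ever contain regex metacharacters, wrapping it in re.escape before formatting would be a safer choice.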

For more information about the library, see the Regex Documentation.
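The filtering step can also be written without apply, using the pandas string accessor. This is only a sketch under stated assumptions: the small DataFrame is rebuilt from rows printed in the question rather than from pm.df_summary, and pname='b0' is a guess at the intended parameter prefix.

```python
import re
import pandas as pd

# Stand-in for the summary frame, using values printed in the question.
dfsm = pd.DataFrame({
    'featval': ['b0_mu', 'b1_mu', 'b0__0', 'b0__1', 'b0__2'],
    'mean': [-0.022521, 0.003220, -0.078060, -0.097636, 0.009216],
})

pname = 'b0'  # assumption: the prefix of the per-group parameters

# Series.str.contains applies the regex to every element and returns a boolean mask.
pattern = '{}__[0-9]+'.format(re.escape(pname))
mask = dfsm['featval'].str.contains(pattern, regex=True)

# Keep the matching rows and order them by posterior mean, as create_smry does.
subset = dfsm.loc[mask].sort_values('mean', ascending=True)
print(subset)
```

The resulting mask plays the same role as the apply/lambda expression in the question, so the rest of the function can stay unchanged.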
