Extracting dictionary values from a column in a dataframe using Vaex

Question

I applied the following command to my dataframe:

df['date_article'] = df.pagePath.str.extract_regex(pattern='(?P<digit>/\d{4}/\d{2}/\d{2}/)')

This created the column 'date_article':

pagePath	date_article
'/empresas/2021/10/22/tiendas-no-participan-buen'	{'digit': '/2021/10/22/'}
'/finanzas-personales/2021/10/22/pueden-cobrar-c	{'digit': '/2021/10/22/'}

Now I want to keep only the date in 'date_article'.

Expected output:

pagePath	date_article
'/empresas/2021/10/22/tiendas-no-participan-buen'	'/2021/10/22/'
/finanzas-personales/2021/10/22/pueden-cobrar-c	'/2021/10/22/'

I tried many things, but nothing seems to work.

Thank you in advance for your help.

Answer 1

How about the following:

df['date_article'] = df.apply(lambda x: x['digit'], axis=1)

Answer 2

It appears that extract_regex returns a struct series:

Extract substrings defined by a regular expression using Apache Arrow (Google RE2 library).

Parameters
pattern (str) – A regular expression which needs to contain named capture groups, e.g. 'letter' and 'digit' for the regular expression '(?P<letter>[ab])(?P<digit>\d)'.

Returns
an expression containing a struct with field names corresponding to capture group identifiers.

So you will need to extract the field you want out of the struct. I'm not a Vaex expert, but maybe something like:

struct_series = df.pagePath.str.extract_regex(pattern='(?P<digit>/\d{4}/\d{2}/\d{2}/)')
df['date_article'] = struct_series.struct.get('digit')
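To see why the column ends up holding dict-like values, the same pattern can be run with Python's standard re module. This is only an analogy in plain Python, not Vaex itself: extract_struct and get_field are hypothetical helper names made up for this sketch, and it assumes a struct behaves like a dict keyed by capture group name.

```python
import re

# Same pattern as in the question; the named group "digit" becomes the field name.
PATTERN = re.compile(r"(?P<digit>/\d{4}/\d{2}/\d{2}/)")

def extract_struct(path):
    """Mimic a struct result: one entry per named capture group, or None if no match."""
    match = PATTERN.search(path)
    return match.groupdict() if match else None

def get_field(struct, field):
    """Pull a single field out of the dict-like struct, passing None through."""
    return None if struct is None else struct[field]

paths = [
    "/empresas/2021/10/22/tiendas-no-participan-buen",
    "/finanzas-personales/2021/10/22/pueden-cobrar-c",
]
structs = [extract_struct(p) for p in paths]   # what the column looks like now
dates = [get_field(s, "digit") for s in structs]  # what the question asks for
print(structs)  # [{'digit': '/2021/10/22/'}, {'digit': '/2021/10/22/'}]
print(dates)    # ['/2021/10/22/', '/2021/10/22/']
```

The second step is the part that matters here: once the value is a struct keyed by group name, getting the date is just a field lookup on 'digit'.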

Answer 3

Use:

import pandas as pd  # this answer uses pandas, not Vaex

df = pd.DataFrame({'date_article':[{'digit': '/2021/10/22/'}]})
df['date_article'] = df['date_article'].apply(lambda x: x['digit'])

3 answers

Answer 1
0 2022-02-02 05:29:05

Answer 2
0 2022-02-02 05:47:22

Answer 3
0 2022-02-02 07:59:53
