How to extract information from pandas dataframe column
I have the DataFrame below. I want to extract pieces of information from column A and put each piece into a new column according to its type (category word, date, hour). Below is an example to illustrate this.
In [0]: df
Out[0]:
A
0 1258GA 25/01/20 TABLE 090626 038272
1 GOODIES 762088 A714816
2 TABLE AA88547 734963 GOODIES
3 WATER 02/450 FROM TOMORROW 48246
4 02H12 ALSCA 00548246B GOODIES
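The sample frame above can be rebuilt for experimentation like this. The values are copied from the question; only the construction code is added.

```python
import pandas as pd

# Sample data copied from the question; column "A" holds free-form strings.
df = pd.DataFrame({
    "A": [
        "1258GA 25/01/20 TABLE 090626 038272",
        "GOODIES 762088 A714816",
        "TABLE AA88547 734963 GOODIES",
        "WATER 02/450 FROM TOMORROW 48246",
        "02H12 ALSCA 00548246B GOODIES",
    ]
})
print(df)
```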
And I want to get the result below:
In [1]: df
Out[1]:
   A                                    Category       Date      Hour
0  1258GA 25/01/20 TABLE 090626 038272  TABLE          25/01/20
1  GOODIES 762088 A714816               GOODIES
2  TABLE AA88547 734963 GOODIES         TABLE GOODIES
3  WATER 02/450 FROM TOMORROW 48246     WATER
4  02H12 ALSCA 00548246B GOODIES        GOODIES                  02H12
I've tried many things but haven't got that result.
Maybe this helps:
df['A'].str.findall(r'\b[A-Z]+\b').str.join(' ')
0 TABLE
1 GOODIES
2 TABLE GOODIES
3 WATER FROM TOMORROW
4 ALSCA GOODIES
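Tokens such as 1258GA, AA88547 and 00548246B contribute nothing because of the word boundary \b: it only matches between a word character and a non-word character (or the start/end of the string), and digits count as word characters, so letters glued to digits never form a standalone match. A quick check with Python's re module (a sketch, not part of the original answer):

```python
import re

# \b[A-Z]+\b needs a boundary on both sides of the run of capitals.
# Letters, digits and "_" are all word characters, so "GA" in "1258GA"
# has no boundary in front of it and is skipped.
pattern = r"\b[A-Z]+\b"
samples = ["1258GA 25/01/20 TABLE 090626 038272", "02H12 ALSCA 00548246B GOODIES"]
results = [re.findall(pattern, s) for s in samples]
print(results)  # [['TABLE'], ['ALSCA', 'GOODIES']]
```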
You can certainly do that using the Series.str methods:
Series.str.extract(): Extract capture groups in the regex pat as columns in a DataFrame. For each subject string in the Series, extract groups from the first match of regular expression pat.
Series.str.findall(): Find all occurrences of pattern or regular expression in the Series/Index.
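The practical difference between the two methods is "first match" versus "every match". A small sketch on two rows from the question, using the same kind of pattern as the answers:

```python
import pandas as pd

s = pd.Series(["TABLE AA88547 734963 GOODIES", "WATER 02/450 FROM TOMORROW 48246"])

# extract: only the FIRST match per row; expand=False returns a Series for one group
first_word = s.str.extract(r"(\b[A-Z]+\b)", expand=False)
# findall: EVERY match per row, returned as a list
all_words = s.str.findall(r"\b[A-Z]+\b")

print(first_word.tolist())  # ['TABLE', 'WATER']
print(all_words.tolist())   # [['TABLE', 'GOODIES'], ['WATER', 'FROM', 'TOMORROW']]
```

This is why the answer below uses findall (plus str.join) for the category words, which can occur several times per row, and extract for the date and hour, where one value per row is expected.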
Here is the code snippet (edited):
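import pandas as pd  # df is the DataFrame from the question

df["Category"] = df['A'].str.findall(r"(\b[A-Za-z]+\b)").str.join(' ')  # all standalone alphabetic words
df["Date"] = df['A'].str.extract(r"(\b[0-9]+/[0-9]+/[0-9]+\b)", expand=False)  # first d/m/y-style token
df["Hour"] = df['A'].str.extract(r"(\b[0-9]+H[0-9]+\b)", expand=False)  # first token like 02H12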
And the output will be:
   A                                    Category             Date      Hour
0  1258GA 25/01/20 TABLE 090626 038272  TABLE                25/01/20  NaN
1  GOODIES 762088 A714816               GOODIES              NaN       NaN
2  TABLE AA88547 734963 GOODIES         TABLE GOODIES        NaN       NaN
3  WATER 02/450 FROM TOMORROW 48246     WATER FROM TOMORROW  NaN       NaN
4  02H12 ALSCA 00548246B GOODIES        ALSCA GOODIES        NaN       02H12
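This output differs from the desired result in rows 3 and 4: the question expects only WATER and GOODIES there, but [A-Za-z]+ also picks up FROM, TOMORROW and ALSCA. If the category words come from a known, fixed list (an assumption; the question never states the rule), one option is to match only those words. In the sketch below, CATEGORIES is a hypothetical list inferred from the five sample rows, and missing dates and hours are shown as empty strings to mirror the desired table.

```python
import re

import pandas as pd

df = pd.DataFrame({"A": [
    "1258GA 25/01/20 TABLE 090626 038272",
    "GOODIES 762088 A714816",
    "TABLE AA88547 734963 GOODIES",
    "WATER 02/450 FROM TOMORROW 48246",
    "02H12 ALSCA 00548246B GOODIES",
]})

# Hypothetical vocabulary inferred from the sample; adjust to the real data.
CATEGORIES = ["TABLE", "GOODIES", "WATER"]
# Alternation of the escaped words, with word boundaries on both sides.
category_pat = r"\b(?:" + "|".join(map(re.escape, CATEGORIES)) + r")\b"

df["Category"] = df["A"].str.findall(category_pat).str.join(" ")
df["Date"] = df["A"].str.extract(r"(\b\d+/\d+/\d+\b)", expand=False)
df["Hour"] = df["A"].str.extract(r"(\b\d+H\d+\b)", expand=False)
# Show missing values as empty strings, as in the desired output.
df[["Date", "Hour"]] = df[["Date", "Hour"]].fillna("")
print(df)
```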