Extract last specific word/value from one column and move it to the next row
I have a DataFrame like the following:
| Animals | Type | Year |
|---|---|---|
| Penguin AVES | Omnivore | 2015 |
| Caiman REP | Carnivore | 2018 |
| Komodo.Rep | Carnivore | 2019 |
| Blue Jay.aves | Omnivore | 2015 |
| Iguana+rep | Carnivore | 2020 |
I want to extract the last specific word (e.g. AVES or REP) from each value in the "Animals" column and put it in a new row directly below, keeping the values of the other columns. There are several specific words other than AVES and REP. The data is not very clean: the specific word can be preceded by whitespace, a dot, or a "+" sign. The expected new DataFrame would look like this:
| Animals | Type | Year |
|---|---|---|
| Penguin AVES | Omnivore | 2015 |
| AVES | Omnivore | 2015 |
| Caiman REP | Carnivore | 2018 |
| REP | Carnivore | 2018 |
| Komodo.Rep | Carnivore | 2019 |
| Rep | Carnivore | 2019 |
| Blue Jay.aves | Omnivore | 2015 |
| aves | Omnivore | 2015 |
| Iguana+rep | Carnivore | 2020 |
| rep | Carnivore | 2020 |
I was thinking of using negative indexing on the split string, but I got confused writing the lambda function for this particular case. Any idea how I should approach this problem? Thanks in advance.
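The negative-indexing idea does work if the string is split on every separator at once rather than on whitespace only. A minimal sketch, assuming the only separators are a space, a dot and a plus sign (the character class `[ .+]` is an assumption based on the sample data), uses `re.split` inside a lambda and keeps element `[-1]`:

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Animals': ['Penguin AVES', 'Caiman REP', 'Komodo.Rep', 'Blue Jay.aves', 'Iguana+rep'],
    'Type': ['Omnivore', 'Carnivore', 'Carnivore', 'Omnivore', 'Carnivore'],
    'Year': [2015, 2018, 2019, 2015, 2020],
})

# Split on any of the assumed separators (space, dot, plus) and keep the last piece.
# Inside a character class, '.' and '+' are literal characters.
last_word = df['Animals'].apply(lambda s: re.split(r'[ .+]', s)[-1])
print(last_word.tolist())
```

The resulting Series can stand in for the `str.extract(...)` call in the answers below; the regex-based extraction is simpler when the set of separators is not known in advance.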
You can use `str.extract` to get the last word (with the `(\w+)$` regex; if needed, you can instead match a specific list of words, e.g. `(?i)(aves|rep)$`) and `assign` it to replace the column. Then `concat` the updated DataFrame to the original one and `sort_index` with a stable sort to interleave the rows:
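```python
import pandas as pd
# Animals -> trailing word, appended to df; a stable sort on the repeated index interleaves the rows
out = (pd.concat([df, df.assign(Animals=df['Animals'].str.extract(r'(\w+)$', expand=False))])
         .sort_index(kind='stable', ignore_index=True)
      )
```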
Output:
```
         Animals       Type  Year
0   Penguin AVES   Omnivore  2015
1           AVES   Omnivore  2015
2     Caiman REP  Carnivore  2018
3            REP  Carnivore  2018
4     Komodo.Rep  Carnivore  2019
5            Rep  Carnivore  2019
6  Blue Jay.aves   Omnivore  2015
7           aves   Omnivore  2015
8     Iguana+rep  Carnivore  2020
9            rep  Carnivore  2020
```
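When only known keywords should be pulled out, the `(?i)(aves|rep)$` variant mentioned in the answer can be built from a list. A minimal sketch: `keywords` is a hypothetical list (extend it with the other specific words), `re.escape` guards against regex metacharacters in the words, and the `\b` is an addition of this sketch so a keyword is not matched at the tail of a longer word. Rows that end in none of the keywords get NaN from `str.extract`, so they are dropped before concatenating:

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Animals': ['Penguin AVES', 'Caiman REP', 'Komodo.Rep', 'Blue Jay.aves', 'Iguana+rep'],
    'Type': ['Omnivore', 'Carnivore', 'Carnivore', 'Omnivore', 'Carnivore'],
    'Year': [2015, 2018, 2019, 2015, 2020],
})

# Hypothetical keyword list; add the other specific words here
keywords = ['AVES', 'REP']
# Case-insensitive alternation anchored at the end of the string
pattern = r'(?i)\b(' + '|'.join(map(re.escape, keywords)) + r')$'

extra = df.assign(Animals=df['Animals'].str.extract(pattern, expand=False))
# Rows without a keyword would be NaN; drop them so they add no empty rows
extra = extra.dropna(subset=['Animals'])

out = pd.concat([df, extra]).sort_index(kind='stable', ignore_index=True)
print(out)
```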
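Alternative using `stack`:
```python
# Every column except Animals goes into the index: ['Type', 'Year']
cols = df.columns.difference(['Animals']).tolist()
out = (df.assign(Word=df['Animals'].str.extract(r'(\w+)$', expand=False))
         .set_index(cols).stack()  # Animals and Word become consecutive entries
         .reset_index(cols, name='Animals')
         .reset_index(drop=True)[df.columns]  # restore the original column order
      )
```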
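To see why the `stack` version interleaves the rows, it can help to look at the intermediate result: after `set_index`, each row holds the original value and the extracted word side by side in two columns, and `stack` turns those two columns into two consecutive entries of a Series. A minimal sketch printing that intermediate Series, using the question's sample data and the `Word` column name from the answer (recent pandas versions may emit a `FutureWarning` about the `stack` implementation here, which does not change the result for this data):

```python
import pandas as pd

df = pd.DataFrame({
    'Animals': ['Penguin AVES', 'Caiman REP', 'Komodo.Rep', 'Blue Jay.aves', 'Iguana+rep'],
    'Type': ['Omnivore', 'Carnivore', 'Carnivore', 'Omnivore', 'Carnivore'],
    'Year': [2015, 2018, 2019, 2015, 2020],
})

cols = df.columns.difference(['Animals']).tolist()  # ['Type', 'Year']
# One row per animal, with Animals and Word as two columns
wide = (df.assign(Word=df['Animals'].str.extract(r'(\w+)$', expand=False))
          .set_index(cols))
# Series indexed by (Type, Year, column name): Animals then Word for each row
stacked = wide.stack()
print(stacked)
```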
Or duplicate all rows, then overwrite the odd rows with the extracted word:
```python
out = df.loc[df.index.repeat(2)].reset_index(drop=True)
out.loc[1::2, 'Animals'] = out.loc[1::2, 'Animals'].str.extract(r'(\w+)$', expand=False)
```
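If the same transformation is needed for other columns or patterns, the repeat-based version wraps naturally into a helper. A minimal sketch; the function name `add_word_rows` and its parameters are hypothetical, and the check at the end compares it with the `concat`/`sort_index` approach from the first answer:

```python
import pandas as pd


def add_word_rows(df: pd.DataFrame, col: str = 'Animals', pattern: str = r'(\w+)$') -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every row is followed
    by a duplicate whose `col` holds only the first capture group of `pattern`."""
    out = df.loc[df.index.repeat(2)].reset_index(drop=True)
    # Odd positions are the duplicates; overwrite their `col` with the match
    out.loc[1::2, col] = out.loc[1::2, col].str.extract(pattern, expand=False)
    return out


df = pd.DataFrame({
    'Animals': ['Penguin AVES', 'Caiman REP', 'Komodo.Rep', 'Blue Jay.aves', 'Iguana+rep'],
    'Type': ['Omnivore', 'Carnivore', 'Carnivore', 'Omnivore', 'Carnivore'],
    'Year': [2015, 2018, 2019, 2015, 2020],
})

result = add_word_rows(df)
expected = (pd.concat([df, df.assign(Animals=df['Animals'].str.extract(r'(\w+)$', expand=False))])
              .sort_index(kind='stable', ignore_index=True))
# Raises AssertionError if the two approaches disagree
pd.testing.assert_frame_equal(result, expected)
print(result)
```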