Create a new df from an existing df in pandas - Python

Question

What is the most efficient pandas command to create a new DataFrame from an existing DataFrame that has only one column, named val, applying the following transformation?

Input:

1_2_3
1_2_3_4
1_2_3_4_5

Output:

2
2_3
2_3_4

Remove everything up to and including the first underscore, and everything from the last underscore to the end of the string.

Answer 1

You can use str.replace with a regex that matches the characters up to and including the first _ and from the last _ to the end of the string, keeping only the captured group in between. Pass regex=True explicitly, because since pandas 2.0 the pattern is treated as a literal string by default:

df['val'] = df['val'].str.replace(r'^[^_]*_(.*)_[^_]*$', r'\1', regex=True)

Output:

     val
0      2
1    2_3
2  2_3_4

If you want that single column in a new dataframe, you can convert the result to one using to_frame:

df2 = df['val'].str.replace(r'^[^_]*_(.*)_[^_]*$', r'\1', regex=True).to_frame()
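For anyone who wants to try this end to end, here is a minimal sketch. It assumes the input frame holds the three sample strings from the question; the variable name pattern is only for illustration.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"val": ["1_2_3", "1_2_3_4", "1_2_3_4_5"]})

# The greedy (.*) captures everything between the first and the last underscore.
pattern = r"^[^_]*_(.*)_[^_]*$"

# regex=True is explicit because the default became False in pandas 2.0.
df2 = df["val"].str.replace(pattern, r"\1", regex=True).to_frame()

print(df2)
```

Since to_frame keeps the Series name, the new dataframe still has a single column called val.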

Answer 2

Another way, using str slicing after a split:

df['val'].str.split("_").str[1:-1].str.join("_")

0        2
1      2_3
2    2_3_4
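A small sketch of the same chain on a standalone Series, with two extra values ("1_2" and "7") that are not in the question and are added here only to show what happens when there are fewer than two underscores:

```python
import pandas as pd

# The first three values come from the question; the last two are assumed edge cases.
s = pd.Series(["1_2_3", "1_2_3_4", "1_2_3_4_5", "1_2", "7"], name="val")

# Split into fields, drop the first and last field, and glue the rest back together.
trimmed = s.str.split("_").str[1:-1].str.join("_")

print(trimmed.tolist())
```

With fewer than three fields the slice is empty, so those rows come back as empty strings rather than raising an error.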

Answer 3

Split the string on the characters that lie between r1 at the start of the string and r2 at the end of the string,

where r1 = digit_ and r2 = _digit(s)

df['val'].str.split(r'(?<=^\d_)(.*?)(?=_\d+$)').str[1]
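To see why index 1 is the piece you want, it may help to run the same pattern through plain re.split, which keeps the captured group in the result list. This is only an illustration with Python's re module; the value "10_2_3" is an assumed extra input that shows the limit of the single-digit lookbehind.

```python
import re

# Same pattern as in the answer: lookbehind for "digit_" at the start,
# lazy capture, lookahead for "_digits" at the end.
pattern = re.compile(r"(?<=^\d_)(.*?)(?=_\d+$)")

# Because the pattern has a capturing group, re.split keeps the captured text.
parts = pattern.split("1_2_3_4")
print(parts)

# A two-digit first field does not satisfy the lookbehind, so nothing is split.
unsplit = pattern.split("10_2_3")
print(unsplit)
```

In the second case the list has a single element, so taking .str[1] in pandas would give a missing value for that row.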

Answer 4

You can find the first and the last _ using str.find and str.rfind, and then take the substring between them.

df['val'] = [x[x.find('_')+1:x.rfind('_')] for x in df['val']]

Output:

     val
0      2
1    2_3
2  2_3_4
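One thing to watch with find/rfind is a value without any underscore: find returns -1, so the slice becomes x[0:-1] and silently drops the last character. Below is a sketch of a guarded version. The helper name strip_outer_fields and the choice to return an empty string in that case are assumptions made for illustration, not part of the answer.

```python
def strip_outer_fields(text: str, sep: str = "_") -> str:
    """Return the text between the first and the last separator."""
    first = text.find(sep)
    last = text.rfind(sep)
    # Assumed policy: with zero or one separator there is nothing in between.
    if first == -1 or first == last:
        return ""
    return text[first + 1:last]


values = ["1_2_3", "1_2_3_4", "1_2_3_4_5"]
result = [strip_outer_fields(v) for v in values]
print(result)

# The unguarded slice on a string with no underscore drops the last character.
unguarded = "abc"["abc".find("_") + 1:"abc".rfind("_")]
print(unguarded)
```

The guarded helper can be used in the same list comprehension as the original one-liner.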

Answer 5

You can do it using the replace method:

df['val'] = df['val'].str.replace(r'^1_', '', regex=True).str.replace(r'_\d$', '', regex=True)

I'm chaining two regex replacements: the first finds the substring 1_ at the start of the string and replaces it with an empty string; the second finds an underscore followed by a single digit at the end of the string (that's what the $ means) and replaces it with an empty string as well. Note that this relies on every value starting with 1_ and ending in a single digit.
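The two patterns above are tied to the sample data. If the first and last fields can be anything, a sketch along the same lines could use [^_]* instead of the literal 1 and \d. The sample values "10_2_3_44" and "a_b_c_d" are assumptions added to show the difference.

```python
import pandas as pd

# First value is from the question; the other two are assumed, more general inputs.
s = pd.Series(["1_2_3", "10_2_3_44", "a_b_c_d"], name="val")

trimmed = (
    s.str.replace(r"^[^_]*_", "", regex=True)   # drop the first field and its underscore
     .str.replace(r"_[^_]*$", "", regex=True)   # drop the last underscore and last field
)

print(trimmed.tolist())
```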

Answer 6

Regex-related questions are always fun.

I'll throw one more into the mix. Here's str.extract:

df['new_val'] = df['val'].str.extract('_(.+)_')

Output:

         val  new_val
0      1_2_3        2
1    1_2_3_4      2_3
2  1_2_3_4_5    2_3_4
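A runnable sketch of the same idea, with expand=False so that str.extract returns a Series instead of a one-column DataFrame. The extra value "1_2" is an assumption added to show that rows without two underscores come back as missing values.

```python
import pandas as pd

# First three values are from the question; "1_2" is an assumed edge case.
df = pd.DataFrame({"val": ["1_2_3", "1_2_3_4", "1_2_3_4_5", "1_2"]})

# The greedy (.+) spans from just after the first underscore to just before the last one.
df["new_val"] = df["val"].str.extract(r"_(.+)_", expand=False)

print(df)
```

If you need a separate dataframe instead of a new column, calling to_frame on the extracted Series would work here as well.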

6 answers

Answer 1
3 2021-02-12 03:32:54

Answer 2
1 2021-02-12 03:40:01

Answer 3
1 2021-02-12 03:41:30

Answer 4
1 2021-02-12 03:43:04

Answer 5
1 2021-02-12 03:44:48

Answer 6
1 2021-02-12 04:01:06
