String splitting and joining on a pandas dataframe
I have a dataframe containing devices and their corresponding firmware versions (e.g. 1.7.1.3). I'm trying to shorten the firmware version to only show three numbers (e.g. 1.7.1).
I know how to do this on a single string, but how would I make it efficient for a large dataframe?
test = "1.2.3.4"
test = test.split(".")
'.'.join(test[0:-1])
#sample dataframe:
import pandas as pd
df=pd.DataFrame({'data': {0: '1.2.3.4', 1: '1.2.3.9', 2: '1.2.3.8'}})
For this you can use:
df['data']=df['data'].str.split('.').str[0:3].apply('.'.join)
OR
# Only valid when each of the first three numbers is a single digit.
df['data']=df['data'].str[0:5]
OR
df['data']=df['data'].str[::-1].str.split('.', n=1).str[1].str[::-1]
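The reverse, split, reverse approach above drops everything after the last dot. As a minimal sketch of the same idea, assuming the goal is simply to remove the final component, str.rsplit splits from the right directly. The extra row with multi-digit numbers is an illustrative assumption, not part of the original sample data.

```python
import pandas as pd

# Sample data; the third row is a hypothetical multi-digit version.
df = pd.DataFrame({'data': ['1.2.3.4', '1.2.3.9', '10.20.30.40']})

# Split once from the right, then keep the part before the last dot.
shortened = df['data'].str.rsplit('.', n=1).str[0]
print(shortened.tolist())
```

Unlike slicing with .str[0:5], this does not depend on how many digits each number has.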
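No timings are reproduced here. To compare the approaches on your own data, something like the following could be used. It is only a rough sketch: the 10,000-row series is made-up test data, the candidates dictionary is a hypothetical helper, and the numbers will vary with the machine and pandas version.

```python
import timeit
import pandas as pd

# Hypothetical test data: 10,000 four-part version strings.
versions = pd.Series([f"1.{i % 10}.{i % 7}.{i % 5}" for i in range(10_000)])

# Each candidate takes a Series of version strings and returns the shortened Series.
candidates = {
    "split + join": lambda s: s.str.split('.').str[0:3].str.join('.'),
    "rsplit": lambda s: s.str.rsplit('.', n=1).str[0],
    "list comprehension": lambda s: pd.Series(
        ['.'.join(v.split('.')[:3]) for v in s], index=s.index
    ),
}

# Run each candidate once to collect its result, then time it.
results = {name: func(versions) for name, func in candidates.items()}
for name, func in candidates.items():
    seconds = timeit.timeit(lambda: func(versions), number=5)
    print(f"{name}: {seconds:.4f} s for 5 runs")
```

It is worth checking that all candidates return the same values before comparing their speed.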
This could also be done with the str.extract function of pandas; could you please try the following.
df['data'] = df['data'].str.extract(r'^(\d+(?:\.\d+){2})', expand=True)
Simple explanation: str.extract is given a regex that captures only the first three dot-separated numbers, as per the OP's need.
Taking the example DataFrame used by Anurag Dabas here:
Let's say df is the following:
data
0 1.2.3.4
1 1.2.3.9
2 1.2.3.8
After running the above code it will become:
data
0 1.2.3
1 1.2.3
2 1.2.3
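To make the regex easier to follow, here is a small annotated sketch. The multi-digit and two-part rows are assumptions added for illustration, and expand=False is used here only so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Illustrative data: a normal version, a multi-digit one, and one that is too short.
s = pd.Series(['1.2.3.4', '10.20.30.40', '1.2'])

# ^               anchor at the start of the string
# (\d+            first number (one or more digits) ...
#  (?:\.\d+){2})  ... followed by exactly two more ".number" groups
first_three = s.str.extract(r'^(\d+(?:\.\d+){2})', expand=False)
print(first_three.tolist())
```

Rows that do not contain at least three numbers, such as '1.2', do not match and come back as missing values.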
Here is one more way of doing it using .replace:
import pandas as pd
df = pd.DataFrame({'data': {0: '1.2.3.4', 1: '1.2.3.9', 2: '1.2.3.8'}})
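# regex=True is required in pandas 2.0+, where str.replace treats the pattern literally by default.
df['data'] = df['data'].str.replace(r'\.[^.]*$', '', regex=True)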
print (df['data'])
Output:
0 1.2.3
1 1.2.3
2 1.2.3
Name: data, dtype: object
The pattern \.[^.]*$ matches the last dot and the text after it, which is replaced with an empty string. In pandas 2.0 and later, pass regex=True so that the pattern is treated as a regular expression rather than a literal string.
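As a quick check of what this pattern does, the sketch below applies it to a single string with Python's re module and then to a small Series. The three- and one-part inputs are assumptions added to show the edge cases.

```python
import re
import pandas as pd

pattern = r'\.[^.]*$'  # last dot plus everything after it

# On a single string, using the standard library.
single = re.sub(pattern, '', '1.2.3.4')
print(single)

# On a Series; regex=True makes pandas treat the pattern as a regex.
s = pd.Series(['1.2.3.4', '1.2.3', '7'])
trimmed = s.str.replace(pattern, '', regex=True)
print(trimmed.tolist())
```

Note that this always removes the last component, so a value that already has three numbers is shortened to two, while a value without any dot is left unchanged.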