Pandas: get second character of the string, from every row
I have an array of data in Pandas and I'm trying to print the second character of every string in col1. I can't figure out how to do it. I can easily print the second character of each string individually, for example:
array.col1[0][1]
However, I'd like to print the second character from every row, so there would be a "list" of second characters. I've tried
array.col1[0:][1]
but that just returns the second row of col1 as a whole. Any advice?
You can use the str accessor to reach the string methods for the column/Series and then slice the strings as normal:
>>> import pandas as pd
>>> df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
>>> df
  col1
0  foo
1  bar
2  baz
>>> df.col1.str[1]
0    o
1    a
2    a
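To connect this back to the question, here is a minimal sketch of why the original attempt returned a whole row, and how to get a plain list out of the result. The messy Series at the end is a hypothetical example (not from the question) that illustrates what happens with a too-short string or a missing value.

```python
import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])

# Slicing the column with [0:] returns the whole column again, so the
# following [1] looks up the row labelled 1 rather than a character.
whole_row = df.col1[0:][1]

# .str[1] indexes into each string; .tolist() gives a plain Python list.
second_chars = df.col1.str[1].tolist()

# Hypothetical messy column: one string is too short, one value is missing.
messy = pd.Series(['foo', 'a', None])
second_or_nan = messy.str[1]               # too-short and missing entries come back as missing
second_or_empty = messy.str.slice(1, 2)    # too-short entry becomes '', missing stays missing

print(whole_row)
print(second_chars)
```

If an empty string is more convenient than a missing value for short strings, str.slice(1, 2) is one option; otherwise str[1] is the shorter spelling.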
This str attribute also gives you access to a variety of very useful vectorised string methods, many of which are instantly recognisable from Python's own assortment of built-in string methods (split, replace, etc.).
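As a rough illustration of those vectorised methods, the sketch below applies a few of them to a small made-up Series (the sample strings are assumptions chosen for the example, not data from the question).

```python
import pandas as pd

# Hypothetical sample data for illustration only.
s = pd.Series(['foo bar', 'baz qux'])

upper = s.str.upper()                             # like str.upper on each element
replaced = s.str.replace('a', 'A', regex=False)   # literal (non-regex) replacement
parts = s.str.split(' ')                          # each element becomes a list of words
first_words = parts.str[0]                        # .str indexing also works on the lists

print(upper.tolist())
print(replaced.tolist())
print(first_words.tolist())
```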
As of Pandas 0.23.0, if your data is clean (that is, it contains no missing values or non-string entries), you will find that Pandas' "vectorised" string methods via pd.Series.str generally underperform simple iteration via a list comprehension or map.
For example:
from operator import itemgetter
import pandas as pd
df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
df = pd.concat([df]*100000, ignore_index=True)
%timeit pd.Series([i[1] for i in df['col1']]) # 33.7 ms
%timeit pd.Series(list(map(itemgetter(1), df['col1']))) # 42.2 ms
%timeit df['col1'].str[1] # 214 ms
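%timeit is an IPython magic, so the lines above only run in IPython or Jupyter. A rough equivalent using the standard-library timeit module might look like the sketch below. The function names are made up for this example, and the timings you see will depend on your machine and Pandas version, so they may not match the figures quoted above.

```python
import timeit
from operator import itemgetter

import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
df = pd.concat([df] * 100000, ignore_index=True)

# Hypothetical helper names; each wraps one of the three approaches.
def with_comprehension():
    return pd.Series([i[1] for i in df['col1']])

def with_map():
    return pd.Series(list(map(itemgetter(1), df['col1'])))

def with_str_accessor():
    return df['col1'].str[1]

# Average over a few runs; no particular result is guaranteed.
for func in (with_comprehension, with_map, with_str_accessor):
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{func.__name__}: {seconds * 1000:.1f} ms per call")
```

Note that the comprehension and map versions raise an error on missing values, which is why the comparison only applies to clean data.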
A special case is when you have a large number of repeated strings, in which case you can benefit from converting your series to a categorical:
df['col1'] = df['col1'].astype('category')
%timeit df['col1'].str[1] # 4.9 ms
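A plausible reason for the speed-up is that a categorical stores each distinct string once, so the string operation only has to run on the distinct values. The sketch below does not measure anything; it just checks, on the same sample data, how many categories there are and that the categorical result matches the one from the original column.

```python
import pandas as pd

df = pd.DataFrame(['foo', 'bar', 'baz'], columns=['col1'])
df = pd.concat([df] * 100000, ignore_index=True)

as_category = df['col1'].astype('category')

# Only the distinct strings are stored as categories, however long the column is.
n_categories = len(as_category.cat.categories)

# The values should match those computed on the original column.
from_category = as_category.str[1]
from_object = df['col1'].str[1]

print(n_categories)
print(from_category.tolist() == from_object.tolist())
```

If most of your strings are unique, the conversion is unlikely to help, since there is then nearly one category per row.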