Pandas DataFrame - Splitting Series Strings into Multiple Columns
My question is about the methodology/syntax described in a previous post, which covers different approaches to the same objective: splitting string values into lists and assigning each list item to a new column. Here's the post: Pandas DataFrame, how do i split a column into two
df:
GDP
Date
Mar 31, 2017 19.03 trillion
Dec 31, 2016 18.87 trillion
Script 1 + output:
>>> df['GDP'], df['Units'] = df['GDP'].str.split(' ', 1).str
>>> print(df)
GDP Units
Date
Mar 31, 2017 19.03 trillion
Dec 31, 2016 18.87 trillion
Script 2 + output:
>>> df[['GDP', 'Units']] = df['GDP'].str.split(' ', 1, expand=True)
>>> print(df)
GDP Units
Date
Mar 31, 2017 19.03 trillion
Dec 31, 2016 18.87 trillion
Script 3 + output:
>>> df['GDP'], df['Units'] = df['GDP'].str.split(' ', 1, expand=True)
>>> print(df)
GDP Units
Date
Mar 31, 2017 0 1
Dec 31, 2016 0 1
Can anyone explain what is going on? Why does script 3 produce these values in the output?
Let's start by looking at this:
df['GDP'].str.split(' ', 1)
Date
Mar 31, 2017 [19.03, trillion]
Dec 31, 2016 [18.87, trillion]
Name: GDP, dtype: object
It produces a Series of lists. However, the string accessor, pd.Series.str, lets us access the first, second, ... elements of these embedded lists via intuitive Python list indexing:
df['GDP'].str.split(' ', 1).str[0]
Date
Mar 31, 2017 19.03
Dec 31, 2016 18.87
Name: GDP, dtype: object
Or:
df['GDP'].str.split(' ', 1).str[1]
Date
Mar 31, 2017 trillion
Dec 31, 2016 trillion
Name: GDP, dtype: object
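The element-wise indexing shown above can be tried in a small, self-contained sketch. The DataFrame is rebuilt by hand from the question's data (how the original df was created is not shown, so this construction is an assumption), and n=1 is passed by keyword because newer pandas releases expect it that way.

```python
import pandas as pd

# Hand-built copy of the question's data (assumed construction).
df = pd.DataFrame(
    {"GDP": ["19.03 trillion", "18.87 trillion"]},
    index=pd.Index(["Mar 31, 2017", "Dec 31, 2016"], name="Date"),
)

# Split once on the first space: each cell becomes a two-element list.
parts = df["GDP"].str.split(" ", n=1)

# The .str accessor indexes into every list element-wise.
values = parts.str[0]  # first element of each list
units = parts.str[1]   # second element of each list

print(values.tolist())
print(units.tolist())
```

Explicit indexing with .str[0] and .str[1] does not depend on the accessor being iterable, so it is a reasonable fallback if the unpacking form is unavailable in your pandas version.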
So, if we split into two-element lists with split(' ', 1), we can treat the object returned by an additional str as an iterable:
a, b = df['GDP'].str.split(' ', 1).str
a
Date
Mar 31, 2017 19.03
Dec 31, 2016 18.87
Name: GDP, dtype: object
And:
b
Date
Mar 31, 2017 trillion
Dec 31, 2016 trillion
Name: GDP, dtype: object
OK, we can shortcut the creation of two new columns by leveraging this iterable unpacking:
df['GDP'], df['Units'] = df['GDP'].str.split(' ', 1).str
However, we can instead pass expand=True to expand the lists into new DataFrame columns:
df['GDP'].str.split(' ', 1, expand=True)
0 1
Date
Mar 31, 2017 19.03 trillion
Dec 31, 2016 18.87 trillion
Now we can assign a DataFrame to new columns of another DataFrame like so:
df[['GDP', 'Units']] = df['GDP'].str.split(' ', 1, expand=True)
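As a rough illustration of this assignment, the sketch below inspects the expanded frame before assigning it. The DataFrame construction is an assumption made for the example, and n=1 is passed by keyword for compatibility with newer pandas releases.

```python
import pandas as pd

# Hand-built copy of the question's data (assumed construction).
df = pd.DataFrame(
    {"GDP": ["19.03 trillion", "18.87 trillion"]},
    index=pd.Index(["Mar 31, 2017", "Dec 31, 2016"], name="Date"),
)

# expand=True returns a DataFrame whose columns are labelled 0 and 1.
expanded = df["GDP"].str.split(" ", n=1, expand=True)
print(list(expanded.columns))

# Assigning a two-column DataFrame to a list of two column names
# fills them by position: column 0 -> 'GDP', column 1 -> 'Units'.
df[["GDP", "Units"]] = expanded
print(df)
```

Here the right-hand side is handed to pandas as a whole DataFrame, so pandas itself pairs each source column with a target name; no Python-level unpacking takes place.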
However, when we do
df['GDP'], df['Units'] = df['GDP'].str.split(' ', 1, expand=True)
The return value of df['GDP'].str.split(' ', 1, expand=True) gets unpacked by Python, and unpacking a DataFrame iterates over its column labels, not its data. If you look at the expanded output just above, you'll notice those labels are 0 and 1. So in this case, the scalar 0 is assigned to the column df['GDP'] and the scalar 1 is assigned to the column df['Units'], and each scalar is broadcast to every row.
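A minimal sketch of this behaviour, again using a hand-built copy of the question's data (an assumption) and the keyword form n=1. The names broken and fixed are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Hand-built copy of the question's data (assumed construction).
df = pd.DataFrame(
    {"GDP": ["19.03 trillion", "18.87 trillion"]},
    index=pd.Index(["Mar 31, 2017", "Dec 31, 2016"], name="Date"),
)
expanded = df["GDP"].str.split(" ", n=1, expand=True)

# Tuple-unpacking a DataFrame iterates over its column labels.
first, second = expanded
print(first, second)

# Assigning a scalar to a column broadcasts it to every row,
# which reproduces the output of script 3.
broken = df.copy()
broken["GDP"], broken["Units"] = expanded
print(broken)

# One possible fix that keeps the tuple-assignment form:
# select the columns explicitly before assigning.
fixed = df.copy()
fixed["GDP"], fixed["Units"] = expanded[0], expanded[1]
print(fixed)
```

Selecting expanded[0] and expanded[1] explicitly puts two Series on the right-hand side, so each target column receives data instead of a label.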