DataFrame transformation in Python Pandas
I am trying to transform a Pandas DataFrame into a new one in which every item in a certain column gets its own row. For example:
Before:
   ID             Name        Date   Location
0   0       John, Dave  01/01/1992     Mexico
1   1              Tim  07/07/1997  Australia
2   2       Mike, John  12/24/2012     Zambia
3   3  Bob, Rick, Tony  05/17/2007       Cuba
4   4            Roger  04/05/2000    Iceland
5   5           Carlos  05/24/1995       Guam
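For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: the column dtypes are an assumption, and Date is kept as plain strings rather than parsed into datetimes.

```python
import pandas as pd

# Sample data copied from the "Before" table.
# Assumption: Date stays a string column; nothing here depends on real dates.
df = pd.DataFrame({
    'ID': [0, 1, 2, 3, 4, 5],
    'Name': ['John, Dave', 'Tim', 'Mike, John',
             'Bob, Rick, Tony', 'Roger', 'Carlos'],
    'Date': ['01/01/1992', '07/07/1997', '12/24/2012',
             '05/17/2007', '04/05/2000', '05/24/1995'],
    'Location': ['Mexico', 'Australia', 'Zambia',
                 'Cuba', 'Iceland', 'Guam'],
})
print(df)
```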
Current solution:
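import pandas as pd

new_df = pd.DataFrame(columns=df.columns)
for index, row in df.iterrows():
    # One-row DataFrame holding the current record.
    new_row = pd.DataFrame(df.loc[index]).transpose()
    target_info = df.loc[index, 'Name']
    if len(target_info.split(',')) > 1:
        for item in target_info.split(','):
            # strip() removes the space left after each comma.
            new_row.loc[index, 'Name'] = item.strip()
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
            new_df = pd.concat([new_df, new_row])
    else:
        new_df = pd.concat([new_df, new_row])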
After:
   ID    Name        Date   Location
0   0    John  01/01/1992     Mexico
0   0    Dave  01/01/1992     Mexico
1   1     Tim  07/07/1997  Australia
2   2    Mike  12/24/2012     Zambia
2   2    John  12/24/2012     Zambia
3   3     Bob  05/17/2007       Cuba
3   3    Rick  05/17/2007       Cuba
3   3    Tony  05/17/2007       Cuba
4   4   Roger  04/05/2000    Iceland
5   5  Carlos  05/24/1995       Guam
Surely there is something more elegant?
You could get the split names as a Series, drop your existing Name column, then join the split names back onto the DataFrame.
# Split the 'Name' column as a Series, setting the appropriate name and index.
split_names = df['Name'].str.split(', ', expand=True).stack()
split_names.name = 'Name'
split_names.index = split_names.index.get_level_values(0)
# Drop the existing 'Name' column, and join the split names.
df.drop('Name', axis=1, inplace=True)
df = df.join(split_names)
The resulting output is the same as in your example, but with the Name column last. You can reorder the columns if you want the original order.
   ID        Date   Location    Name
0   0  01/01/1992     Mexico    John
0   0  01/01/1992     Mexico    Dave
1   1  07/07/1997  Australia     Tim
2   2  12/24/2012     Zambia    Mike
2   2  12/24/2012     Zambia    John
3   3  05/17/2007       Cuba     Bob
3   3  05/17/2007       Cuba    Rick
3   3  05/17/2007       Cuba    Tony
4   4  04/05/2000    Iceland   Roger
5   5  05/24/1995       Guam  Carlos
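One way to restore the original column order is to remember it before dropping Name and index the result with that list afterwards. The following is a minimal sketch on a two-row subset of the example data. The names `original_order` and `result` are made up for illustration, and the extra `dropna()` is an assumption added as a guard in case `stack()` keeps the missing values that `expand=True` produces.

```python
import pandas as pd

# Two rows of the example data are enough to show the idea.
df = pd.DataFrame({
    'ID': [0, 1],
    'Name': ['John, Dave', 'Tim'],
    'Date': ['01/01/1992', '07/07/1997'],
    'Location': ['Mexico', 'Australia'],
})

# Remember the column order before 'Name' is dropped.
original_order = list(df.columns)

# Same split as above; dropna() discards the padding from expand=True.
split_names = df['Name'].str.split(', ', expand=True).stack().dropna()
split_names.name = 'Name'
split_names.index = split_names.index.get_level_values(0)

# Join puts 'Name' last; selecting with the saved list moves it back.
result = df.drop('Name', axis=1).join(split_names)
result = result[original_order]
print(result)
```

Using a non-inplace `drop` here leaves the original `df` untouched, which makes it easier to rerun the snippet.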
You can do it this way:
nm = df.Name.str.split(r',\s*', expand=True)
cols = list(set(df.columns) - set(['Name']))
pd.melt(df[cols].join(nm),
        id_vars=cols,
        value_vars=nm.columns.tolist(),
        value_name='Name') \
  .dropna() \
  .drop(['variable'], axis=1) \
  .sort_values('ID')
Step by step:
In [128]: nm = df.Name.str.split(r',\s*', expand=True)
In [129]: nm
Out[129]:
        0     1     2
0    John  Dave  None
1     Tim  None  None
2    Mike  John  None
3     Bob  Rick  Tony
4   Roger  None  None
5  Carlos  None  None
In [130]: cols=list(set(df.columns) - set(['Name']))
In [131]: cols
Out[131]: ['Date', 'ID', 'Location']
In [133]: pd.melt(df[cols].join(nm),
   .....:         id_vars=cols,
   .....:         value_vars=nm.columns.tolist(),
   .....:         value_name='Name') \
   .....:   .dropna() \
   .....:   .drop(['variable'], axis=1) \
   .....:   .sort_values('ID')
Out[133]:
          Date  ID   Location    Name
 0  01/01/1992   0     Mexico    John
 6  01/01/1992   0     Mexico    Dave
 1  07/07/1997   1  Australia     Tim
 2  12/24/2012   2     Zambia    Mike
 8  12/24/2012   2     Zambia    John
 3  05/17/2007   3       Cuba     Bob
 9  05/17/2007   3       Cuba    Rick
15  05/17/2007   3       Cuba    Tony
 4  04/05/2000   4    Iceland   Roger
 5  05/24/1995   5       Guam  Carlos
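For comparison, pandas 0.25 and later provide `DataFrame.explode`, which gives every element of a list-valued column its own row. This is a different technique from both answers above, shown only as a sketch. It assumes such a pandas version is available, and `result` is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [0, 1, 2, 3, 4, 5],
    'Name': ['John, Dave', 'Tim', 'Mike, John',
             'Bob, Rick, Tony', 'Roger', 'Carlos'],
    'Date': ['01/01/1992', '07/07/1997', '12/24/2012',
             '05/17/2007', '04/05/2000', '05/24/1995'],
    'Location': ['Mexico', 'Australia', 'Zambia',
                 'Cuba', 'Iceland', 'Guam'],
})

# Replace each 'Name' string with a list of names,
# then expand every list element into its own row.
result = (
    df.assign(Name=df['Name'].str.split(r',\s*'))
      .explode('Name')
)
print(result)
```

With this approach the columns stay in their original order and the index repeats for split rows, as in the "After" table; `reset_index(drop=True)` can be chained on if a fresh index is preferred.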