python pandas: combine two or more rows of text into one row

Question

I have a data frame with text data like the one below:

    name | address                  | number 
1   Bob    bob                        No.56
2          @gmail.com           
3   Carly  carly@world.com            No.90
4   Gorge  greg@yahoo     
5          .com                   
6                                     No.100

and I want to turn it into this frame:

    name | address               | number 
1   Bob    bob@gmail.com           No.56
2   Carly  carly@world.com         No.90                 
3   Gorge  greg@yahoo.com          No.100

I am using pandas to read the file, but I am not sure how to use merge or concat here.

Answer 1

If the name column consists of unique values:

print(df)

    name          address  number
0    Bob              bob   No.56
1    NaN       @gmail.com     NaN
2  Carly  carly@world.com   No.90
3  Gorge       greg@yahoo     NaN
4    NaN             .com     NaN
5    NaN              NaN  No.100

df['name'] = df['name'].ffill()   # carry each name down to its continuation rows
print(df.fillna('').groupby(['name'], as_index=False).sum())   # NaN -> '', then concatenate text per name

    name          address  number
0    Bob    bob@gmail.com   No.56
1  Carly  carly@world.com   No.90
2  Gorge   greg@yahoo.com  No.100
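A self-contained sketch of the steps above, assuming the question's sample is rebuilt by hand with NaN for the blank cells (the dict literal is a reconstruction, not the asker's file). It swaps `sum()` for `agg(''.join)` so that the string concatenation is explicit rather than relying on how `sum` treats text columns:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's sample; blank cells become NaN.
df = pd.DataFrame({
    'name':    ['Bob', np.nan, 'Carly', 'Gorge', np.nan, np.nan],
    'address': ['bob', '@gmail.com', 'carly@world.com', 'greg@yahoo', '.com', np.nan],
    'number':  ['No.56', np.nan, 'No.90', np.nan, np.nan, 'No.100'],
})

# Carry each name down to the continuation rows below it.
df['name'] = df['name'].ffill()

# Replace the remaining NaN with '' and join the text fragments of each group.
merged = df.fillna('').groupby('name', as_index=False).agg(''.join)
print(merged)
```

`''.join` is applied column by column within each group, so `address` and `number` are glued together independently.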

You may need ffill(), bfill(), [::-1], .groupby('name').apply(lambda x: ' '.join(x['address'])), strip(), lstrip(), rstrip(), replace() and similar tools to extend the code above to more complicated data.
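The approach above relies on each name being unique. If the same name could start two separate records, one possible variation (not part of the original answer) is to group by a running record number instead: every row with a name starts a new record, so `notna().cumsum()` gives each continuation row the id of the record above it. The sample data and the `record` key name are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample in which the name 'Bob' starts two separate records.
df = pd.DataFrame({
    'name':    ['Bob', np.nan, 'Carly', 'Bob', np.nan],
    'address': ['bob', '@gmail.com', 'carly@world.com', 'bob2@yahoo', '.com'],
    'number':  ['No.56', np.nan, 'No.90', np.nan, 'No.7'],
})

# A row with a name starts a new record; continuation rows keep the previous id.
record_id = df['name'].notna().cumsum().rename('record')

merged = (
    df.fillna('')
      .groupby(record_id)
      .agg(''.join)              # concatenate the text fragments of each record
      .reset_index(drop=True)    # drop the helper record id
)
print(merged)
```

An empty separator is used because the address fragments must be glued together; `' '.join` (as in the hint above) would suit fragments that are separate words.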

Answer 2

If you want to convert a data frame of six rows (with possible NaN entries in each column), there may be no direct pandas method for that.

You will need some code to assign values in the name column, so that pandas knows the split rows bob and @gmail.com belong to the same user, Bob.

You can fill each empty entry in the name column with the preceding user's name using the fillna or ffill methods; see pandas dataframe missing data.

df['name'] = df['name'].ffill()

# gives
    name          address  number
0    Bob              bob   No.56
1    Bob       @gmail.com
2  Carly  carly@world.com   No.90
3  Gorge       greg@yahoo
4  Gorge             .com
5  Gorge                   No.100

Then you can use groupby with sum as the aggregation function.

df.groupby(['name']).sum().reset_index()

# gives
    name          address  number
0    Bob    bob@gmail.com   No.56
1  Carly  carly@world.com   No.90
2  Gorge   greg@yahoo.com  No.100
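Note that `groupby` sorts the group keys by default; the sample names happen to be in alphabetical order already. If the original record order matters, `sort=False` keeps the groups in order of first appearance. The sketch below (sample data is hypothetical, with names deliberately out of alphabetical order) also drops NaN inside each group before joining, so it does not depend on how `sum` handles missing values in text columns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample whose names are not in alphabetical order.
df = pd.DataFrame({
    'name':    ['Gorge', np.nan, 'Bob', np.nan],
    'address': ['greg@yahoo', '.com', 'bob', '@gmail.com'],
    'number':  [np.nan, 'No.100', 'No.56', np.nan],
})
df['name'] = df['name'].ffill()

merged = (
    df.groupby('name', sort=False)               # keep first-appearance order
      .agg(lambda s: ''.join(s.dropna()))         # skip missing fragments, then concatenate
      .reset_index()
)
print(merged)
```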

You may find converting between NaN and whitespace useful; see Replacing blank values (white space) with NaN in pandas and pandas.DataFrame.fillna.
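If the file is read so that blank cells arrive as empty or whitespace-only strings rather than NaN, `ffill` has nothing to fill. One common way to convert them is a regular expression passed to `DataFrame.replace`; the sample frame below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where blank cells were read as '' or spaces, not NaN.
df = pd.DataFrame({
    'name':    ['Bob', '  ', 'Carly'],
    'address': ['bob', '@gmail.com', 'carly@world.com'],
    'number':  ['No.56', '', 'No.90'],
})

# Turn empty or whitespace-only strings into NaN so that ffill can see them.
df = df.replace(r'^\s*$', np.nan, regex=True)
df['name'] = df['name'].ffill()
print(df)
```

Going the other way, `df.fillna('')` turns the remaining NaN back into empty strings before the text is concatenated.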


2 answers

Answer 1
Score 1, accepted, 2017-02-15 04:19:18

Answer 2
Score 0, 2017-02-15 04:02:16
