python pandas merge two or more lines of text into one line
I have a data frame with text data like the one below:
name | address | number
1 Bob bob No.56
2 @gmail.com
3 Carly carly@world.com No.90
4 Gorge greg@yahoo
5 .com
6 No.100
and I want to turn it into this frame:
name | address | number
1 Bob bob@gmail.com No.56
2 Carly carly@world.com No.90
3 Gorge greg@yahoo.com No.100
I am using pandas to read the file, but I am not sure how to use merge or concat.
If the name column consists of unique values:
print(df)
name address number
0 Bob bob No.56
1 NaN @gmail.com NaN
2 Carly carly@world.com No.90
3 Gorge greg@yahoo NaN
4 NaN .com NaN
5 NaN NaN No.100
df['name'] = df['name'].ffill()
print(df.fillna('').groupby(['name'], as_index=False).sum())
name address number
0 Bob bob@gmail.com No.56
1 Carly carly@world.com No.90
2 Gorge greg@yahoo.com No.100
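The snippet above assumes df already exists. A self-contained sketch of the same steps follows, with the sample frame rebuilt by hand. Using NaN for the blank cells is an assumption about how the file was read, and the variable name merged is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question; NaN marks the blank cells (assumed).
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Gorge", np.nan, np.nan],
    "address": ["bob", "@gmail.com", "carly@world.com", "greg@yahoo", ".com", np.nan],
    "number": ["No.56", np.nan, "No.90", np.nan, np.nan, "No.100"],
})

# Carry each name down to the continuation rows that follow it.
df["name"] = df["name"].ffill()

# Turn the remaining NaN cells into empty strings, then let sum()
# concatenate the string fragments within each name group.
merged = df.fillna("").groupby("name", as_index=False).sum()
print(merged)
```

This only works if every record starts with its name, since ffill() copies the most recent name downward.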
To extend the code above to more complicated data, you may need things like ffill(), bfill(), [::-1], .groupby('name').apply(lambda x: ' '.join(x['address'])), strip(), lstrip(), rstrip(), and replace().
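As a rough illustration of combining a few of these helpers, the sketch below assumes the address fragments carry stray padding that has to be stripped before they are glued together. The padded sample values are made up for the example, and it joins with an empty string rather than ' ' because the fragments are pieces of a single address.

```python
import numpy as np
import pandas as pd

# Hypothetical messy input: fragments padded with spaces.
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly"],
    "address": [" bob ", "@gmail.com ", "carly@world.com"],
})

# Propagate the name to continuation rows.
df["name"] = df["name"].ffill()

# strip() removes the padding around each fragment.
df["address"] = df["address"].str.strip()

# Join the fragments of each group without a separator.
joined = df.groupby("name")["address"].agg("".join).reset_index()
print(joined)
```

If the name appeared on the last row of each record instead of the first, bfill() would play the role that ffill() plays here.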
If you want to convert a data frame of six rows (with possible NaN entries in each column), there might be no direct pandas method for that.
You will need some code to assign the values in the name column, so that pandas knows that the split rows bob and @gmail.com belong to the same user, Bob.
You can fill each empty entry in the name column with the preceding user by using the fillna or ffill methods; see pandas dataframe missing data.
df['name'] = df['name'].ffill()
# gives
name address number
0 Bob bob No.56
1 Bob @gmail.com
2 Carly carly@world.com No.90
3 Gorge greg@yahoo
4 Gorge .com
5 Gorge No.100
Then you can use groupby with sum as the aggregation function.
df.groupby(['name']).sum().reset_index()
# gives
name address number
0 Bob bob@gmail.com No.56
1 Carly carly@world.com No.90
2 Gorge greg@yahoo.com No.100
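The output above shows the blank cells as empty strings. If they are NaN instead, one alternative is to join only the non-missing fragments of each group explicitly. This is a minimal sketch under that assumption; join_fragments is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

# Sample frame with NaN in the blank cells (assumed).
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Gorge", np.nan, np.nan],
    "address": ["bob", "@gmail.com", "carly@world.com", "greg@yahoo", ".com", np.nan],
    "number": ["No.56", np.nan, "No.90", np.nan, np.nan, "No.100"],
})

df["name"] = df["name"].ffill()


def join_fragments(col):
    """Hypothetical helper: concatenate the non-missing pieces of one column in one group."""
    return "".join(col.dropna())


# agg() calls the helper once per column and per name group.
result = df.groupby("name").agg(join_fragments).reset_index()
print(result)
```

Dropping NaN inside the helper avoids the separate fillna('') step, at the cost of a slower Python-level aggregation.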
You may find converting between NaN and white space useful; see "Replacing blank values (white space) with NaN in pandas" and pandas.DataFrame.fillna.
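A minimal sketch of that round trip, assuming the blank cells were read in as empty or whitespace-only strings. The frame raw and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical input where blanks arrived as whitespace rather than NaN.
raw = pd.DataFrame({
    "name": ["Bob", " ", "Carly"],
    "address": ["bob", "@gmail.com", "carly@world.com"],
})

# Empty or whitespace-only cells become NaN so ffill() treats them as missing.
cleaned = raw.replace(r"^\s*$", np.nan, regex=True)
cleaned["name"] = cleaned["name"].ffill()

# The reverse direction: NaN back to empty strings, e.g. before concatenating.
as_text = cleaned.fillna("")
print(as_text)
```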