Split pandas Series rows containing multiline strings into separate rows

I have a pandas Series that is filled with strings like this:

In:    
s = pd.Series(['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.'])

Out:
0                        This is a single line.
1                          This is another one.
2    This is a string\nwith more than one line.
dtype: object

How can I split every row in this Series that contains the linebreak character \n into rows of their own? What I would expect is:

0      This is a single line.
1        This is another one.
2            This is a string
3    with more than one line.
dtype: object

I know that I can split each row by the linebreak character with

s = s.str.split('\n')

which gives

0                        [This is a single line.]
1                          [This is another one.]
2    [This is a string, with more than one line.]

but this only splits the string into a list within its row; it does not give each token a row of its own.

You could loop over each string in each row to create a new Series:

pd.Series([j for i in s.str.split('\n') for j in i])

It might make more sense to do this on the input list rather than creating a temporary Series, e.g.:

strings = ['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.']
pd.Series([j for i in strings for j in i.split('\n')])
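
As an alternative that stays inside pandas, here is a minimal sketch that assumes pandas 0.25 or later, where Series.explode is available. It is not part of the answer above, only another way to get the same rows: split each string into a list, then let explode give every list element a row of its own. The name result is just illustrative.

```python
import pandas as pd

s = pd.Series([
    'This is a single line.',
    'This is another one.',
    'This is a string\nwith more than one line.',
])

# Split each string into a list of lines, then move every list element
# into its own row. explode() repeats the original index label for each
# element of a list, so the index is reset to get consecutive labels.
result = s.str.split('\n').explode().reset_index(drop=True)
print(result)
```

If you want to keep track of which original row each line came from, leave out the reset_index call; the repeated index labels then point back to the source row.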
