Split pandas Series rows containing multiline strings into separate rows
I have a pandas Series that is filled with strings like this:
In:
s = pd.Series(['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.'])
Out:
0 This is a single line.
1 This is another one.
2 This is a string\nwith more than one line.
dtype: object
How can I split all rows in this Series that contain the linebreak character \n into rows of their own? What I would expect is:
0 This is a single line.
1 This is another one.
2 This is a string
3 with more than one line.
dtype: object
I know that I can split each row by the linebreak character with
s = s.str.split('\n')
which gives
0 [This is a single line.]
1 [This is another one.]
2 [This is a string, with more than one line.]
but this only splits the string within its row; it does not give each token a row of its own.
You could loop over each string in each row to create a new Series:
pd.Series([j for i in s.str.split('\n') for j in i])
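As an alternative to the comprehension above, pandas 0.25 and later provide Series.explode, which turns each element of a list-valued row into its own row. The following is a minimal sketch, assuming such a pandas version is installed; the variable name `result` is only illustrative.

```python
import pandas as pd

s = pd.Series([
    'This is a single line.',
    'This is another one.',
    'This is a string\nwith more than one line.',
])

# Split every string into a list of lines, then give each list element its own row.
# explode() repeats the original index label for rows that came from the same string,
# so reset_index(drop=True) renumbers the result from 0.
result = s.str.split('\n').explode().reset_index(drop=True)
print(result)
```

If you want to keep track of which original row each line came from, leave out the reset_index call; the repeated index labels then record the source row.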
It might make more sense to do this on the input rather than creating a temporary Series, e.g.:
strings = ['This is a single line.', 'This is another one.', 'This is a string\nwith more than one line.']
pd.Series([j for i in strings for j in i.split('\n')])
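If the input may contain Windows-style line endings, a variation on the input-side approach is to use str.splitlines instead of split('\n'). This is only a sketch: the '\r\n' in the sample data is a hypothetical addition to show the difference, and note that splitlines also breaks on a few other line-boundary characters and does not produce a trailing empty string when the text ends with a newline.

```python
import pandas as pd

strings = [
    'This is a single line.',
    'This is another one.',
    'This is a string\r\nwith more than one line.',  # hypothetical Windows-style line ending
]

# splitlines() treats '\n', '\r\n' and '\r' as line boundaries,
# so no stray '\r' is left at the end of a line.
flat = [line for text in strings for line in text.splitlines()]
s = pd.Series(flat)
print(s)
```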