How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?
I have a Pandas DataFrame that was created by reading a table from a PDF with tabula. The PDF isn't parsed perfectly, so a few of the table's columns end up merged into a single column in the resulting DataFrame. The issue is that one of those table columns is text, so its value is sometimes one word and sometimes two words.
Example:
             Col_1 Col_2
0        Hello X Y     A
1  Hello world Q R     B
2           Hi S T     C
I would like to split Col_1 into 3 columns. I'm not sure how to do this, given that the first new column would sometimes consist of one word, as in rows 0 and 2, and sometimes of two words, as in row 1.
I have tried splitting the strings of Col_1 with df['Col_1'].str.split(' ', n=4, expand=True), but this starts splitting from the beginning of the string (from the left), whereas I think the splitting needs to be done from the right.
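You can try using str.rsplit:
Splits string around given separator/delimiter, starting from the right.
With n=2, at most two splits are made, counting from the right, so everything before the last two tokens stays together in the first new column.
df['Col_1'].str.rsplit(' ', n=2, expand=True)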
Output:
             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T
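For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example frame by hand (the real data came from tabula, so the construction below is an assumption) and compares a left split with the right split. As far as I know, recent pandas versions accept only the separator positionally, so n and expand are passed as keywords here.

```python
import pandas as pd

# Hand-built stand-in for the frame that tabula produced (assumed data).
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Splitting from the left on every space: the two-word row yields four
# pieces, so the columns no longer line up across rows.
left = df["Col_1"].str.split(" ", expand=True)

# Splitting from the right, at most twice: the last two tokens become
# their own columns and whatever precedes them stays in column 0.
right = df["Col_1"].str.rsplit(" ", n=2, expand=True)

print(left)
print(right)
```

The left split produces four columns with missing values in the shorter rows, while the right split always produces exactly three.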
As a full dataframe:
df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)
Output:
        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
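If you would rather have descriptive column names and drop the merged column afterwards, something along these lines should work. The names Text, Code_1 and Code_2 are placeholders I made up, since the original PDF headers aren't shown, and the frame is again rebuilt by hand.

```python
import pandas as pd

# Hand-built stand-in for the frame that tabula produced (assumed data).
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Split from the right, at most twice, into three new columns.
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True)

# Hypothetical column names; replace with the real headers from the PDF.
parts.columns = ["Text", "Code_1", "Code_2"]

# Join the new columns with the rest of the frame, minus the merged column.
result = parts.join(df.drop(columns="Col_1"))
print(result)
```

This only holds if every row really ends with exactly two single-token values. A row with fewer tokens would leave missing values in the new columns, so it may be worth checking for those before relying on the result.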