
How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?

I have a Pandas DataFrame that was created by reading a table from a PDF with tabula. The PDF isn't parsed perfectly, so a few table columns end up merged into one column in the resulting DataFrame. The issue is that one of the table columns in the PDF is text, so that column sometimes consists of one word and sometimes of two. Example:

            Col_1  Col_2
0       Hello X Y      A
1 Hello world Q R      B
2          Hi S T      C

I would like to split Col_1 into 3 columns. I'm not sure how to do this, given that the first new column would sometimes consist of one word, as in Rows 0 and 2, and sometimes of two words, as in Row 1.

I have tried splitting the strings of Col_1 with df['Col_1'].str.split(' ', n=4, expand=True), but this splits from the beginning of the string (from the left), whereas I think the splitting needs to be done from the right.

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the right.

df['Col_1'].str.rsplit(' ', n=2, expand=True)

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T
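
To make the difference concrete, here is a minimal sketch that rebuilds the sample data from the question and compares splitting from the left with splitting from the right. The DataFrame literal is reconstructed from the example table, and the variable names left and right are just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Splitting from the left cuts the two-word text apart in row 1.
left = df["Col_1"].str.split(" ", n=2, expand=True)

# Splitting from the right peels off the last two tokens and
# leaves whatever text precedes them in the first column.
right = df["Col_1"].str.rsplit(" ", n=2, expand=True)

print(left)
print(right)
```

This only works because the last two fields are always single tokens; if a trailing field could itself contain a space, a different approach would be needed.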

As a full DataFrame:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
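
If you would rather give the new columns meaningful names and drop the merged column afterwards, something along these lines should work. The names Text, Code_1 and Code_2 are hypothetical placeholders, not anything from the original table.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Hypothetical names for the three recovered columns.
new_cols = ["Text", "Code_1", "Code_2"]

# Split from the right into exactly three parts and label them.
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True)
parts.columns = new_cols

# Join the split columns to the rest of the frame, without the merged column.
result = parts.join(df.drop(columns="Col_1"))
print(result)
```

The join aligns on the index, so the split columns stay matched to their original rows.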

