Splitting variable-length list in pandas column into columns

In a pandas DataFrame I have a column that looks like this:

+----------------------------------------------+
|                carContactTel                 |
+----------------------------------------------+
| []                                           |
| ['tel 432424']                               |
| ['tel 84958358']                             |
| ['tel 5434645', 'tel 534535', 'tel 3242342'] |
+----------------------------------------------+

So some of the lists are empty. I'm trying to split this into new columns: tel1, tel2, tel3, tel4, tel5. If a list is too short, the values in the corresponding columns should stay empty.

My last attempt, based on solutions I've found:

carContactDF = pd.DataFrame(carContactDF["carContactTel"].to_list(), columns=["carContactTel1", "carContactTel2", "carContactTel3", "carContactTel4", "carContactTel5"])

The errors are always about the shape of the list. I tried replacing the empty lists with 'Nan', but that didn't work either.

The lists are generated correctly by another Python script, so there is no mistake in them; I checked.

Error:

ValueError: 5 columns passed, passed data had 3 columns

Currently the longest list has 3 items, but the script will run over a larger dataset that will have lists with 5 elements.

Create a new dataframe from the carContactTel column, then use DataFrame.set_axis and DataFrame.add_prefix to rename the columns as required, and finally use DataFrame.fillna to replace NaN values with empty strings:

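import pandas as pd

df1 = pd.DataFrame(carContactDF['carContactTel'].tolist())
df1 = (
    # Renumber the columns from 1, then prefix them: carContactTel1, carContactTel2, ...
    df1.set_axis(df1.columns + 1, axis=1).add_prefix('carContactTel')
    # Blank out missing values and strip the leading "tel " from each entry
    .fillna('').replace(r'^tel\s*', '', regex=True)
)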

Result:

print(df1)
  carContactTel1 carContactTel2 carContactTel3
0                                             
1         432424                              
2       84958358                              
3        5434645         534535        3242342
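Note that this result has only as many columns as the longest list (three here), while the question asks for five. One possible way to get a fixed width, sketched below under the assumption that exactly five columns are wanted, is to reindex the columns before renaming them. The sample data and the names `N_COLS` and `wide` are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the column shown in the question (assumed for illustration).
carContactDF = pd.DataFrame({
    "carContactTel": [
        [],
        ["tel 432424"],
        ["tel 84958358"],
        ["tel 5434645", "tel 534535", "tel 3242342"],
    ]
})

N_COLS = 5  # assumed fixed number of output columns

# One column per list position; shorter lists are filled with missing values.
wide = pd.DataFrame(carContactDF["carContactTel"].tolist(), index=carContactDF.index)

# Force exactly N_COLS positional columns (0..N_COLS-1), adding empty ones as needed.
wide = wide.reindex(columns=range(N_COLS))

# Rename to carContactTel1 .. carContactTel5.
wide.columns = [f"carContactTel{i + 1}" for i in range(N_COLS)]

# Replace missing values with empty strings and strip the leading "tel ".
wide = wide.astype(object).fillna("").replace(r"^tel\s*", "", regex=True)

print(wide)
```

With `reindex`, a list longer than `N_COLS` would be silently cut off at five entries, so it may be worth checking the maximum list length first if that matters.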

Filter the rows where len(carContactTel) < 5 and append NA values to those lists. Repeat until done. Then split.
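A rough sketch of this padding idea follows. It pads each list to the target length in a single pass rather than looping until done, and it fills with empty strings instead of NA values because the question asks for empty cells. The sample data and the names `N_COLS`, `pad`, and `telDF` are hypothetical.

```python
import pandas as pd

# Sample data mirroring the column shown in the question (assumed for illustration).
carContactDF = pd.DataFrame({
    "carContactTel": [
        [],
        ["tel 432424"],
        ["tel 84958358"],
        ["tel 5434645", "tel 534535", "tel 3242342"],
    ]
})

N_COLS = 5  # assumed fixed number of output columns


def pad(tels, size=N_COLS, fill=""):
    """Return a new list with `fill` appended until it has `size` items.

    Lists that already have `size` or more items are returned unchanged.
    """
    return list(tels) + [fill] * (size - len(tels))


# Every row now has the same length, so the shape matches the column names.
padded = [pad(tels) for tels in carContactDF["carContactTel"]]
tel_cols = [f"carContactTel{i + 1}" for i in range(N_COLS)]
telDF = pd.DataFrame(padded, columns=tel_cols, index=carContactDF.index)

print(telDF)
```

Because every padded list has five items, the `columns=` argument no longer triggers the ValueError from the question. A list with more than five items would still raise it, which may be preferable to dropping numbers silently.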
