Creating a new column in a pandas DataFrame

Question

I'm extremely new to Python and have been searching Google and Stack Overflow for a fix to this issue, which I'm sure is simply a syntax problem.

I have a DataFrame with several columns.

import pandas as pd
df = pd.read_csv("C:/path/file.csv")

My CSV has 5 columns and about 100k rows. I simply want a new column containing the first 2 digits of column 5.

I've tried:

df.assign(new = lambda x: x.column5[0:2],)

This creates the new field, but it fills only the first two rows (with the complete value from column 5) and gives me NaN for the rest.
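A minimal sketch of why this happens, using a small hypothetical DataFrame (only the column name `column5` comes from the question). Inside `assign`, `x.column5[0:2]` slices the Series by row position, so it returns the first two rows; pandas then aligns that two-row Series to the full index and fills every other row with NaN. Taking characters instead needs the `.str` accessor on string data.

```python
import pandas as pd

# Hypothetical data standing in for the CSV; column5 holds numeric codes.
df = pd.DataFrame({"column5": [12345, 67890, 24680, 13579]})

# x.column5[0:2] selects ROWS 0 and 1, not the first two characters,
# so rows 2 and 3 get NaN after index alignment.
sliced = df.assign(new=lambda x: x.column5[0:2])
print(sliced)

# Slicing characters: convert to string, then slice with .str.
fixed = df.assign(new=lambda x: x.column5.astype(str).str[:2])
print(fixed)
```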

These attempts give me errors:

df['new'] = df['column5'].str[0:2]
df.map(lambda df['column5']: [:2])
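A minimal sketch of what goes wrong with each attempt, again on a small hypothetical DataFrame. The `.str` accessor is only available on string data, so on a numeric column the first line raises an `AttributeError` rather than a syntax error. The second line really is a syntax error, because a lambda parameter must be a plain name, not an expression like `df['column5']`. A working `map`-based variant is shown for comparison.

```python
import pandas as pd

df = pd.DataFrame({"column5": [12345, 67890, 24680]})

# Attempt 1: .str only works on string values, so numeric data raises.
try:
    df["new"] = df["column5"].str[0:2]
    error = None
except AttributeError as exc:
    error = exc
print("Attempt 1 raised:", repr(error))

# Attempt 2 cannot even be parsed: a lambda parameter must be a simple name.
# A valid equivalent converts each value to a string and slices it.
df["new"] = df["column5"].map(lambda value: str(value)[:2])
print(df)
```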

I'm simply at a loss for how to create a new column from the first two digits of an existing column in a table read in via pandas.

If this were SAS, I'd have been done hours ago, but I'm trying to make a go of Python, so your help is appreciated.

Answer 1

I guess your column5 column has an int*/float* dtype, so try converting it to string first:

df['new'] = df['column5'].astype(str).str[:2]
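A quick sketch of how `astype(str).str[:2]` behaves on different numeric dtypes (the values are hypothetical). It works as expected for plain integers, but a float column stringifies values like `1.5` as `'1.5'`, so the first two characters are not always two digits.

```python
import pandas as pd

ints = pd.Series([12345, 987])
floats = pd.Series([12345.0, 1.5])

# Integers stringify without a decimal point.
int_prefixes = ints.astype(str).str[:2].tolist()
# Floats keep their decimal point, which can end up in the slice.
float_prefixes = floats.astype(str).str[:2].tolist()

print(int_prefixes)
print(float_prefixes)
```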

You can also explicitly specify column types when reading the CSV file:

df = pd.read_csv('file_name.csv', ..., dtype={'column5': object})
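A self-contained sketch of the `dtype` approach, using `io.StringIO` as a stand-in for the real file (the CSV content and the `column1` name are hypothetical). It uses `str` rather than `object`; the pandas `read_csv` documentation lists either one as a way to keep values as text. Reading `column5` as text keeps leading zeros such as `01234`, which an inferred integer column would silently drop before you take the first two characters.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file.
csv_text = "column1,column5\na,01234\nb,98765\n"

# Default parsing infers an integer column, so the leading zero is lost.
as_int = pd.read_csv(io.StringIO(csv_text))
int_prefixes = as_int["column5"].astype(str).str[:2].tolist()
print(int_prefixes)

# Reading column5 as text keeps it exactly as it appears in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"column5": str})
as_text["new"] = as_text["column5"].str[:2]
print(as_text)
```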

Answer 1: accepted (2016-05-03 14:49:47)
