Subset of a pandas DataFrame containing rows with a specific column value

Question

I'm having a problem with a single line of my code. Here is what I'd like to achieve:

reading_now is a string consisting of 3 characters.
df2 is a data frame that is a subset of df1.
I'd like df2 to consist of the rows from df1 where the first three characters of the value in column "Code" are equal to reading_now.

I tried the following two lines, with no success:

df2 = df1.loc[(df1['Code'])[0:3] == reading_now]
df2 = df1[(str(df1.Code)[0:3] == reading_now)]

Answer 1

You could use the .str accessor, which applies the slice to each value in the column:

df2 = df1[df1['Code'].str[0:3] == reading_now]

For example:

import pandas as pd

data = ['abcd', 'cbdz', 'abcz', 'bdaz']

df1 = pd.DataFrame(data, columns=['Code'])
# Keep the rows whose Code value starts with 'abc'
df2 = df1[df1['Code'].str[0:3] == 'abc']

df2 will be a DataFrame whose 'Code' column contains 'abcd' and 'abcz'.
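To see why the two attempts in the question do not work, here is a small sketch that reuses the example data above (the variable names first_rows, single_flag, and mask are only illustrative). It assumes a default integer index, and shows that slicing the Series directly selects rows, while str() on the Series produces a single string:

```python
import pandas as pd

df1 = pd.DataFrame({'Code': ['abcd', 'cbdz', 'abcz', 'bdaz']})
reading_now = 'abc'

# Attempt 1: slicing a Series with [0:3] selects the first three ROWS,
# not the first three characters of each value.
first_rows = df1['Code'][0:3]
print(first_rows.tolist())  # ['abcd', 'cbdz', 'abcz']

# Attempt 2: str() on a Series returns the text of its printed form,
# so the comparison yields one Python bool rather than a per-row mask.
single_flag = str(df1.Code)[0:3] == reading_now
print(type(single_flag))

# The .str accessor applies the slice to each value in the column.
mask = df1['Code'].str[0:3] == reading_now
df2 = df1[mask]
print(df2['Code'].tolist())  # ['abcd', 'abcz']
```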

Answer 2

Looks like you were really close with your 2nd attempt.

You could solve this in a couple of different ways.

import pandas as pd

reading_now = 'AAA'
df1 = pd.DataFrame([{'Code': 'AAA'}, {'Code': 'BBB'}, {'Code': 'CCC'}])

Solution:

df2 = df1[df1['Code'].str.startswith(reading_now)]

or

df2 = df1[df1['Code'].str[0:3] == reading_now]

The df2 DataFrame will contain the rows whose Code value starts with the reading_now string.
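One case the examples above do not cover is a Code column with missing values. Depending on the pandas version and column dtype, str.startswith may return missing entries in the mask for those rows, and boolean indexing can reject such a mask. A minimal sketch, assuming hypothetical data with one missing value, passes na=False so that missing values count as non-matches:

```python
import pandas as pd

# Hypothetical data: one Code value is missing
df1 = pd.DataFrame({'Code': ['AAA1', None, 'BBB2', 'AAA']})
reading_now = 'AAA'

# na=False makes missing values evaluate to False in the mask
mask = df1['Code'].str.startswith(reading_now, na=False)
df2 = df1[mask]
print(df2['Code'].tolist())  # ['AAA1', 'AAA']
```

Note that startswith works for a prefix of any length, whereas the .str[0:3] comparison only matches when reading_now is exactly three characters long.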