Extracting values from cells in a pandas DataFrame

Question

2019-02-12  24;26;28    18;20;22    11;12;13    11;12;13    
2019-02-13  24;26;28    18;20;22    11;12;13    11;12;13

I want to extract the middle value from each column of this dataframe. The date is the index and the strings of numbers are the column values. How can I do this using pandas?

The desired output would be:

2019-02-12  26  20  12  12  
2019-02-13  26  20  12  12

Answer 1

If you want to apply it to all columns except the first, you can do:

Sample:

          date    value1      value2      value3      value4
0   2019-02-12  24;26;28    18;20;22    11;12;13    11;12;13
1   2019-02-13  24;26;28    18;20;22    11;12;13    11;12;13

Solution:

df.loc[:, df.columns[1:]] = df.loc[:, df.columns[1:]].apply(lambda x: x.str.split(';').str[1])

Output:

          date  value1  value2  value3  value4
0   2019-02-12      26      20      12      12
1   2019-02-13      26      20      12      12

If you want to use it for certain columns only, you can pass a list of their names instead of taking all columns except the first:

df.loc[:, list_of_columns]

If date is your index column and you want to apply it to all the remaining columns, don't use [1:]:

 df.loc[:, df.columns]
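
For reference, here is a minimal self-contained sketch of the approach above with the dates as the index. The column names value1 to value4 come from the sample; the variable name middle and the final astype(int) conversion are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data with the dates as the index.
df = pd.DataFrame(
    {
        "value1": ["24;26;28", "24;26;28"],
        "value2": ["18;20;22", "18;20;22"],
        "value3": ["11;12;13", "11;12;13"],
        "value4": ["11;12;13", "11;12;13"],
    },
    index=pd.to_datetime(["2019-02-12", "2019-02-13"]),
)

# Split every cell on ';' and keep the element at position 1,
# which is the middle one when each cell holds three values.
middle = df.apply(lambda col: col.str.split(";").str[1])

# Optional extra step: the extracted values are still strings,
# so convert them to integers if numbers are needed.
middle = middle.astype(int)

print(middle)
```

Note that str[1] is only "the middle" because every cell here has exactly three values; cells with a different number of parts would need a different position.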

Answer 2

If the date column is the index, meaning that the DataFrame can be built from:

import pandas as pd

df = pd.DataFrame({1: {pd.Timestamp('2019-02-12 00:00:00'): '24;26;28',
                       pd.Timestamp('2019-02-13 00:00:00'): '24;26;28'},
                   2: {pd.Timestamp('2019-02-12 00:00:00'): '18;20;22', 
                       pd.Timestamp('2019-02-13 00:00:00'): '18;20;22'},
                   3: {pd.Timestamp('2019-02-12 00:00:00'): '11;12;13', 
                       pd.Timestamp('2019-02-13 00:00:00'): '11;12;13'},
                   4: {pd.Timestamp('2019-02-12 00:00:00'): '11;12;13',
                       pd.Timestamp('2019-02-13 00:00:00'): '11;12;13'}})

then you can clean it with:

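# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching
df = df.apply(lambda x: x.str.replace(r'.*;(.*);.*', r'\1', regex=True))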

This gives the expected result:

             1   2   3   4
2019-02-12  26  20  12  12
2019-02-13  26  20  12  12

But IMHO, this kind of processing should occur before loading the data into the dataframe, or at load time. The sooner the better...
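
As one possible illustration of doing the extraction at load time, the sketch below assumes the data arrives as whitespace-separated text with no header row (as in the question) and uses the converters argument of pd.read_csv. The names raw and middle are hypothetical, and the in-memory io.StringIO buffer stands in for a real file.

```python
import io

import pandas as pd

# Assumed input: whitespace-separated text without a header row.
raw = """2019-02-12  24;26;28    18;20;22    11;12;13    11;12;13
2019-02-13  24;26;28    18;20;22    11;12;13    11;12;13
"""


def middle(cell: str) -> int:
    """Return the second ';'-separated value of a cell as an integer."""
    return int(cell.split(";")[1])


df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",          # columns are separated by runs of whitespace
    header=None,         # the data has no header row
    index_col=0,         # the first column (the date) becomes the index
    parse_dates=True,    # parse the index as dates
    converters={i: middle for i in range(1, 5)},  # applied to columns 1..4
)

print(df)
```

With this approach the dataframe never holds the raw '24;26;28' strings, so no clean-up pass is needed afterwards.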

Answer 3

First, prepare the data in the format you mentioned:

df = pd.DataFrame(columns=['Date', 'A', 'B', 'C', 'D'])
df.loc[0] = ['2019-02-12' , '24;26;28'  ,' 18;20;22',    '11;12;13',    '11;12;13']
df.loc[1] = ['2019-02-13',  '24;26;28',    '18;20;22',    '11;12;13',    '11;12;13']
df ['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df
                   A          B         C         D
Date                                               
2019-02-12  24;26;28   18;20;22  11;12;13  11;12;13
2019-02-13  24;26;28   18;20;22  11;12;13  11;12;13

First split the values on ';' and then take the element at index 1 of each split result.

for col in df.columns:
    df[col]= df[col].str.split(';').str[1].astype(str)
    
df
             A   B   C   D
Date                      
2019-02-12  26  20  12  12
2019-02-13  26  20  12  12

Now you can use agg to join these values into a single column:

df['Result'] = df.agg(' '.join, axis=1)
df
             A   B   C   D       Result
Date                                   
2019-02-12  26  20  12  12  26 20 12 12
2019-02-13  26  20  12  12  26 20 12 12
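
A small caveat worth sketching: ' '.join only accepts strings, so if the extracted values have been converted to numbers, they need to be cast back before joining. The example below is a reduced illustration with two columns; the variable names middle and result are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["24;26;28", "24;26;28"], "B": ["18;20;22", "18;20;22"]},
    index=pd.to_datetime(["2019-02-12", "2019-02-13"]),
)

# Extract the middle values and store them as integers.
middle = df.apply(lambda col: col.str.split(";").str[1]).astype(int)

# ' '.join fails on integers, so cast to str before joining each row.
result = middle.astype(str).agg(" ".join, axis=1)

print(result)
```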

3 answers

Answer 1  0  2021-03-17 15:24:14

Answer 2  0  2021-03-17 15:25:30

Answer 3  0  accepted  2021-03-17 15:26:44
